Distributed data warehouse in the Data Lake concept: installing CDH

We continue to share our experience of organizing a data warehouse, which we began to describe in the previous post. This time we want to talk about how we handled the installation of CDH.
 
 
... (www.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html). In this case, you need to download the Spark 2 CSD file (available on the Version and Packaging Information page, cloudera.com/documentation/spark2/latest/topics/spark2_packaging.html), install it on the host with Cloudera ...

How we are moving 36 million customers to a digital support service

We at Rostelecom have long thought about moving voice customer service to digital text channels. For small companies the task looks easy, but when the service has hundreds, and in the long term even thousands, of support operators, there is a lot to think about. In this post we describe the solution we found, what it is, and what it lets you do. Spoiler: a lot.
 
 
 

Searching for a solution


 
More and more people now prefer social networks and instant messengers to ordinary phone calls. Therefore, Rostelecom decided ...

Google's neural machine translation

The report was written in December 2017.
 
"It's not who has the best algorithms that wins. It's who has the most data." Andrew Ng, instructor of the machine learning course on Coursera.
"If you scale up both the size of the model and the amount of data you train it with, you can learn finer distinctions or more complex features. These models can usually take a lot more context." Jeff Dean, an engineer assisting with research at Google ...

How we scanned the entire Internet and what we found out

How many sites do you use daily? A couple of social networks, a search engine, a few favorite publications, about five work services. That probably adds up to no more than 20 sites.
 
 
 
Have you ever wondered how many sites there are on the Internet and what happens to them?
 
... example.com. 3 million domains do not bind an IP address to the www subdomain.
 
 
An important point is whether redirects exist between the versions. If both versions return a 200 status code, a search engine treats them as two different sites with duplicated content. A reminder: do not forget to set up correct redirects.
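
The check described here can be sketched in a few lines of Python. This is only an illustration, not the authors' scanner: the function name `classify_host_pair`, the assumption that the two "versions" are the bare domain and its www host, and the result labels are all hypothetical.

```python
from typing import Optional

# HTTP status codes that signal a redirect to another URL.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def classify_host_pair(bare_status: Optional[int], www_status: Optional[int]) -> str:
    """Classify a bare domain and its www host by the status code each returned.

    None means the host did not answer at all (for example, no IP address
    is bound to the www subdomain). The labels are illustrative only.
    """
    if bare_status is None or www_status is None:
        return "unreachable"
    if bare_status == 200 and www_status == 200:
        # Both versions serve content: duplicated content for a search engine.
        return "duplicate"
    bare_redirects = bare_status in REDIRECT_CODES
    www_redirects = www_status in REDIRECT_CODES
    if bare_redirects != www_redirects:
        # Exactly one version redirects; the other one should serve the page.
        # Note: this sketch does not verify where the redirect points.
        serving_status = www_status if bare_redirects else bare_status
        return "redirected" if serving_status == 200 else "other"
    return "other"


if __name__ == "__main__":
    print(classify_host_pair(200, 200))
    print(classify_host_pair(301, 200))
```

A real scanner would also follow the `Location` header to confirm that the redirect actually leads to the other version of the same site.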
 
Redirects w...

Oh, My Code: Machine learning and analytics at Odnoklassniki

 
What is the difference between machine learning and data analysis, who uses Odnoklassniki, and how do you get started in machine learning? We discuss this in the twelfth episode of the talk show for programmers.
 
 

 
Video on the Technostream channel
 
 
The host is Pavel Scherbinin, technical director of media projects; the guest is Dmitry Bugaychenko, an analyst engineer at Odnoklassniki.
 
... Spark, which we access through the Zeppelin web front end. Basically, the data comes through ...

Habra-dictionary. Part 1

Friends, good afternoon.
 
I set out to compile a dictionary of Habrahabr in order to track the emergence of new languages, frameworks, management practices, and so on. In short, new words.
 
The result was a list of English words "in the nominative case and singular number".
 
I worked on Windows 10 x64, using Python 3 in the Spyder editor from Anaconda 5.1.?, with a wired network connection.
 
In this article I build a dictionary of English words from a limited sample. If the topic proves interesting, I plan to later build a dictionary of both ...

TOP 8 books worth reading this summer

 
Good afternoon, dear Habr readers!
 
 
Happy first day of summer! Summer is a time for rest, but it is worth spending it productively too. Today we want to step away from our usual topic of data analysis in Splunk and present a list of books that we think are worth reading this summer if you want to keep up with recent trends, stay aware of interesting publications on the development of information technology, IoT, data analysis, information security, and so on, or simply improve your skills.
 
 
...

Features of hiring AI & Data Science specialists

Together with Anna the First
 

Introduction


 
Every day, humanity creates, uses, and stores huge amounts of data. Every article, blog or Instagram post, every kid, and indeed every act of communication is data that, once processed, becomes valuable, brings profit, and warns of risks for whoever owns it and knows how to extract the relevant information.
 
As data analysis capabilities grow and the usefulness of existing archives is recognized, so does the need for experts in Data Science, machine learning, and artificial intelligence (AI) who are capable of working with data ...

Python Selenium and Krisha.kz. The first in Big Data

Foreword
 
Learning something new is always interesting; it absorbs you completely, at least in my case. This time, after studying Python programming, I wondered where else it could be applied besides a photo separator (an article about it will come a little later) and a sales accounting program, and I came across an article about Big Data. Having studied the material on Big Data, I realized that the field is very promising and worth spending time on.
 
After that, I began to study an unimaginable number of articles, and after watching a couple of dozen ...

Bigdata, machine learning and neural networks - for managers

If a manager tries to understand this area and get specific business answers, then most likely their head aches terribly and their heart sinks at the sense of profit being lost every moment.

"AlphaGo beat the Go champion" for the first time in human history, self-driving vehicles will soon flood our streets, face and voice recognition are already routine, and tomorrow in our apartments we will hear AI sex dolls with the largest breast size, champagne under the arm, and adjustable intensity and duration of orgasm.

That is all true, but what should you do right now? How do you make money ...