Google News and Leo Tolstoy: Visualizing Vector Representations of Words with t-SNE

Each of us perceives texts in our own way, be it news on the Internet, poetry, or classic novels. The same applies to machine learning algorithms and methods, which, as a rule, perceive texts in mathematical form, as a multidimensional vector space.

This article is devoted to using t-SNE to visualize multidimensional word vectors computed with Word2Vec. The visualization will help you understand more fully how Word2Vec works and how to interpret the relationships between word vectors before using them further in neural networks ...

Open lesson "Feature Engineering on the example of the classic dataset of the Titanic"

Hello again!

In December, we will start training the next "Data scientist" group, so there are more and more open lessons and other activities. For example, a webinar was recently held under the long title “Feature Engineering on the example of the classic dataset of the Titanic”. It was led by Alexander Sizov, an experienced developer, Ph.D., an expert in machine/deep learning, and a participant in various international commercial projects related to artificial intelligence and data analysis.

The open lesson took ...

Test and debug MapReduce

At Rostelecom, we use Hadoop to store and process data loaded from multiple sources by Java applications. We have now moved to a new version of Hadoop with Kerberos authentication. During the move, we ran into a number of problems, including with the use of the YARN API. Using Hadoop with Kerberos authentication deserves a separate article; in this one, we will talk about debugging Hadoop MapReduce.

When running jobs in a cluster, launching the debugger is complicated by the fact that we do not know which node will ...

Apache NiFi: what it is and a brief overview of the features

Today, foreign sites devoted to Big Data often mention Apache NiFi, a relatively new tool in the Hadoop ecosystem. It is a modern open source ETL tool. A distributed architecture for fast parallel loading and processing of data, a large number of plug-ins for sources and transformations, and versioning of configurations are only some of its advantages. For all its power, NiFi remains fairly simple to use.

At Rostelecom, we are working to expand our use of Hadoop, so we have already tried Apache NiFi and evaluated its advantages over other solutions. In this article ...

Data migration in a bloody enterprise: what to analyze, so as not to overwhelm the project

A typical system integration project for us looks like this: the customer has a whole host of customer accounting systems, and the task is to bring the customer cards together in a single database. And not only to collect them, but also to clear out duplicates and garbage, so as to end up with clean, structured, complete customer cards.

For beginners, I will explain that migration follows this scheme: sources → data conversion (typically an ETL tool or a bus) → receiver.

On one project, we lost three months ...

How machine learning will help, when every minute counts

Imagine that you need to call a taxi. You open the app, see that the car will arrive in seven minutes, tap "Order", and the car turns out to be 15 minutes away, if one is found at all. Unpleasant, isn't it?

Under the cut, let's talk about how machine learning methods help Yandex.Taxi better predict ETA (Estimated Time of Arrival).

To begin, let's recall what the user sees in the app before ordering:

...

Consistency and ACID guarantees in distributed storage systems

Distributed systems are used when horizontal scaling is needed to reach performance levels that a vertically scaled system cannot deliver at a reasonable cost.

Like the transition from a single-threaded paradigm to a multi-threaded one, migrating to a distributed system requires a certain immersion and an understanding of how it works inside and what you need to pay attention to.

One of the problems facing anyone who wants to migrate a project to a distributed system, or start a project on one, is which product to choose.

We, as a company that has “eaten ...

Beijing will introduce a social rating for residents of the city in 202?


The social rating system in the Black Mirror series: season ? episode 1

Chinese authorities have previously announced plans to introduce a social rating for all 1.3 billion citizens of the country in 2020. Obviously, these plans are not destined to be realized, and implementing the program on a global scale will take much more time. Nevertheless, a cyberpunk system of social ranking, in which a computer calculates each citizen's value to society based on his social behavior, is getting closer.

Recently it ...

How to choose a battery for UPS

The battery is the most important component of a UPS. And the most expensive. It also has a limited service life. So, sooner or later, the question of replacing the battery is bound to arise.

Good batteries last for many years, so why not look ahead to understand what will happen to our investment in hardware? It often happens that something good is very expensive at first and becomes easily affordable after a while.

When choosing a UPS, the question arises: which battery does it use, and which one is suitable as a replacement? ...

Sequence-to-Sequence Models, Part 1

Good day everyone!

We have once again opened a new stream of the revised "Data scientist" course: another excellent teacher and a program slightly modified based on updates. And, as usual, there are interesting open lessons and selections of interesting materials. Today we begin our analysis of seq2seq models from TensorFlow.

Go.

As discussed in the RNN tutorial (we recommend reading it before this article), recurrent neural networks can be taught to model a language. And an interesting ...