How we scanned the entire Internet and what we found out

How many sites do you use daily? A couple of social networks, a search engine, several favorite publishers, about five work services. It probably adds up to no more than 20 sites.
 
 
 
Have you ever wondered how many sites there are on the Internet and what happens to them?
 
example.com. 3 million domains do not bind an IP address to the www subdomain.
 
 
An important point is the presence of redirects between the versions. If both versions return a 200 code, a search engine treats them as two different sites with duplicated content. As a reminder, do not forget to set up correct redirects.
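The check described above can be sketched as a small function. This is only an illustration under assumptions: the function name `classify_www_pair`, the tuple layout, and the result labels are hypothetical, and the status codes and `Location` headers are expected to come from whatever HTTP client already fetched both versions without following redirects.

```python
from urllib.parse import urlsplit

# Status codes treated as redirects in this sketch (an assumption, adjust as needed).
REDIRECT_CODES = {301, 302, 307, 308}


def classify_www_pair(bare, www, bare_host="example.com"):
    """Classify the bare and www versions of a site.

    `bare` and `www` are (status_code, location_header_or_None) tuples
    obtained without following redirects.
    """
    www_host = "www." + bare_host
    bare_status, bare_location = bare
    www_status, www_location = www

    # Both versions answer 200: duplicated content for a search engine.
    if bare_status == 200 and www_status == 200:
        return "duplicate"

    # One version answers 200 and the other redirects to it: the desired setup.
    if bare_status == 200 and www_status in REDIRECT_CODES:
        target = urlsplit(www_location or "").hostname
        return "ok" if target == bare_host else "check"
    if www_status == 200 and bare_status in REDIRECT_CODES:
        target = urlsplit(bare_location or "").hostname
        return "ok" if target == www_host else "check"

    # Anything else (errors, redirects elsewhere) needs a manual look.
    return "check"
```

For instance, `classify_www_pair((200, None), (200, None))` reports a duplicate, while a 301 from the www version to `https://example.com/` is reported as fine.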
 
Redirects w...

Winter and summer in one color? Search for seasonality in the data

 
We once decided to look at what seasonal interests 2GIS users have in different cities. Spikes of interest in flowers, New Year's gifts and tires are quite expected. We decided not to limit ourselves to them and went further, checking all areas of activity in all 113 cities where 2GIS is present.
 
 
In this article I will tell you how we searched for seasonality and what features of user behavior we found.
 
 
criterion of inversions.
 
 
If the hypothesis is not rejected, then it is ...
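The criterion of inversions mentioned above is commonly formulated as follows: count the pairs of observations that appear in decreasing order and compare that count with what a purely random ordering would give. The sketch below assumes this textbook variant with a normal approximation; whether the authors used exactly this form is not stated in the excerpt, and the function name `inversion_test` is hypothetical.

```python
import math


def inversion_test(series):
    """Return (number of inversions, z-score) for a sequence of numbers.

    An inversion is a pair i < j with series[i] > series[j]. For a random
    ordering of n distinct values the count has mean n(n-1)/4 and
    variance n(n-1)(2n+5)/72.
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")

    # Straightforward O(n^2) count, fine for short monthly or weekly series.
    inversions = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if series[i] > series[j]
    )

    mean = n * (n - 1) / 4
    variance = n * (n - 1) * (2 * n + 5) / 72
    z = (inversions - mean) / math.sqrt(variance)
    return inversions, z
```

A strongly negative z-score points to an upward trend (too few inversions), a strongly positive one to a downward trend; values near zero give no reason to reject the hypothesis of a random series.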

Constructive elements of a reliable enterprise R application

Those who work with R know well that the language was originally designed as a tool for interactive work. Naturally, techniques that are convenient for step-by-step console use by a person deep in the subject are not suitable for building an application for the end user. The ability to get detailed diagnostics immediately after an error, to inspect all variables and traces, to run pieces of code by hand (possibly after changing some variables): all of this is unavailable when the R application runs autonomously in an enterprise environment. (we say R...

Vacancy Market Research BA/SA

"Research of the vacancy market for analysts": that was the very real task of a very real lead analyst at a certain firm, big or small. The researcher manually parsed dozens of job descriptions from hh, sorting them by the skills requested and incrementing the counter in the corresponding column of the spreadsheet.
 
I saw this task as a good candidate for automation and decided to try to handle it with less effort, easily and simply.
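The manual tally described above (one counter per skill, incremented for each job description that mentions it) could be automated roughly like this. This is a minimal sketch, not the author's code: the function name `count_skills` and the skill list in the usage note are hypothetical, and the match is a plain case-insensitive substring search, so short skill names may also match inside longer words.

```python
from collections import Counter


def count_skills(descriptions, skills):
    """Count in how many job descriptions each skill is mentioned."""
    counts = Counter()
    for text in descriptions:
        lowered = text.lower()
        for skill in skills:
            # Each description contributes at most 1 per skill,
            # like one tick in the spreadsheet column.
            if skill.lower() in lowered:
                counts[skill] += 1
    return counts
```

Called with a list of vacancy texts and a list such as `["SQL", "UML", "BPMN"]`, it returns a `Counter` whose `most_common()` method gives the skills ordered by popularity.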
 
I was interested in the following questions raised in this study:
 
 
the average salary level of business and systems analysts,
 
the most popular skills and ...

Oh, My Code: Machine learning and analytics at Odnoklassniki

 
What is the difference between machine learning and data analysis, who spends time on Odnoklassniki, and how do you get started in machine learning? We talk about this in the twelfth episode of our talk show for programmers.
 
 

 
Video on the Technostream channel
 
 
The host of the show is Pavel Scherbinin, technical director of media projects; the guest is Dmitry Bugaychenko, an analyst engineer at Odnoklassniki.
 
Spark, which we access through the Zeppelin web front end. Basically, the data comes through ...

Habra-dictionary. Part 1

Friends, good afternoon.
 
I set out to compile a dictionary of Habrahabr in order to track the emergence of new languages, frameworks, management practices and so on. In short, new words.
 
The result was a list of English words "in the nominative and singular".
 
I worked in a Windows 10 x64 environment, using Python 3 in the Spyder editor from Anaconda 5.1.?, with a wired network connection.
 
In this article I build a dictionary of English words from a limited sample. If the topic turns out to be interesting, I plan in the future to build a dictionary of both ...

The man is the machine assistant

This blog is usually devoted to license plate recognition. But while working on that task we arrived at an interesting solution that can easily be applied to a very wide range of computer vision tasks. That is what we will talk about now: how to build a recognition system that will not let you down. And if it does fail, you can show it where the error is, retrain it, and get a slightly more reliable solution than before. Welcome under the cut!
 
 
 
PowerAI from IBM. It is almost like DIGITS, but you only have to label the datasets. Plus, no one optimizes the neural networks and solutions. But a lot of cases have been worked out. The ...

Features of hiring AI & Data Science specialists

Together with Anna the First
 

Introduction


 
Every day, humanity creates, uses and stores huge amounts of data. Every article, blog post or Instagram post, every kid and indeed every act of communication is data that, once processed, becomes valuable: it brings profit and warns of risks to whoever owns it and knows how to extract the relevant information.
 
With the growth of data analysis capabilities and the recognition that existing archives are useful, the need is growing for experts in Data Science, machine learning and artificial intelligence (AI) who are capable of working with data ...

Python Selenium and Krisha.kz. The first in Big Data

Foreword
 
Learning something new is always interesting; it absorbs you completely, at least it does me. This time, too, after studying Python programming, I wondered where it could be applied besides building a photo separator (an article about that will come a little later) and a sales accounting program, and I came across an article about Big Data. Having studied the materials on Big Data, I realized that the field is very promising and worth the time spent studying it.
 
After that, I began studying an unimaginable number of articles, and after watching a couple of dozen ...

Correction of typos, side view

We will talk about using fashionable word embeddings not quite for their intended purpose, namely to correct typos (strictly speaking, spelling mistakes too, but we assume that people are literate and merely mistype). Habr already had a fairly closely related article, but this one will be about something a little different.
 
 
Visualization of a Word2Vec model obtained by a student. It was trained on "The Lord of the Rings". Clearly something in the Black Speech.
 
article, which explains this well. So as not to repeat it, but also not to send the reader chasing links ...