Adult journalism: from Russia to the Kremlin

Analysis of Lenta.ru publications over 18 years (from September 1999 to December 2017) using python, sklearn, scipy, XGBoost, pymorphy2, nltk, gensim, MongoDB, Keras and TensorFlow.

The study used data from the post "Analyze this - Lenta.ru" by user ildarchegg. The author kindly provided 3 gigabytes of articles in a convenient format, and I decided this was a great opportunity to test some text processing methods and, with luck, to learn something new about Russian journalism, society and things in general. ...

6 typical plots of world literature


GeoPuzzle - Collect the world in pieces

I want to talk about a project that has been in development for the last couple of years. It is called GeoPuzzle, and it is a puzzle game on the political map of the world. The goal is to put the pieces, the countries, into place. The idea was borrowed from the article "Puzzle Mercator for geography connoisseurs". As a child I also played a Tetris made of countries (back under DOS), but I don't remember the name of the program. I was so inspired by the idea that I wanted to make a complete product, interesting not only for schoolchildren but also for geography connoisseurs. The development ...

Home Data: how data analysis is used in architecture and urbanism

We trained our neural nets, XGBoost models, SVMs and other random forests at GoTo, and then it dawned on us: we talk a lot about technology and say almost nothing about the areas where it can be applied.

We decided to correct this mistake with a series of articles in which we talk about different areas with unexpectedly large amounts of data, interview analysts and developers, describe the projects we decided to try at the school, and so on. ...

Tim Berners-Lee goes on the warpath: "One small step for the Internet"

I have always believed that the Internet is for everyone. That is why I and others fight fiercely to protect it. The changes we have achieved have created a better and more connected world. But for all the good we have achieved, the network has become an engine of injustice and division, influenced by powerful forces that use it for their own purposes.
 
 
Today I believe that we have reached a critical turning point, and that fundamental change for the better is both possible and necessary.
 
 
That is why I have worked with several people at MIT and other places in recent years to ...

Identification of content profiles in VK

Telling bots from people really is hard. I cannot reliably do it myself. But I came up with a decent homegrown method for telling "interesting people" from "not very interesting" ones in VK. In terms of online communication, of course, not in real life.
 

 
VK has restricted the ability to download the contents of users' walls, and that hurts a little. That is, it is still possible, but it takes a lot of refining, optimizing and dodging to get around the restrictions.
 
 

The basic idea


 
The main idea is that bots, dull (in the online sense) personalities...

Game to improve the quality of Wikipedia

Today, a beta version of the online WikiBest game was announced, which is part of the research on data quality in Wikipedia. It is noteworthy that at present the game allows you to compare the quality of data in 5 language versions of Wikipedia: Russian, Ukrainian, Belarusian, Polish, English. In the near future it is planned to expand the number of languages.
 
 
 
... automatic quality assessment of articles in this free encyclopedia. However, a large number of problems still remain to be solved. For example, how to automatically evaluate or compare the quality of individual facts in different language versions ...

Collection of demographic stories in one map

 
A recent issue of The Lancet published my article: a curious map and a short explanation of it. I decided to write about it on Habr in the hope that this way of visualizing data may be useful to someone else.
 
Kashnitsky, I., & Schöley, J. (2018). Regional population structures at a glance. The Lancet, 392(10143), 209-210. https://doi.org/10.1016/S0140-6736(18)31194-2
Actually, here is a high-resolution map (clickable).
 
...

Fighting errors and "crutches" in the USRLE, the unified state register of legal entities

 
Last week we published an article about how the USRLE is organized. It is the state register holding data on 10 million companies. That article covers the basics, so it is better to start with it.
 
 
Here we dig into a rich and fertile topic: the problems of the USRLE that keep our developers from getting bored.
 
"Single Customer". It puts data in order: cleans addresses, finds duplicates, corrects typos.
 
 
If you like parsing complex reference data, structuring it and making it human-readable, come work with us. We are currently looking for a Java developer for the "Factor" product. Salary - from ...

How the USRLE, the unified state register of legal entities, is organized

 
The USRLE is the state register of legal entities, which holds records on 10 million Russian companies. The register is maintained by the FTS (Federal Tax Service).
 
 
We take organization data from the USRLE for "Tips", "Single Customer" and "Factor". In this article we describe how we lived before the register, how we get access to it and how we work with it.
 
multistat.ru is a legal reseller that sold Federal Tax Service data. The problem was that Multistat supplied its database at a high price and without updates.
 
 
Therefore, we maintained the relevance ...