Educational program for working with punch cards (or the story of how “big data” was processed from 1890 to 1970)

From 1890 to 1970, all big data processing was carried out with punch cards. Punch cards, in turn, were processed using so-called "unit record equipment", whose central element was the electromechanical punch card sorter. Punch cards and related equipment were used for a variety of tasks: population censuses, accounting, inventory, payroll, etc.

How did people work with punch cards? What algorithm did the electromechanical punch card sorter follow? How did they sort by numeric fields? And by string fields? All this ...
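The teaser does not give the post's own answer, but electromechanical sorters are generally known to have handled numeric fields with what we would now call an LSD radix sort: the operator ran the deck through the sorter once per column, starting with the rightmost (least significant) digit, and restacked the digit pockets in order 0 to 9 after each pass. Below is a minimal Python simulation of that procedure, assuming each card is a string and the field is given by a start offset and width (the function names are hypothetical). Alphabetic fields are not modeled; on real machines a letter combined a zone punch and a digit punch, so, as far as we know, each column typically took two passes.

```python
def card_sorter_pass(deck, column):
    """One simulated sorter pass: drop each card into the pocket for its digit, then restack 0..9."""
    pockets = [[] for _ in range(10)]
    for card in deck:
        # Appending keeps the existing order inside each pocket, so the pass is stable.
        pockets[int(card[column])].append(card)
    return [card for pocket in pockets for card in pocket]


def sort_by_numeric_field(deck, start, width):
    """Sort cards by the digits in card[start:start + width], least significant column first."""
    for column in range(start + width - 1, start - 1, -1):
        deck = card_sorter_pass(deck, column)
    return deck


deck = ["0427SMITH", "0031JONES", "0415BROWN", "0031ADAMS"]
print(sort_by_numeric_field(deck, 0, 4))  # → ['0031JONES', '0031ADAMS', '0415BROWN', '0427SMITH']
```

Because each pass is stable, cards that tie on a column keep the order produced by the earlier passes, which is why starting from the least significant column yields a fully sorted deck (note that JONES stays ahead of ADAMS: the sort is by the numeric field only).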

SAP Data Management Suite as a complex for working with Big Data in companies

This year, at the SAPPHIRE NOW conference, we presented a new set of solutions for working with big data: SAP HANA Data Management Suite. Many of our earlier materials featured the name SAP Business Suite; today we will try to explain briefly what Data Management Suite is and why it appeared in the SAP solution portfolio at all.

Whereas SAP previously focused on implementing standard business processes when developing enterprise resource management solutions (this is how Business Suite was created), the focus has now shifted to data management. This requires ...

Cassandra Sink for Spark Structured Streaming

A couple of months ago, I started exploring Spark, and at some point I ran into the problem of saving Structured Streaming results to a Cassandra database.

In this post, I give a simple example of creating and using a Cassandra Sink for Spark Structured Streaming. I hope the post will be useful to those who have recently started working with Spark Structured Streaming and are wondering how to write computation results to a database.

The idea of the application is very simple: receive and parse messages from the Kafka...

Parsing Wikipedia for NLP tasks in 4 commands

The essence

It turns out that for this purpose it is enough to run the following set of commands:
git clone https://github.com/attardi/wikiextractor.git
cd wikiextractor
wget -P /data http://dumps.wikimedia.org/ruwiki/latest/ruwiki-latest-pages-articles.xml.bz2
python3 WikiExtractor.py -o /data/wiki/ --no-templates --processes 8 /data/ruwiki-latest-pages-articles.xml.bz2
and then polish a little ...
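WikiExtractor's default (non-JSON) output is a set of files such as /data/wiki/AA/wiki_00, in which every article is wrapped in a <doc id="..." url="..." title="..."> ... </doc> block. The teaser does not show what "polishing" the author did, so the following is only a sketch of one plausible first step, assuming that format: split the files into articles, drop the title line that WikiExtractor repeats at the top of each body, and remove blank lines. The helper names iter_articles and iter_corpus are hypothetical.

```python
import re
from pathlib import Path

# Assumption: WikiExtractor's default (non-JSON) output format, where each article is
# <doc id="..." url="..." title="...">\nTitle\n\nText...\n</doc>
DOC_RE = re.compile(r"<doc\b([^>]*)>(.*?)</doc>", re.S)
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def iter_articles(text):
    """Yield (title, body) pairs from the contents of one extracted file."""
    for attrs, body in DOC_RE.findall(text):
        meta = dict(ATTR_RE.findall(attrs))
        title = meta.get("title", "")
        # Strip every line and drop the blank ones.
        lines = [line.strip() for line in body.splitlines()]
        lines = [line for line in lines if line]
        # The body starts with a repeat of the title; remove it.
        if lines and lines[0] == title:
            lines = lines[1:]
        yield title, "\n".join(lines)


def iter_corpus(root):
    """Walk an output directory such as /data/wiki (AA/wiki_00, AA/wiki_01, ...)."""
    for path in sorted(Path(root).rglob("wiki_*")):
        yield from iter_articles(path.read_text(encoding="utf-8"))
```

For example, looping over iter_corpus("/data/wiki") processes the whole dump file by file, so tokenization or other NLP preprocessing can be chained on top without first building one giant text file.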

We invite you to Sberbank Data Science Journey 2018 - the race of machine learning algorithms

This fall, we are once again organizing Sberbank Data Science Journey, a large machine learning competition. Every year we pick a new topic, and this time we invite you to try your hand at AutoML. More specifically, at developing a highly capable meta-algorithm that can build machine learning models on its own: data preprocessing, feature engineering, model training, hyperparameter selection, and prediction of the target variable.
 
 
 
This year, as many as 13 teams will receive awards for solving the problem. The remaining details are in the post.
 
...

Pancakes with ICO in Python, or how to measure people and ICO projects

Friends, good afternoon.
 
It is clear that most ICO projects are essentially intangible assets. An ICO project is not a Mercedes-Benz car, which drives the same regardless of who likes or dislikes it. What influences an ICO most is people's sentiment, both toward the ICO's founder and toward the project itself.
 
It would be good to measure people's attitude toward the ICO founder and/or the ICO project somehow. So that is what we did; the report is below.
 
The result is a tool for collecting positive and negative sentiment from the Internet, in particular ...
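The teaser does not say how the tool scores sentiment. Purely as an illustration, here is the simplest possible approach, a lexicon-based scorer: each collected text gets +1, -1 or 0 depending on whether it contains more positive or more negative words, and the labels are then counted. The word lists are hypothetical placeholders; a real tool would need a proper sentiment dictionary or a trained classifier, and for Russian-language sources also lemmatization.

```python
import re
from collections import Counter

# Hypothetical toy lexicons, for illustration only.
POSITIVE = {"great", "reliable", "promising", "honest", "transparent"}
NEGATIVE = {"scam", "fraud", "risky", "fake", "dump"}


def score_text(text):
    """Return 1, -1 or 0 for one text by comparing positive and negative lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    balance = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (balance > 0) - (balance < 0)


def summarize(texts):
    """Count positive, negative and neutral mentions across the collected texts."""
    labels = {1: "positive", -1: "negative", 0: "neutral"}
    counts = Counter(labels[score_text(t)] for t in texts)
    return {name: counts.get(name, 0) for name in ("positive", "negative", "neutral")}


mentions = [
    "The founder seems honest and the roadmap is promising",
    "Looks like a scam to me",
    "Token sale starts on Monday",
]
print(summarize(mentions))  # → {'positive': 1, 'negative': 1, 'neutral': 1}
```

Counting per-text labels rather than raw word hits keeps one very long, very emotional post from outweighing many short mentions.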

If you want to create something really cool, you need to dig deeper and know how your code works in the system, on hardware

Greetings, Habr! I wonder how many programmers and developers have discovered data science or data engineering and are building a successful career in big data. Ilya Markin, a software engineer at Directual, is one of the developers who switched to data engineering. We talked about his experience as a team lead and his favorite data engineering tool; Ilya also told us about conferences and interesting specialized channels for Java developers, about Directual from both the user and the technical side, about computer games, and more.
 
 
 
 
-...

How much does the model's training data (not) look like the test sample?

Let's consider one scenario in which your machine learning model can turn out to be useless.
 
 
There is a saying: "Don't compare apples with oranges." But what if you want to compare one set of apples and oranges with another, and the distribution of fruit in the two sets is different? Can you still work with the data? And how would you do it?
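One simple way to check this for numeric features, shown here as a sketch rather than as the post's method (which the teaser does not reveal), is the two-sample Kolmogorov-Smirnov statistic: the largest vertical gap between the empirical distribution functions of a feature in the training and test sets. Values near 0 mean the feature is distributed similarly in both sets; values near 1 mean the sets barely overlap. A related, widely used technique, adversarial validation, instead trains a classifier to tell training rows from test rows. The dict-of-lists data layout and the function names below are assumptions.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    # Both empirical CDFs are step functions, so the largest gap occurs at an observed value.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def rank_features_by_shift(train, test):
    """train and test map feature name -> list of numeric values; most shifted features first."""
    scores = {name: ks_statistic(train[name], test[name]) for name in train if name in test}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


train = {"age": [20, 30, 40], "income": [1, 2, 3]}
test = {"age": [21, 29, 41], "income": [10, 20, 30]}
print(rank_features_by_shift(train, test))  # 'income' ranks first (1.0), 'age' second (about 0.33)
```

A feature at the top of the ranking is a candidate for closer inspection: the model may see values at prediction time that it hardly ever saw during training.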
 
 
... a dataset from a Kaggle competition.
 
 

Step 1: Preparing the data


 
First of all, we perform a number of standard steps: cleaning, filling in missing values, and label encoding of categorical features. For the given dataset, the step was not ...
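The teaser lists the standard steps without showing code. Below is a dependency-free sketch of two of them, assuming each column is a Python list with None marking a blank: categorical blanks get a "missing" placeholder and numeric blanks get the training-set median (both are our own, hypothetical choices). Since the post is about comparing training and test data, the label encoding is fitted on the union of both sets, so a category receives the same integer code in each.

```python
from statistics import median


def prepare_categorical(train_col, test_col, fill_value="missing"):
    """Fill blanks, then label-encode one categorical column for both sets."""
    train_col = [fill_value if v is None else v for v in train_col]
    test_col = [fill_value if v is None else v for v in test_col]
    # Fit the mapping on the union so each category gets the same code in train and test.
    categories = sorted(set(train_col) | set(test_col))
    mapping = {category: code for code, category in enumerate(categories)}
    return [mapping[v] for v in train_col], [mapping[v] for v in test_col], mapping


def prepare_numeric(train_col, test_col):
    """Fill blanks in one numeric column with the training-set median.

    Raises statistics.StatisticsError if the training column has no values at all.
    """
    fill = median(v for v in train_col if v is not None)
    return (
        [fill if v is None else v for v in train_col],
        [fill if v is None else v for v in test_col],
    )


train_codes, test_codes, mapping = prepare_categorical(["red", None, "blue"], ["green", "red"])
print(train_codes, test_codes)  # → [3, 2, 0] [1, 3]
```

Taking the fill value from the training set only avoids leaking test statistics into preprocessing; whether that matters depends on how the test data will be used.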

Why do you need Splunk? Monitoring the operation of IT infrastructure

 
 
How did a server going down affect the health of the infrastructure as a whole?
 
Can you predict infrastructure degradation?
 
What impact does a critical service have on the system?
 
 
In this article we will talk about how Splunk can help find answers to these questions.
 
Splunk is a system specializing in real-time log collection and processing, complex searches, rapid data analysis, and the creation of dynamic dashboards and alerts.
 
 
In previous articles, we already wrote about how Splunk can be used for ...

A couple of thoughts about the features of the Russian Data Science

 
Today at Moscow Data Science Major I talked about privacy, ethical Data Science, and many interesting technical innovations. People listened attentively, asked questions, and thanked me. But what happened next was very revealing. More about this under the cut.
 
 
 
 
And then there was a talk about new Russian developments in NLP, with this very slide.
 
 
 
 
The only change I made before publishing it here is the gray boxes covering the full name and address of a living person. A person whose personal data and confidential medical information were calmly and casually revealed to a thousand people not bound by any nondisclosure agreements.
 
 
And the most terrible thing is not even that this violated a whole series of federal laws (at least Federal Law No. 323, Article 13, and Federal Law No. 152). The worst thing, in my opinion, is that almost no one saw anything unexpected or wrong in it.
 
 
I really want to believe that I'm wrong, and the author changed the ...