How data exchange affects the quality of recommendations

Hello, Habr!
 
 
We pay special attention to integration testing when connecting a new client to the platform, and we constantly monitor the state of the integration while it is running. Why is this critical? Because data collection is the foundation of high-quality recommendations.
 
 
 
A recommender system rests on several important components: data collection, storage, processing, serving recommendations, and growth hacking. Add to that the hardware that provides computing power for the algorithms, and the layout process. That gives at least 7 points on which the quality of recommendations ...

Splunk Scripted Input, or how to use scripts to collect data about system operation and analyze it in Splunk

Earlier we wrote about how to upload logs to Splunk from a directory or via syslog, and how to collect standard Windows and Linux events. But what if we need more granular information about how our systems operate?
 
In this case, scripts come to the rescue!
 
 
 
Under the cut you can find out when, which, and how scripts can be used in Splunk to get data.
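As a rough illustration of the idea: a scripted input is simply a program whose standard output Splunk indexes. The sketch below is a hypothetical Python script, not taken from the article. The function names `format_event` and `collect_metrics` and the chosen metrics are assumptions made for the example. It emits one timestamped line of key=value pairs per run.

```python
import os
import sys
import time


def format_event(metrics, now=None):
    """Render a dict of metrics as one timestamped key=value line."""
    ts = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    fields = " ".join(f"{key}={value}" for key, value in sorted(metrics.items()))
    return f"{ts} {fields}"


def collect_metrics():
    """Collect a few illustrative values (a hypothetical choice of metrics)."""
    return {"pid": os.getpid(), "cwd_entries": len(os.listdir("."))}


if __name__ == "__main__":
    # Splunk reads whatever the script writes to standard output.
    sys.stdout.write(format_event(collect_metrics()) + "\n")
```

Writing key=value pairs is a design choice for this sketch: it keeps each event on one line and makes the fields easy to pick out at search time.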
 
here and here ). Therefore, we will now discuss this briefly.
 
 
...

Machine learning by hand, part 1. Linear filter for OLS

Hello, my friends, my name is Arthur.
 
 
In this series of articles, we will write machine learning algorithms by hand, with minimal use of frameworks apart from numpy.
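To give a flavour of what "by hand with numpy" can look like, here is a minimal sketch of ordinary least squares (OLS), the method named in the title. It is not the author's code: the function names `fit_ols` and `predict` are assumptions, and it runs on synthetic data rather than on the wine data set.

```python
import numpy as np


def fit_ols(X, y):
    """Return OLS weights (intercept first) minimizing ||Xb @ w - y||^2."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend a bias column
    # lstsq solves the least-squares problem without an explicit matrix inverse
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w


def predict(X, w):
    """Apply fitted weights to new feature rows."""
    return np.column_stack([np.ones(len(X)), X]) @ w


# Synthetic data (an assumption for the demo): y = 1 + 2*x1 - 3*x2 + small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + 0.01 * rng.normal(size=200)
w = fit_ols(X, y)
```

With features on a comparable scale, the magnitude of each fitted weight gives a first, rough hint of how strongly that feature affects the target.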
 
 

The data


 
 
Let's take a typical sample, the Wine Quality Data Set: we will try to predict the quality of white wine from its characteristics and see which feature has the greatest effect. Here are the feature descriptions:
 
 
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - ...

Data Fest 2018: announcement and registration

 
 
Friends, we invite you to the fifth Moscow Data Fest, which will be held on April 28 at the FLACON design factory. Data Fest is the largest free conference for researchers, engineers, and developers involved in data analysis and processing, machine learning, and what the press likes to call AI.
 
 
You will learn about AI in Mail.Ru Group products and "smart" replies in Mail.Ru Mail, how recommendations and computer vision work in VKontakte and Odnoklassniki, about machine translation at Alibaba, what Quantum Machine Learning is, and much more!
 
...

About streams and tables in Kafka and Stream Processing, part 1

* Michael G. Noll is an active contributor to open source projects, including Apache Kafka and Apache Storm.
 
 
The article will be useful first of all to those who are just getting acquainted with Apache Kafka and/or stream processing.

 
 
In this article, perhaps the first of a mini-series, I want to explain the concepts of streams and tables in stream processing and, in particular, in Apache Kafka. I hope it will give you a better theoretical understanding and some ideas that will help you solve your current and future tasks better and/or faster.
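The excerpt does not include the article's own definitions, so the following is only one common way to picture the two concepts, sketched in plain Python without any Kafka client. A stream is modelled as an ordered sequence of (key, value) events, and a table as the result of folding that sequence per key. The function names `table_from_stream` and `count_by_key` are hypothetical.

```python
def table_from_stream(events):
    """Fold a stream of (key, value) events into a table of latest values."""
    table = {}
    for key, value in events:
        table[key] = value  # a later event for the same key overwrites the earlier one
    return table


def count_by_key(events):
    """Aggregate a stream into a table of per-key event counts."""
    counts = {}
    for key, _ in events:
        counts[key] = counts.get(key, 0) + 1
    return counts


# A stream records every event; a table keeps one row per key.
stream = [("alice", "Paris"), ("bob", "Sydney"), ("alice", "Rome")]
latest = table_from_stream(stream)
counts = count_by_key(stream)
```

Here `latest` holds one row per user with the most recent city, while `counts` holds how many events each user produced; both are tables derived from the same stream.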
 
 
Contents:
 
 
* Motivation
 
* Streams ...

How to upload a non-standard log into Splunk + Fortinet logs

Do we generate a lot of data with information systems every day? A huge amount! But do we know all the possibilities for working with this data? Definitely not! In this article, we will describe what types of data can be loaded into Splunk for further operational analysis, and show how to set up the loading of Fortinet logs and of non-standard log files that have to be split into fields manually.
 
 
 

 
Splunk can index data from various sources, whether the logs are stored locally on the same machine as the Splunk indexer or on a remote device. To collect data from remote machines, they are assigned ...

Five myths about Data Science

My name is Ivan Serov, and I work in the Data Science department of the financial company ID Finance. Data scientist is quite a young but very popular profession, and it has become overgrown with myths. In this post, I will tell you about several misconceptions that novice data scientists (DS) face.
 
 
 

A DS does not need to know the business


 
A good DS should not only be able to build a good model, but also understand why the model is being built, and even say that the model is not needed when that is the case. For example, for one of our projects we built a model that would predict the presence of money ...

How to solve 90% of NLP tasks: a step-by-step guide to natural language processing

No matter who you are, an established company or someone about to launch a first service, you can always use text data to test your product, improve it, and expand its functionality.
 
 
Natural language processing (NLP) is an actively developing scientific discipline concerned with extracting meaning from text data and learning from it.
 
 

How this article can help you


 
Over the past year, the Insight team took part in several hundred projects, combining the knowledge and experience of leading companies in the United States. The results of this work they ...

A review of interesting Big Data implementation cases in financial sector companies

Case studies of the practical application of Big Data in financial sector companies


 
Why this article?
 
 
This review examines cases of implementing and applying Big Data in real life, using "live" projects as examples. For some cases that are especially interesting in every sense, I take the liberty of adding my own comments.
 
 
The range of case studies examined is limited to examples publicly presented on the Cloudera website.
 
 

What is "Big Data"


 
There are technical jokes in the technical ...

Apache Kafka: review

Hello, Habr!
 
 
Today we offer you a comparatively brief, yet sensible and informative article about the design and applications of Apache Kafka. We expect to translate and release the book by Neha Narkhede et al. by the end of the summer.
 
 
 
Enjoy reading!
 
Streams API
 
 
This API is designed to be used within your own code base; it does not run on the broker. Functionally it is similar to the consumer API: it makes it easier to scale stream processing horizontally and to distribute it among several applications (similar to consumer groups).
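The scaling idea can be pictured with a small, purely illustrative Python sketch. It is not Kafka's actual assignment algorithm: the function `assign_partitions`, the instance names, and the round-robin scheme are all assumptions. The point is only that each partition is handled by exactly one application instance, so adding instances spreads the work.

```python
def assign_partitions(partitions, instances):
    """Spread partitions over instances round-robin (a simplified, hypothetical scheme)."""
    assignment = {instance: [] for instance in instances}
    for i, partition in enumerate(partitions):
        assignment[instances[i % len(instances)]].append(partition)
    return assignment


# Six partitions shared by two, then three, application instances.
two = assign_partitions(list(range(6)), ["app-1", "app-2"])
three = assign_partitions(list(range(6)), ["app-1", "app-2", "app-3"])
```

Going from two to three instances reduces each instance's share from three partitions to two, which is the sense in which processing scales horizontally.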
 
 
Processing ...