Test and debug MapReduce

At Rostelecom, we use Hadoop to store and process data loaded from multiple sources by Java applications. We have now moved to a new version of Hadoop with Kerberos authentication. During the migration we ran into a number of problems, including with the use of the YARN API. Using Hadoop with Kerberos authentication deserves a separate article; in this one we will talk about debugging Hadoop MapReduce.

When running jobs on a cluster, launching the debugger is complicated by the fact that we do not know which node will ...

Apache NiFi: what it is and a brief overview of the features

Today, on foreign sites dedicated to Big Data, one can find mentions of Apache NiFi, a tool that is relatively new to the Hadoop ecosystem. It is a modern open-source ETL tool. A distributed architecture for fast parallel loading and processing of data, a large number of plug-ins for sources and transformations, and versioning of configurations are only some of its advantages. For all its power, NiFi remains fairly simple to use.
 
 
At Rostelecom, we are working to expand our use of Hadoop, so we have already tried Apache NiFi and evaluated its advantages over other solutions. In this article ...

Theory and practice of using HBase

Good afternoon! My name is Danil Lipova. Our team at Sbertech has started using HBase as a data warehouse. While studying it, we accumulated experience that we wanted to systematize and describe (we hope it will be useful to many). All the experiments below were carried out with versions of HBase ???-cdh??? and ???-cdh???-beta1.
 
 
 
General architecture
 
Writing data to HBase

Reading data from HBase

Data caching

Batch processing of data with MultiGet/MultiPut

Strategy for splitting tables into regions

Fault tolerance, compaction, and data locality
 
Settings ...

Comparative analysis of HDFS 3 with HDFS 2

Our company, SberTech (Sberbank Technologies), currently uses HDFS ??? because it has a number of advantages: the Hadoop ecosystem, fast processing of large volumes of data, suitability for analytics, and more. But in December 201? the Apache Software Foundation released a new version of its open-source framework for developing and running distributed programs, Hadoop 3.0.?, which includes a number of significant improvements over the previous major release line (hadoop-2.x). One of the most important and interesting updates is support for erasure coding. Therefore...

Distributed data warehouse in the Data Lake concept: installing CDH

We continue to share our experience in organizing the data warehouse, which we began to describe in the previous post. This time we want to talk about how we handled installing CDH.
 
 
(www.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html). In this case, you need to download the Spark 2 CSD file (available on the Version and Packaging Information page: cloudera.com/documentation/spark2/latest/topics/spark2_packaging.html), install it on the host with Cloudera ...

A review of interesting Big Data implementation cases in financial sector companies

Case studies of the practical application of Big Data in financial sector companies


 
Why this article?
 
 
This review examines cases of implementing and applying Big Data in real life, using "live" projects as examples. For some cases that are especially interesting in every sense, I venture to add my own comments.
 
 
The range of case studies examined is limited to examples published on the Cloudera website.
 
 

What is "Big Data"


 
There are technical jokes in the technical ...