Theory and practice of using HBase

Good afternoon! My name is Danil Lipova. Our team at Sbertech started using HBase as a data warehouse. While studying it, we accumulated experience that we wanted to systematize and describe (we hope many readers will find it useful). All the experiments below were carried out with HBase versions ???-cdh??? and ???-cdh???-beta1.
 
 
 
General architecture
 
Writing data to HBase
 
Reading data from HBase
 
Data caching
 
Batch data processing with MultiGet/MultiPut
 
Strategy for splitting tables into regions
 
Fault tolerance, compaction, and data locality
 
Settings...

Comparative analysis of HDFS 3 and HDFS 2

Our company, SberTech (Sberbank Technologies), currently uses HDFS ??? because it offers a number of advantages: the Hadoop ecosystem, fast processing of large volumes of data, good support for analytics, and much more. But in December 201? the Apache Software Foundation released a new version of its open framework for developing and running distributed programs, Hadoop 3.0.?, which includes a number of significant improvements over the previous major release line (hadoop-2.x). One of the most important and interesting updates is support for erasure codes (Erasure Coding). Therefore...
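To show why erasure coding is such a notable update, here is a small back-of-the-envelope sketch (our own illustration, not taken from the full article) comparing the extra raw storage needed by classic 3-way replication with a Reed-Solomon layout of 6 data and 3 parity units. The RS(6,3) scheme and the helper names `replication_overhead` and `erasure_coding_overhead` are assumptions chosen for the example; Hadoop 3 ships an RS-6-3 policy, but the policy actually used depends on the cluster.

```python
def replication_overhead(replicas: int) -> float:
    """Hypothetical helper: extra raw storage for n-way replication,
    as a fraction of the logical data size (n - 1 extra copies)."""
    return float(replicas - 1)


def erasure_coding_overhead(data_units: int, parity_units: int) -> float:
    """Hypothetical helper: extra raw storage for a Reed-Solomon
    (data, parity) layout, as a fraction of the logical data size."""
    return parity_units / data_units


# 3-way replication (the usual HDFS 2 default): survives the loss of 2 copies.
print(f"3x replication: {replication_overhead(3):.0%} extra storage, survives 2 lost blocks")

# RS(6,3): 6 data cells + 3 parity cells; any 3 of the 9 can be lost and rebuilt.
print(f"RS(6,3): {erasure_coding_overhead(6, 3):.0%} extra storage, survives 3 lost cells")
```

Under these assumptions, erasure coding cuts the storage overhead from 200% to 50% while tolerating one more failure, at the cost of extra CPU and network work when lost cells have to be reconstructed.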

Distributed data warehouse in the concept of Data Lake: installation CDH

We continue to share our experience in organizing a data warehouse, which we began to describe in our previous post. This time we want to talk about how we solved the task of installing CDH.
 
 
... (www.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html). In this case, you need to download the Spark 2 CSD file (available on the Version and Packaging Information page: cloudera.com/documentation/spark2/latest/topics/spark2_packaging.html) and install it on the host with Cloudera ...

A review of the cases of interesting implementation of Big Data in the companies of the financial sector

Case studies of the practical application of Big Data in financial sector companies


 
Why this article?
 
 
This review examines real-life cases of implementing and applying Big Data, using "live" projects as examples. For some of the cases that are especially interesting in every sense, I venture to add my own comments.
 
 
The case studies examined are limited to examples made public on the Cloudera website.
 
 

What is "Big Data"


 
There are technical jokes in the technical ...