Comparative analysis of HDFS 3 and HDFS 2
Our company, SberTech (Sberbank Technologies), currently uses HDFS ??? because of its advantages: the Hadoop ecosystem, fast processing of large volumes of data, suitability for analytics, and more. In December 201?, however, the Apache Software Foundation released Hadoop 3.0.?, a new version of its open-source framework for developing and running distributed programs. It includes a number of significant improvements over the previous major release line (hadoop-2.x). One of the most important and interesting additions is support for erasure coding (redundancy codes). The task, therefore, was to compare the two versions.
For this research, SberTech allocated 10 virtual machines of 40 GB each. Because the RS (1?4) encoding policy requires at least 14 machines, it cannot be tested on this cluster.
One of the machines will host the NameNode alongside a DataNode. Testing will be performed with the following encoding policies:
Replication with a replication factor of 3 will be tested as well.
A data block size of 32 MB was chosen.
Reed-Solomon codes. Part 1
Reed-Solomon codes. Part 2 - arithmetic of Galois fields
Apache Hadoop ???