From a heavily loaded MPP DBMS to a lively Data Lake with analytical tools: sharing the details of how we built it

All organizations that deal with data sooner or later face the question of how to store relational and unstructured data. It is not easy to find an approach that is convenient, effective, and inexpensive all at once, and that also lets data scientists work with the data using machine learning models. We managed it, and although we had to tinker, the final payoff was even greater than expected. We describe all the details below.
 
 
Parquet. For analytical problems, so-called wide tables with many columns ...

Testing the performance of several drive types in a virtual environment

Virtualization technologies are in demand today not only in big business but also among SMBs and home users. For small companies in particular, virtualization can be used to run a number of not very resource-intensive services. This usually means stand-alone servers based on single- or dual-processor platforms, with a relatively modest 32-64 GB of RAM and no dedicated high-performance storage. But for all the benefits, keep in mind that virtual systems differ from physical ones in performance. In this article, we compare ...

Testing Adaptec caching technology for RAID arrays

RAID solutions built on hard drives have been in use for a long time. They remain popular in many areas where a relatively inexpensive, fault-tolerant, large-capacity array is required. Given the capacity of modern hard drives, their speed, and other factors, the arrays of greatest practical interest are RAID6 (or RAID6? if there are many disks). But this type of array has low random-write performance, and that is not easy to remedy.
 
 
Of course, in this case we are talking about the speed of the "raw volume". In real life, it is added to ...

Ceph as pluggable storage: 5 practical conclusions from a large project

With data volumes growing as they are, software-defined and distributed data stores are discussed more and more often, and the open platform Ceph traditionally gets much of the attention. Today we want to share the conclusions we reached while implementing a data storage project for a major Russian agency.
 
(https://github.com/val5244/pg_rbytea). The essence of the solution was to transfer the data from the specified database to the Ceph storage at the same time. The developed module allows a one-time migration of data without stopping the database, using ...

"Eternal leak": how regulators fight personal data leaks

Leaks of the personal data (PD) of social network and web service users are increasingly discussed in the media. Probably everyone has heard the story of the analytics company Cambridge Analytica, which was able to obtain the personal data of 87 million Facebook users (including the data of Mark Zuckerberg himself).
 
 
However, there are lesser-known PD leaks whose scale is no smaller. Let's look at a few examples and talk about the measures that regulators and IT companies are taking to prevent such cases.
 
 
...

Distributed data warehouse in the Data Lake concept: installing CDH

We continue to share our experience of organizing a data warehouse, which we began describing in a previous post. This time we want to talk about how we handled the installation of CDH.
 
 
(www.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html). In this case, you need to download the Spark 2 CSD file (available on the Version and Packaging Information page: cloudera.com/documentation/spark2/latest/topics/spark2_packaging.html), install it on the host with Cloudera ...

Magic on the verge of the extreme: the fearless seven of data centers

 
Most recently, a list of contenders for the title of the world's "most beautiful data center" was published. Now it is our turn to combine beauty with the extreme, so ComTec has put together a list of the 7 most extreme server farms. Whether it's a deep cave at the South Pole or deep waters off the Pacific coast of the United States, the data is safely stored in extreme places around the world.
 
Ice Cube Lab: more than 1200 computing cores and three petabytes of memory, with servers cooled at an outside temperature below -40. That is extreme, to say the least. You cannot simply start up such a chilling ...

Pure Storage ActiveCluster in conjunction with VMware: review and testing

 
Not long ago, Pure Storage announced the new ActiveCluster functionality: an active/active metro cluster between storage arrays. This is a synchronous replication technology in which a logical volume is stretched between two arrays and is available for reading and writing on both. The functionality is available with the new firmware version Purity//FA 5 and is completely free. Pure Storage also promised that configuring a stretched cluster has never been so simple and clear.
 
 
In this article we will tell you about ActiveCluster: what it consists of, how it works and ...

Overview and testing of Infortrend EonStor DS2024 2nd generation

In the article "Infortrend storage is an alternative to A-brands. Review and Testing" we described the functionality and performance of one of the most popular Infortrend SAN systems at the time, the DS 3012T. Among the advantages of Infortrend storage identified in that article were support for standard Enterprise disks (not vendor-branded ones) and the ability to run SATA disks in dual-controller systems. These facts make these storage systems unique, because with Enterprise SATA SSDs you can build a low-cost storage system with high performance and fault tolerance.
 
 
In this article, we'll look ...

MegaFon ordered the "Kupol" storage system to store traffic under the Yarovaya law

 
One of the leading manufacturers of SORM equipment, the company "National Technologies", has posted on the public procurement portal the documentation for the "Kupol" ("Dome") data storage system, which was developed with its participation.
 
 
As follows from the documentation, by May 3? 2018 the company undertakes to supply the equipment to eight MegaFon sites and three Skartel sites (Skartel is a MegaFon subsidiary that operates under the Yota brand). The MegaFon sites are located in Moscow, St. Petersburg, Nizhny Novgorod, Samara, Yekaterinburg, Saransk, Pskov and Smolensk, and the Yota sites in Moscow, St...