From a heavily loaded MPP DBMS to a lively Data Lake with analytical tools: sharing the details of how we built it

Every organization that deals with data sooner or later faces the problem of storing both relational and unstructured data. It is not easy to find an approach that is convenient, efficient, and inexpensive at the same time, and that also lets data scientists work with the data using machine learning models. We managed it, and although we had to tinker, the final payoff exceeded our expectations. We describe all the details below.
 
 
... Parquet. For analytical tasks, so-called wide tables with many columns ...
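Parquet is a columnar format, so a query over a wide table only has to read the columns it references. The sketch below uses a deliberately simplified cost model to show why that matters: values have a fixed size, there is no compression or encoding, and the table dimensions are hypothetical. It is not how Parquet actually lays out row groups and pages.

```python
# Simplified cost model (an assumption, not Parquet's real on-disk layout):
# every value takes the same number of bytes and nothing is compressed.


def bytes_scanned_row_layout(n_rows: int, n_cols: int, value_size: int) -> int:
    """A row store reads whole rows, even if the query needs only a few columns."""
    return n_rows * n_cols * value_size


def bytes_scanned_columnar(n_rows: int, cols_needed: int, value_size: int) -> int:
    """A column store reads only the columns the query touches."""
    return n_rows * cols_needed * value_size


if __name__ == "__main__":
    rows, cols, size = 10_000_000, 500, 8  # hypothetical wide table
    needed = 5  # columns referenced by a typical aggregate query
    row = bytes_scanned_row_layout(rows, cols, size)
    col = bytes_scanned_columnar(rows, needed, size)
    print(f"row layout: {row / 1e9:.1f} GB, columnar: {col / 1e9:.1f} GB")
```

With 5 of 500 columns needed, the columnar scan touches 1% of the bytes; real compression and encodings change the absolute numbers but not the direction.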

Testing the performance of several types of drives in a virtual environment

Virtualization technologies are in demand today not only among big businesses but also among SMBs and home users. In particular, small companies can use server virtualization to run a number of services that are not very resource-intensive. This usually means standalone servers built on single- or dual-processor platforms with a relatively modest 32-64 GB of RAM and no dedicated high-performance storage. For all the benefits, though, keep in mind that virtual systems perform differently from physical ones. In this article, we compare ...

"Mass product": the first commercial DNA storage will be presented in 2019

The startup Catalog plans to launch the service. The company is developing a special machine that will be able to write terabytes of data per day into 500 trillion DNA molecules.
 
 
Below, we look at Catalog's approach and other recent developments in DNA storage.

Photo: University of Michigan / CC
 
 

Project details


 
The classic approach to writing data to DNA involves converting a sequence of bits, the zeros and ...
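The teaser breaks off before describing the encoding, so here is a minimal illustration assuming the simplest textbook mapping of two bits per nucleotide (00→A, 01→C, 10→G, 11→T). This is not Catalog's method; practical schemes add error correction and avoid problematic sequences such as long runs of the same base.

```python
# Hypothetical 2-bits-per-nucleotide mapping, for illustration only.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a nucleotide string: 8 bits -> 4 bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Reverse the mapping: every 4 bases become one byte again."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)  # "H" = 01001000 -> CAGA, "i" = 01101001 -> CGGC
    assert decode(strand) == b"Hi"
```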

Testing Adaptec caching technology for RAID arrays

Solutions for RAID arrays built from hard drives have been in use for a long time. On the whole, they remain popular in many areas where a relatively inexpensive, fault-tolerant, high-capacity array is needed. Given the size and speed of modern hard drives, among other reasons, RAID6 arrays (or RAID60 when there are many disks) are of the greatest practical interest. But this type of array performs poorly on random writes, and there is little you can do about it.
 
 
Of course, here we are talking about the speed of the "raw volume". In real life, this is compounded by ...
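The poor random-write performance of RAID6 follows from the standard write-penalty model: a small random write is a read-modify-write of the data block plus both parity blocks (P and Q), i.e. 3 reads and 3 writes per host write. A back-of-the-envelope sketch, where the disk count and per-disk IOPS are hypothetical and controller caching (the kind of technology this article tests) and full-stripe writes are ignored:

```python
# Disk I/Os per small random host write, classic textbook values.
# RAID60 stripes several RAID6 groups, so its per-write penalty is also 6.
RAID_WRITE_PENALTY = {"RAID0": 1, "RAID10": 2, "RAID5": 4, "RAID6": 6}


def random_write_iops(disks: int, iops_per_disk: float, level: str) -> float:
    """Approximate host-visible random-write IOPS for an uncached array."""
    return disks * iops_per_disk / RAID_WRITE_PENALTY[level]


if __name__ == "__main__":
    disks, per_disk = 12, 150  # hypothetical array of 12 HDDs
    for level in ("RAID10", "RAID6"):
        print(level, random_write_iops(disks, per_disk, level))
```

With the same 12 disks, the model gives RAID6 a third of the random-write IOPS of RAID10, which is why write caching is attractive for such arrays.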

Why the E in EDW (enterprise data warehouse) is about business processes

A data warehouse without the E


 
Today, for any medium or large business, having a data warehouse is a de facto corporate standard. Whatever industry a company operates in, it cannot stay competitive without analyzing the data it has on customers, suppliers, and finances. As automation and optimization reach every stage of producing a product or service, organizations use more and more IT systems that generate data: production, accounting, planning, HR management systems, and others.
 
 
How to...

Ceph as pluggable storage: 5 practical takeaways from a large project

Given how fast data volumes are growing, software-defined and distributed storage systems are discussed more and more, and the open-source Ceph platform traditionally attracts a lot of attention. Today we want to share the conclusions we reached while implementing a data storage project for a major Russian agency.
 
... (https://github.com/val5244/pg_rbytea). The idea was to move the data from that database to Ceph storage concurrently. The module we developed allows a one-time migration of data without stopping the database, using ...

How we stopped spending a week provisioning a dev stand

Every developer wants their own dev stand. Every tester wants their own test stand. And every preproduction specialist wants their own stand to check everything and rehearse the final release. When all these wish lists converge on processing, one of the bank's largest and busiest systems, infrastructure costs make you scratch your head and look for alternatives. In this post, we describe what we found.
 
 
 
 
The processing databases total about 6 TB. When developers share a single copy of the databases, they get in each other's way, so the actual amount of space taken up by databases ...
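Because a shared copy does not work, space grows with the number of stands. A quick arithmetic sketch: only the ~6 TB figure comes from the text; the stand counts are hypothetical, and the sketch assumes every stand gets its own full physical copy.

```python
DB_SIZE_TB = 6.0  # approximate size of the processing databases (from the text)


def full_copy_footprint_tb(stands: int, db_size_tb: float = DB_SIZE_TB) -> float:
    """Disk space needed if every stand holds a full physical copy of the databases."""
    if stands < 0:
        raise ValueError("stands must be non-negative")
    return stands * db_size_tb


if __name__ == "__main__":
    # Hypothetical headcounts for developers, testers, and preproduction.
    stands = {"dev": 10, "test": 5, "preprod": 2}
    total = sum(stands.values())
    print(f"{total} stands x {DB_SIZE_TB} TB = {full_copy_footprint_tb(total)} TB")
```

Even this modest team size already needs over 100 TB for full copies alone, before backups or refreshes.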

Markdown knowledge base (or blog, or project documentation)

I noticed that I constantly jot down all sorts of trivia, useful information, or just something from the clipboard right in a text editor. There is always an open Sublime Text with a bunch of tabs somewhere in the background.
 
I also noticed that it is more convenient for me to structure information in a single file using Markdown syntax: I prefer having the source text rather than the rendered result shown on, say, GitHub.
 
Over time, I realized that there were a lot of these saved files, and the number of open tabs was not going to shrink. But one careless move, and all the stored ...

Why VMware decided to build a platform for developing enterprise blockchains

At the end of June, VMware published the programs of the VMworld 2018 US and VMworld 2018 EU conferences, which will take place in August and November, respectively. One of the key topics of the conferences is building enterprise blockchains on the VMware platform.
 
 
Below, we explain why the company started working with distributed ledgers.
 
 

 
Photo: Pxhere / PD
 
 

How we saved card processing using Exadata

Over the past 10 years, VTB has seen strong growth in computing load: it grew 1.5x every year, and the volume of accounting data doubled every year. The support teams tried hard, but it was not easy to keep up with this pace: query plans degraded, disk space ran out, and application code updates consumed all the resources. In this post, we describe how we solved the problem without spending a fortune on yet another IBM System p.
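These annual rates compound quickly. A quick sketch, assuming the stated rates held constant for all 10 years (the text gives them as yearly averages, so treat the result as an order-of-magnitude estimate):

```python
def compound_growth(annual_factor: float, years: int) -> float:
    """Total growth multiplier after `years` of constant annual growth."""
    return annual_factor ** years


if __name__ == "__main__":
    years = 10
    load = compound_growth(1.5, years)  # computing load: x1.5 per year
    data = compound_growth(2.0, years)  # accounting data: x2 per year
    print(f"load: x{load:.0f}, data: x{data:.0f} over {years} years")
```

Under this assumption, the load grows roughly 58-fold and the data about 1000-fold over the decade, which explains why simply buying a bigger server each time stops being practical.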
 
 
 
In 201?, card processing at what was then VTB24 ran on one of the most powerful servers of the time, the IBM System p. This was supplemented ...