Database as Code. Digging deeper

3r33737. Database as Code. Digging deeper 3r33714. 3r33737.  
3r33737. In IT projects, code is written by everyone. Engineers manage Kubernetes clusters with a few lines, disperse clouds with Terraform and roll tons of configurations on Ansible, Chef and Puppet. QA write understandable business test scenarios on Spock and Cucumber. Analysts are fluent, often better than developers, talking in SQL. Project documentation in Markdown, AsciiDoc or LaTEX formats is "compiled" into the required format on the build server. Well, the developers themselves, these code tamers, immediately speak a lot of languages ​​for every life situation - client, server, script, functional, etc., 3r-3714. 3r33737.  
3r33737. The code has long ceased to be mysterious gibberish and is now in one form or another accessible and understandable to many, 3r314. even to prime ministers 3–3–3668. . And all this code is involved in the standard life cycle - is under the control of VCS, subjected to code review, automated testing, CI, CD. Common tools and approaches, performance and quality metrics are used. And all together it has a proud name - "Everything as code". 3r33714. 3r33737.  
3r33737. However, the world of database traditionally stands apart from this extravaganza of progress and technology. The process of developing and maintaining a database does not change over the years and continues to instill horror and fear in developers, administrators and users around the world. But is it possible to present the database in the form of ordinary code? Get closer to the basic development process, use common tools and approaches? About this under the cut. 3r33714. has long appeared and is being successfully used. Polyglot Persistence , as a result, to meet these days on any project one single DBMS will be a great success. Instead, for each specific task, the most suitable base is selected, without trying to somehow get out and solve all their problems in the good old relational. And, for example, the microservice architecture allows you to isolate work from a specific DBMS within one service and squeeze the maximum benefit out of it, without harming the cleanliness of the architecture of the entire project. As a result, “
” (Where to go without it) + ”
Document-oriented DB
3r3692. "+" 3r3-3691. 3r3-371818. Search engine DB 3r3r. DB
SUCCESS! 3r3373719. 3r3692. ". In addition, the number of instances of these databases — sharding, replication, distributed DBMS — is increasing. Plus, all this needs to be “stretched” and maintained in different environments - development, test, food, etc.
3r33737. In order not to be unsubstantiated, I will cite a few excerpts from a fresh opinion poll from a giant company on management solutions and the development of the Quest Software database - DBA Problems: Trends in DB Administration 3r36668. (to download, you have to go through a simple registration, but it's worth it):
Question number 2: "3-33691. How many instances of relational databases are running in your company? 3-33-3692." - based on the results of 24% of respondents use from 100 to 500 instances, and 19% more than 500. 3r3708.  
Question number 4: "3r369691. How many different DBMS (SQL and NoSQL) is used in your company?
" - 56% use from 2 to 3 different DBMS, 40% more than ? and only 4% The respondents are the only one DBMS.
Question number 18: "3-33691. What happens to the number of database instances in your projects? 3-???." - 2/3 of respondents mark their annual increase.
Question number 30 (as a result): "3r3691. What are the main problems the DBA will face in the next 3 years? 3r?692." 3r?337.  
51% - "3r33718. Increasing the number of instances of the DBMS
40% - "3r33718. Ability to administer new non-relational databases
32% - “3-333718. Reduction of the IT budget” (which, as expected, swells due to the first two problems) 3r371919.
3r337. 3r33737.  
3r3122. Are we ready for this? 3r33635. 3r33737.  
3r33737. It is no secret that in order to work effectively with a particular DBMS, specialized tools are needed, such as IDE and DB managers, monitoring tools, modeling tools, migration schemes, and much more. However, most of these tools and environments were invented and developed back in those glorious times when a single relational DBMS was enough for any project with a head, and such buzz words as "3r-3718. Agile 3r-3719.", "3r-3718. DevOps 3r-3r-3719." and "3r3188. CI /CD
", not yet invented. And since then, little has changed, because the area of ​​development and maintenance of the database has always been fairly closed and conservative, and most developers associate with something ancient, complex and incomprehensible. Today, when practically the most disparate DBMS and dozens /hundreds of their instances work in almost any project management, the usual everyday problems of developers and DBA become even more acute. 3r33714. 3r33737.  
3r33737. Traditionally, the most popular and indispensable tool for working with databases are all kinds of IDE. Usually this is a complex solution that combines a variety of functions for developing and administering into a single working environment under its interface, mainly graphical, making our work with the database more productive and comfortable - all that remains is to click on the necessary menu item. Indeed, few of us do without such cool things as 3r3r1313. EMS SQL Manager 3r3668. ,
, 3r3141. dbForge
and 3r3143. many others 3r3668. . 3r33714. 3r33737.  
3r33737. One of the main problems of such "boxes" is that under the pressure of new features and new DBMS (which do not cease to appear on the market) they turn out to be quite complex and continue to become more complex each time. And both in use (often people have to learn not the features of a particular DBMS, but the tricks and features of the IDE itself), and in their implementation (therefore, such systems are usually closed and very paid). Even by many (including me) beloved DBeaver , the flagship open source DB IDE, provides support for NoSQL solutions (and which project is now without them) 3r3151. Only in its Enterprise version 3r3668. . In addition to all this, there is a non-illusory risk that the support of any very necessary and interesting functionality of the DBMS can not be expected soon, until the IDE developers consider it necessary (or they may not consider it at all). 3r33714. 3r33737.  
3r33737. In response to the current situation, users themselves often do not wait until the necessary feature or support of a new DBMS appears, but they solve incoming problems on their own and in the most appropriate way for themselves and their project. GitHub just teems with such solutions of varying degrees of complexity. These are mainly sql-scripts for all occasions ( Dataegret /pg-utils , NikolayS /postgres_dba 3r3r???.????
1. .3r3165. Lestatkim /opensql 3r36668.). Or console utilities, such as top-like
and 3r3169. pgcenter
. As well as all sorts of web-based tools, ranging from completely independent clients and monitoring tools (3r3171. Pg.ukh pg_stats_viewer
). As a result, a whole class of solutions of the form "programmers-for-programmers", "DBA-for-DBA" is formed, with the help of which simple, and not so much, DBMS users share their experiences with each other directly. 3r33714. 3r33737.  
3r3182. Everything as code
3r33737. A similar picture was once observed long ago in the field of server administration and software configuration management. Then, with the constant emergence of new requirements and technologies, the infrastructure of projects became noticeably more complicated, the number of servers and software running on it sharply increased. And there was also a need for more frequent, and then continuous, delivery of software with new business features to the customer. Existing tools for configuring and managing all these facilities did not cope well, resulting in the “Infrastructure as Code” approach and such tools as Ansible, Chef, Salt, Puppet, etc., where infrastructure configuration is performed on a special DSL. What gives more flexibility and freedom of creativity. And the code on such DSL participates in the standard life cycle along with the main application code (in Java, C #, Ruby, or any other programming language), namely, it is stored in the version control system (with all the attendant problems, fork, code review), going to builds on the build server, running automated tests, etc.
3r33737. In the future, the impact of this approach has become noticeable in other areas: 3r-33714. 3r33737.  

Test as Code
” - 3r3-33199. Spock
, Cucumber 3r3668. , 3r3203. Gatling , 3r3205. Postman 3r3668.

Pipeline as Code
” - 3r33232. Jenkins pipeline
Documentation as code
" - Markdown, AsciiDoc, LaTeX
Diagram as Code
" - PlantUML 3r3668. ,
Graphviz 3r3668. ,
, flowchart.js
DNS as code
" - DNSControl , OctoDNS
Presentation as code
" - reveal.js
3r337. 3r33737.  
3r33737. More and more people are no longer afraid to write code, this is becoming the norm. This allows a more flexible configuration, and also allows the use of common tools (version control, metrics, visualization, reports, testing, etc.) in the general flow. In many specialized tools, the graphical interface is replaced or supplemented by a more flexible software interface, various DSL and configuration files. 3r33714. 3r33737.  
3r33737. More detailed information on this topic can be found in the report of Alexander Tarasov (3r-3258. Aatarasoff 3r-?668.) 3r-?360. "Everything as a Code"
. 3r33714. 3r33737.  
Database as code 3r33535. 3r33737.  
3r33737. And what about this in the endless world of databases? When I typed in Google a simple phrase "3r33270. Database as code
", I did not find anything interesting except a single (but what!) Post on DZone - "3r-33272. Database as Code: a Novel Concept 3r3668.". 3r33714. 3r33737.  
3r33737. We talk a lot about the database-first approach. We talk a lot about the fact that data is the most valuable asset of an enterprise. But what about the concept presented by Dan North 3r3668. on the third slide, his
presentations 3r3668. ? What if you treat a database like code? 3r33720. 3r33737. It sounds tempting, and on slides # 5 and # 6 (unfortunately, I did not find the video) 2 approaches to managing the database schema "through code" are described - through incremental migrations (Liquibase, Flyway) and idempotent DDL scripts (3r-3284. Redgate
.). Thus, the scheme is also a code, it is under the control of VCS, built on a build server, automatic tests are performed and so on. 3r33714. 3r33737.  
3r33737. What do we need to change our thinking in the field of work with databases? We need to stop treating our database as a magic artifact or as a unique scenario and look at it from the same angle as we look at the usual code of our application. 3r33720. 3r33737. Powerfully said again, I already have tears in my eyes. Unfortunately, both in the report and in the post itself, “as a code”, 3r3-3691 are considered. 3r33737. only changes to the schema of the database
3r3693. (migration of DB schemas):
3r33737. We treat the source code of our application as a treasure. And this is absolutely true, the code is the heart of any application. But should not
3r33737. database changes
3r3693. also be under the control of a version control system, be automated, be ready for release on demand and obey the “DevOps laws”, like the main code? 3r33720. 3r33737. But just a minute
3r3309. DB is not only a schema! 3r33635. 3r33737.  
3r33737. Most modern DBMSs provide their own query language, with which we can not only receive and modify the data stored in it (the so-called DML) and operate on its schema (the so-called DDL), but generally get it (I'm not afraid of this word) 3r33737. any 3r33737. meta information about the current database and its state (from system tables and views) and perform almost any operations on it (start the database, stop, convert to read only, collect statistics, manage memory, etc.). And such code is also quite suitable for the concept described in the previous paragraph. 3r33714. 3r33737.  
3r33737. And when it comes to the database and query language, the first thing that comes to mind is, of course, 3r-3718. old
and kind SQL. But can we count on it (and at the same time on relational databases, with which the majority is closely associated) now and in the near future, given the huge growth in demand for NoSQL solutions? 3r33714. 3r33737.  
"After all, is SQL already everything? " 3r33635. 3r33737.  
3r33737. You can rightly ask me what kind of SQL I am talking about here in the era of NoSQL and schemaless databases when "3r-3718. Dusty relationals live their life only in the bloody legacy of Interpraise-3r-3719.". Early Habr (as a litmus test of trends in the world of technology) was also once filled with a negative attitude towards SQL and relationals and foreshadowed their early death (the number of votes in brackets and the number of comments through a slash): r3r3714. 3r33737.  
2009 - 3r33338. "MongoDB or how to stop loving SQL"
/ 99
2010 - 3r33347. “NoSQL collection point” 3r3668. (
/ 45
2011 - “Are relational databases doomed?” 3r3668. (
/ 131
2012 -
“NoSQL database: understand the essence” 3r3668. (
/ 75
3r337. 3r33737.  
3r33737. However, from about the 12th year, the mood on Habré changes noticeably:
2012 - 3r33383. "Think 1000 times before using noSQL"
/ 130
2013 - 3r33392. “NoSQL and Big Data - deception of workers?” 3r3668. (
/ 286
2014 - 3r3013. “Why you should never use MongoDB” (+143/245) 3r3668. (
/ 245
2017 - 3r33410. “Why SQL prevails over NoSQL, and where it will lead in the future” 3r36668. (
/ 105
2018 - "No, you do not need machine learning. You need SQL" (
/ 417
3r337. 3r33737.  
3r33737. At the same time, anti-sql and anti-relational attitudes continue to occur, but the community’s response to them is completely different: 3r3714. 3r33737.  
2014 - 3r3437. “Disadvantages of RDBMS or RDBMS vs NoSQL”
/ 79
2015 - “Isn't it time for a relational database to be dumped?” 3r3668. (
/ 309
3r337. 3r33737.  
3r33737. Confirmation of these sentiments can be found in the recent reports of Konstantin Osipov (3r3-3458. Kostja 3r?6688.) 3r3-33460. "NewSQL: SQL is not going anywhere" 3r3668. and Andrei Nikolaenko
"Non-relational SQL" 3r3r6668. . I take the responsibility to bring a brief summary of both presentations: 3r33714. 3r33737.  
" - SQL (as it is not strange) feels pretty comfortable in 3r36969. NoSQL
- medium In almost all NoSQL databases, it is possible to use some sql-like query language, such as
CQL 3r3668. in Cassandra and ScyllaDB,
AQL 3r3668. in Aerospike and
in Couchbase, and in some, full-fledged (or as close as possible to it) ANSI SQL - as in 3r33480. Tarantool
, 3r3482. ClickHouse
and 3r3484. CrateDB
SQL in the clouds
" - Cloud giants provide support for a variety of SQL repositories: 3r3491. Amazon RDS 3r3668. , Amazon Aurora , 3r3495. Oracle Cloud 3r3668. , Google Cloud SQL , 3r3499. Microsoft SQL Azure
, 3r3015. Alibaba Cloud ApsaraDB
, 3r3503. Yandex Managed Databases
SQL to BigData
" - SQL has long been tightly integrated into the Hadoop infrastructure. Back in 200? 3–3–?510 appeared. Hive 3r3668. with its HiveQL, and then, year after year, solutions began to appear that already directly support SQL - 3r3125. Impala 3r3668. (2011) 3r33514. Spark 3r3668. (2013), 3r33516. Kudu
(2014), 3r33518. Presto
(2015), 3r33520. Phoenix
SQL in Potochniki
" - SQL has become almost the standard for stream data handlers: 3r33527. Flink 3r3668. , 3r33529. Samza
, 3r? 3531. Storm
, 3r33333. Apex 3r3668. , and more recently 3r33535. Kafka
SQL in NewSQL 3r3692." - Many modern DBMSs are actively developing towards 3r33718. NewSQL
where they combine the advantages of both NoSQL and classical relational solutions, including extensive SQL support (3-33544. CockroachDB 3r36668., 3-33546. FoundationDB 3r3668., 3r33548. MemSQL 3r3668.).
3r337. 3r33737.  
3r33555. 3r33556. Need more arguments with pictures! 3r33557. 3r33558. 3r33737. I will give some more arguments further, but if everything is clear, then we can safely move on to the next paragraph. 3r33737.  
3r3566. 3r3668. 3r3566. Database history in "No notations":
3r33570. 3r33737.  
Also recently, there are many new and trendy DBMSs that are “add-ons” over well-known and proven relational DBMSs. For example Postgres-based ( Timescale And ToroDB ), MySQL-based ( RadonDB ), And even on the basis of sqllite (which has always positioned itself as a simple and reliable embedded-base) appeared rqlite - “Easy Distributed Relational Storage”. Or they implement the interface of a popular database (for example, MySQL for 3r?383. InfiniDB 3r?6668. And 3r?3855. TiDB 3r?6668.). This approach is good because with the new data model we remain on the same familiar platform that we know how to configure and administer. 3r33737.  
Well, for the very "uncirculated" guys, there are also attempts to create a common universal language "a la SQL", such as Eclipse JNoSQL or rather, its subproject 3r3-3593. JNoSQL Aphrodite
. 3r33737.  
Google, which is one of the pioneers of NoSQL, is also making more and more emphasis on SQL. At first it was 3r3- 3599. Spanner
, and now r3r3601 is actively developing. BigQuery
SQL-interfaces not only stores and data handlers. For example, using r3r3610. osquery
(from Facebook itself) and fsql in SQL, you can get a lot of useful information from any OS or perform various operations with file systems. 3r33737.  
3r3618. 3r33720.
3r33737. In general, SQL is quite comfortable in modern conditions, including far beyond the limits of relationals. With virtually any data source, you can "talk" in SQL or SQL-like language, get the necessary information about data or storage metadata and perform any operations (for example, create something, delete, start, stop, etc.) 3r31414. 3r33737.  
3r33737. 3r33737. Hereinafter (for convenience and with your permission), by “SQL” I will mean not only the “same” standard relational SQL, but also all its subspecies (including far from relational ones), and indeed any embedded in DB QL, with which you can do something useful. 3r33737. 3r33714. 3r33737.  
3r3634. Why all this? 3r33635. 3r33737.  
3r33737. And besides, in the distant 92nd year, a wonderful 3r36399 appeared. ANSI standard
, according to which any relational DBMS must be able to describe its internal structure in a special scheme - 3r3641. Information schema 3r3r6666. . So using the standard query language for service tables /views, you can get the metadata of any database - as it contains charts, tables, indexes, columns, etc. But in fact it is not limited to the same data scheme in the same way (even if outside the standard) you can get information about processes, sessions, query execution plans, disk subsystem, memory utilization and much more. 3r33714. 3r33737.  
3r33737. Despite the fact that this standard appeared almost a quarter of a century ago and only for relational databases, almost all modern NoSQL and NewSQL databases also implement something similar. For example, in Cassandra there is immediately several system circuits (or rather keyspaces - system, system_auth, system_schema, system_traces), which can be accessed using the already mentioned CQL. ClickHouse has special scheme "system" . The developers at CockroachDB generally swung at 3r33651. implementation of standard information_schema
. And even documentary Mongo pleases us system collections 3r3668. . 3r33714. 3r33737.  
3r33737. At the same time there is a constant increase in the number and expansion of such system representations. For example, in Postgresql (as a database with one of the most active communities), in addition to implementing the information_schema itself, the pg_catalog native scheme and pg_stat representations have several official extensions in the form of presentations 3r33659. pg_stat_statements
and 3r3661. pg_buffercache
. As well as additional third-party views, such as 3r3663. pg_active_session_history
, 3r36565. pg_store_plans
and 3r3667. pg_stat_kcache
. 3r33720. 3r33737. Almost any node corresponds to any system table that displays its structure and /or state. That is, it can tell about its internal structure and the state of the database itself through its query language. In addition, many DBMS provides the ability to not only get useful metadata using their query language, but also to perform maintenance operations — stop, start, clear, collect statistics, etc. Some of the DBMSs can use the "ALTER" command not only for tables or columns, but also to other objects, for example, to a session ("
alter session set sql_trace = true
"), data files ("
alter tablespace add datafile
") or the entire system ("
alter system kill session
"). 3r33714. 3r33737.  
3r33737. Knowledge of these tables and queries will help both in the development and administration, and in mastering the next new DBMS. It is clear that in some bases such opportunities are very developed, and the “code” can do almost everything, while in others it is much more modest, but there is definitely a tendency that more and more DBMS developers will develop such capabilities in their products, and we can to operate them effectively. 3r33714. 3r33737.  
Instead of conclusions and summaries
3r33737. Unfortunately, SQL has long been firmly perceived by many as a "low-level" language, a kind of "bytecode" for the database. Which is not accepted, and in some societies it is even deeply indecent to "write with your hands" and in general to contact it in any way. We have long been accustomed to that numerous DB tools and frameworks themselves generate and execute for us "3-333718. Tricky and complicated 3r33737." SQL-queries to the system, and this process can only be observed. But let's get out of the comfort zone of warm lamp-based graphical interfaces and code generators and look at our DBMS from the point of view of the usual 3r3691. code
. Moreover, we have everything necessary for this. 3r33714. 3r33737.  
3r33737. Now in the case. Users of any DBMS have the ability to describe almost all aspects of working from their database in the form of a code, in a simple, understandable and familiar to many language. This may not be ANSI SQL, but any SQL-like dialect, or even its own built-in QL or API, depending on the database used. How is this useful? 3r33714. 3r33737.  
Without being tied to any tool, you can get any information about the database in the required form, or perform any operation on the database;
Since it's still the usual
code 3r33737. , then you need to treat it like any other "real" code. Namely, it should be clear, structured and formatted according to an accepted standard, as well as participate in all stages of the standard code life cycle - be under the control of VCS, undergo Code review, participate in CI /CD pipelines, etc. (and not " lying "somewhere in a folder called" SQL-tricks "). That will increase its quality and value in the team.
3r337. 3r33737.  
3r33737. I am interested in your opinion on this subject, write in the comments - we will discuss. 3r33714. 3r33737.  
3r33737. 3r33737. In the next post, I plan to illustrate the ideas described above with specific examples, and also tell you how we at CROC will try to apply them in an experimental open-source graphical DB-manager that is being developed as part of a research project. 3r33737. 3r33720.
3r33724. ! function (e) {function t (t, n) {if (! (n in e)) {for (var r, a = e.document, i = a.scripts, o = i.length; o-- ;) if (-1! == i[o].src.indexOf (t)) {r = i[o]; break} if (! r) {r = a.createElement ("script"), r.type = "text /jаvascript", r.async =! ? r.defer =! ? r.src = t, r.charset = "UTF-8"; var d = function () {var e = a.getElementsByTagName ("script")[0]; e.parentNode.insertBefore (r, e)}; "[object Opera]" == e.opera? a.addEventListener? a.addEventListener ("DOMContentLoaded", d,! 1): e.attachEvent ("onload", d ): d ()}}} t ("//"""_mediator") () (); 3r33737.
+ 0 -

Add comment