Partitioning in PostgreSQL 10 and not only

Partitioning in PostgreSQL 10 and not only
 
Many of the achievements of the PostgreSQL 10 version have a registration in the Partitioning section. And this is true: it is obvious that in the transition from 9.6 to 1? a powerful technological leap occurred. In previous versions of the section, they built and managed them, with the means that they already had: the inheritance mechanism with all its limitations and inconveniences.
 
In version 1? we moved on to more specialized mechanisms and more familiar syntax (including for Oracle users, and this has to be taken into account). This jump in the transition from 10 to version 11 was to be supported by important additions that should expand the functionality and improve the performance of operations using partitioning. But from the results of the last komitfest (it ended April 8), it can be seen that not everything that was conceived could be brought to a working state, which means that not everything will fall into version 11.
 
In addition, for the last couple of years the development of the pg_pathman module in Postgres Professional was conducted. Some important opportunities crossed, some remained unique to PostgreSQL and pg_pathman (which works with the vanilla version, that is, PostgreSQL 10 + pg_pathman gives already quite impressive amount of functionality). This will be a separate article. Remarks related to version 11 and to pg_pathman are in italics for convenience.
 

This article is a revised and enlarged fragment of the book Nouveaulités de PostgreSQL 10. (c) Dalibo, translated from the French Igor Levshin ( Original ). The examples from the book are checked, sometimes adapted and localized for greater visibility.

possibility will appear in version 11.
 

In the pg_pathman module, there are very convenient functions that allow not to think about whether the value falls within the allocated range limits. There you can specify the interval of values ​​for the sections, and the module itself will create them as long as necessary to accommodate the entered values. This will work on sections of the form BY RANGE.


 

It should be borne in mind that in PostgreSQL 10 there is no hash partitioning. This serious drawback was corrected in 11: the corresponding patch went through the comfest, so it remains to wait for PostgreSQL 11. If partitioning on the hash is necessary or desirable right now, you can use pg_pathman .


 

Productivity of the insert


 

Table no_parti Unsegmented table. We create it as follows:


 
    CREATE TABLE no_parti (c1 integer, c2 text);
INSERT INTO no_parti SELECT i, 'something'
FROM generate_series (? 9999999) i;
Time: ??? ms (00: ???)

 

Table parti_new We create with the help of new functionality the version of PostgreSQL 10:


 
    CREATE TABLE parti_new (c1 integer, c2 text) PARTITION BY RANGE (c1);
CREATE TABLE parti_new_1 PARTITION OF parti_new FOR VALUES FROM (0) TO (1000000);
CREATE TABLE parti_new_2 PARTITION OF parti_new FOR VALUES FROM (1000000) TO (2000000);
CREATE TABLE parti_new_3 PARTITION OF parti_new FOR VALUES FROM (2000000) TO (3000000);
CREATE TABLE parti_new_4 PARTITION OF parti_new FOR VALUES FROM (3000000) TO (4000000);
CREATE TABLE parti_new_5 PARTITION OF parti_new FOR VALUES FROM (4000000) TO (5000000);
CREATE TABLE parti_new_6 PARTITION OF parti_new FOR VALUES FROM (5000000) TO (6000000);
CREATE TABLE parti_new_7 PARTITION OF parti_new FOR VALUES FROM (6000000) TO (7000000);
CREATE TABLE parti_new_8 PARTITION OF parti_new FOR VALUES FROM (7000000) TO (8000000);
CREATE TABLE parti_new_9 PARTITION OF parti_new FOR VALUES FROM (8000000) TO (9000000);
CREATE TABLE parti_new_0 PARTITION OF parti_new FOR VALUES FROM (9000000) TO (10000000);
INSERT INTO parti_new SELECT i, 'something'
FROM generate_series (? 9999999) i;
Time: ??? ms (00: ???)

 

Table parti_old We create using the old partitioning method:


 
    CREATE TABLE parti_old (c1 integer, c2 text);
CREATE TABLE parti_old_1 (CHECK (c1 BETWEEN 0 AND 1000000)) INHERITS (parti_old);
CREATE TABLE parti_old_2 (CHECK (c1 BETWEEN 1000000 AND 2000000)) INHERITS (parti_old);
CREATE TABLE parti_old_3 (CHECK (c1 BETWEEN 2000000 AND 3000000)) INHERITS (parti_old);
CREATE TABLE parti_old_4 (CHECK (c1 BETWEEN 3000000 AND 4000000)) INHERITS (parti_old);
CREATE TABLE parti_old_5 (CHECK (c1 BETWEEN 4000000 AND 5000000)) INHERITS (parti_old);
CREATE TABLE parti_old_6 (CHECK (c1 BETWEEN 5000000 AND 6000000)) INHERITS (parti_old);
CREATE TABLE parti_old_7 (CHECK (c1 BETWEEN 6000000 AND 7000000)) INHERITS (parti_old);
CREATE TABLE parti_old_8 (CHECK (c1 BETWEEN 7000000 AND 8000000)) INHERITS (parti_old);
CREATE TABLE parti_old_9 (CHECK (c1 BETWEEN 8000000 AND 9000000)) INHERITS (parti_old);
CREATE TABLE parti_old_0 (CHECK (c1 BETWEEN 9000000 AND 10000000)) INHERITS (parti_old);
CREATE OR REPLACE FUNCTION insert_into () RETURNS TRIGGER
LANGUAGE plpgsql
AS $ FUNC $
BEGIN
IF NEW.c1 BETWEEN 0 AND 1000000 THEN
INSERT INTO parti_old_1 VALUES (NEW. *);
ELSIF NEW.c1 BETWEEN 1000000 AND 2000000 THEN
INSERT INTO parti_old_2 VALUES (NEW. *);
ELSIF NEW.c1 BETWEEN 2000000 AND 3000000 THEN
INSERT INTO parti_old_3 VALUES (NEW. *);
ELSIF NEW.c1 BETWEEN 3000000 AND 4000000 THEN
INSERT INTO parti_old_4 VALUES (NEW. *);
ELSIF NEW.c1 BETWEEN 4000000 AND 5000000 THEN
INSERT INTO parti_old_5 VALUES (NEW. *);
ELSIF NEW.c1 BETWEEN 5000000 AND 6000000 THEN
INSERT INTO parti_old_6 VALUES (NEW. *);
ELSIF NEW.c1 BETWEEN 6000000 AND 7000000 THEN
INSERT INTO parti_old_7 VALUES (NEW. *);
ELSIF NEW.c1 BETWEEN 7000000 AND 8000000 THEN
INSERT INTO parti_old_8 VALUES (NEW. *);
ELSIF NEW.c1 BETWEEN 8000000 AND 9000000 THEN
INSERT INTO parti_old_9 VALUES (NEW. *);
ELSIF NEW.c1 BETWEEN 9000000 AND 10000000 THEN
INSERT INTO parti_old_0 VALUES (NEW. *);
END IF;
RETURN NULL;
END;
$ FUNC $;
CREATE TRIGGER tr_insert_parti_old BEFORE INSERT ON parti_old
FOR EACH ROW EXECUTE PROCEDURE insert_into ();
INSERT INTO parti_old SELECT i, 'something'
FROM generate_series (? 9999999) i;
Time: ??? ms (02: ???)

 

We see that the speed of insertion with a new approach to partitioning is an order of magnitude higher.


 

In the version 11 hit patch , which significantly increases the insertion rate in the case when the data in the partition key grow monotonically or almost monotonously.


 

As for the performance of reading, at the stage 10 -> 1? there were fundamental changes: the possibility of appeared. effectively exclude from the plan of the section, in which there is certainly no data (partition pruning). This can be do and at the stage of execution, when the condition of getting into this or that section is not known in advance. This happens, for example, in subquery cases.


 

While the exception of sections (reliably) works only in the sectioning of the list. In the case where there are many sections (and in real projects there are thousands, and even tens of thousands), the exclusion of sections can seriously shorten the execution time of the request. In pg_pathman The elimination of unnecessary sections has been implemented almost from the very beginning of the project.


 


 

Comparison of old and new sectioning on examples


 

Let's look at the differences in the approach to partitioning in versions 9.6 and 10.


 

Create a table of temperatures, tied to the place and date. We will create sections for each place and every month.


 

The commands for creating such tables in 9.6:


 
    CREATE TABLE meteo (
t_id serial,
place text NOT NULL,
hour_mesure timestamp DEFAULT now (),
, Temperature real NOT NULL
);
CREATE TABLE meteo_moscow_201709 (
CHECK (place = 'Moscow'
and hour_mesure> = TIMESTAMP '2017-09-???:00:00'
AND hour_mesure < TIMESTAMP '2017-10-01 00:00:00' )
) INHERITS (meteo);
CREATE TABLE meteo_moscow_201710 (
CHECK (place = 'Moscow'
? AND hour_mesure> = TIMESTAMP '2017-10-???:00:00'
AND hour_mesure < TIMESTAMP '2017-11-01 00:00:00' )
) INHERITS (meteo);
CREATE TABLE meteo_sochi_201709 (
CHECK ( place = 'Sochi'
and hour_mesure> = TIMESTAMP '2017-09-???:00:00'
AND hour_mesure < TIMESTAMP '2017-10-01 00:00:00' )
) INHERITS (meteo);
CREATE TABLE meteo_sochi_201710 (
CHECK (place = 'Sochi '
? AND hour_mesure> = TIMESTAMP' 2017-10-???:00:00 '
AND hour_mesure < TIMESTAMP '2017-11-01 00:00:00' )
) INHERITS (meteo);
CREATE TABLE meteo_magadan_201709 (
CHECK (place =' Magadan '
) AND hour_mesur e> = TIMESTAMP '2017-09-???:00:00'
AND hour_mesure < TIMESTAMP '2017-10-01 00:00:00' )
) INHERITS (meteo);
CREATE TABLE meteo_magadan_201710 (
CHECK (place = 'Magadan'
? AND hour_mesure> = TIMESTAMP '2017-10-???:00:00'
and hour_mesure < TIMESTAMP '2017-11-01 00:00:00' )
) INHERITS (meteo);
.
REPLACE FUNCTION meteo_insert_trigger ()
RETURNS TRIGGER AS $$
BEGIN
IF (NEW.place = 'Moscow') THEN
IF (NEW.hour_mesure> = TIMESTAMP '2017-09-???:00:00' AND
NEW.hour_mesure < TIMESTAMP '2017-10-01 00:00:00' ) THEN
INSERT INTO meteo_moscow_201709 VALUES (NEW. *);
ELSIF (NEW.hour_mesure> = TIMESTAMP '2017-10-???:00:00' AND
NEW.hour_mesure < TIMESTAMP '2017-11-01 00:00:00' ) THEN
INSERT INTO meteo_moscow_201710 VALUES (NEW. *);
ELSE
RAISE EXCEPTION 'Date does not fit meteo_insert_trigger (Moscow)';
.END3;
.ElSIF (NEW.place = 'Sochi') THEN
IF (NEW. hour_mesure> = TIMESTAMP '2017-09 -???:00:00 'AND
NEW.hour_mesure < TIMESTAMP '2017-10-01 00:00:00' ) THEN
INSERT INTO meteo_sochi_201709 VALUES (NEW. *);
ELSIF (NEW.hour_mesure> = TIMESTAMP '2017-10-???:00:00' AND
NEW.hour_mesure < TIMESTAMP '2017-11-01 00:00:00' ) THEN
INSERT INTO meteo_sochi_201710 VALUES (NEW. *);
ELSE
RAISE EXCEPTION 'Date does not fit meteo_insert_trigger (Sochi) ';
END IF;
ElSIF (NEW.place =' Magadan ') THEN
IF (NEW.hour_mesure> = TIMESTAMP' 2017-09-???:00:00 'AND
.hour_mesure < TIMESTAMP '2017-10-01 00:00:00' ) THEN
INSERT INTO meteo_magadan_201709 VALUES (NEW. *);
ELSIF (NEW.hour_mesure> = TIMESTAMP '2017-10-???:00:00' AND
NEW.hour_mesure < TIMESTAMP '2017-11-01 00:00:00' ) THEN
INSERT INTO meteo_magadan_201710 VALUES (NEW *
ELSE
RAISE EXCEPTION 'Date does not fit meteo_insert_trigger (Magadan)';
END IF;
ElSE
RAISE EXCEPTION 'Date does not fit meteo_insert_trigger ()!';
RETURN NULL;
END;
$$
LANGUAGE plpgsql;
CREATE TRIGGER insert_meteo_trigger
BEFORE INSERT ON meteo
FOR EACH ROW EXECUTE PROCEDURE meteo_insert_trigger ();

 

Commands for creating such tables in 10;


 
    CREATE TABLE meteo (
t_id integer GENERATED BY DEFAULT AS IDENTITY,
place text NOT NULL,
hour_mesure timestamp DEFAULT now (),
, Temperature real NOT NULL
) PARTITION BY RANGE (place, hour_mesure);
CREATE TABLE meteo_moscow_201709 PARTITION of meteo FOR VALUES
FROM ('Moscow', '2017-09-???:00:00') TO ('Moscow', '2017-10-???:00:00');
CREATE TABLE meteo_moscow_201710 PARTITION of meteo FOR VALUES
FROM ('Moscow', '2017-10-???:00:00') TO ('Moscow', '2017-11-???:00:00');
CREATE TABLE meteo_sochi_201709 PARTITION of meteo FOR VALUES
FROM ('Sochi', '2017-09-???:00:00') TO ('Sochi', '2017-10-???:00:00');
CREATE TABLE meteo_sochi_201710 PARTITION of meteo FOR VALUES
FROM ('Sochi', '2017-10-???:00:00') TO ('Sochi', '2017-11-???:00:00');
CREATE TABLE meteo_paris_201709 PARTITION of meteo FOR VALUES
FROM ('Magadan', '2017-09-???:00:00') TO ('Magadan', '2017-10-???:00:00');
CREATE TABLE meteo_paris_201710 PARTITION of meteo FOR VALUES
FROM ('Magadan', '2017-10-???:00:00') TO ('Magadan', '2017-11-???:00:00');

 

Note that the declarative syntax of version 10 is much simpler than the syntax 9.6. In it, as is evident from the example, the most tedious part of the work is the creation of trigger functions.


 

(
?

? pg_pathman ? the developers went even further, there are functions , Which cut existing sections, combine neighboring and many others.) )
 

Here is an example of a function that creates random rows in a table:


 
    CREATE OR REPLACE FUNCTION populate_meteo ()
RETURNS TEXT AS $$
DECLARE
placex text[]: = '{}';
v_place text;
v_hour timest
v_temperature real;
v_nb_insertions integer: = 500000;
v_insertion integer;
BEGIN
placex[0]= 'Moscow';
placex[1]= 'Sochi';
placex[2]= 'Magadan';
FOR v_insertion IN 1 v_nb_insertions LOOP
v_place = placex[floor((random()*3))::int];
v_hour = '2017-09-01' :: timestamp
+ make_interval (days => floor ((random () *)) :: int,
secs => floor ((random () * 86400)) :: int);
v_temperature: = round (((random () * 14)) :: numeric + 1?2);
IF EXTRACT (MONTH FROM v_hour) = 10 THEN
v_temperature: = v_temperature-4;
END IF;
IF EXTRACT (HOUR FROM v_hour) <= 9
OR EXTRACT (HOUR FROM v_hour)> = 20 THEN
v_temperature: = v_temperature-5;
ELSEIF EXTRACT (HOUR FROM v_hour)> = 12
AND EXTRACT (HOUR FROM v_hour) <= 17 THEN
v_temperature: = v_temperature + 5;
END IF;
INSERT INTO meteo (place, hour_mesure, temperature)
VALUES (v_place, v_hour, v_temperature);
END LOOP;
RETURN v_nb_insertions || ' mesures de température insérées';
END;
$$
LANGUAGE plpgsql;

 

Insert the rows into meteo tables from habr_9_6 and habr_10:


 
    habr_9_6 = # EXPLAIN ANALYSE SELECT populate_meteo ();
QUERY PLAN
-------------------------------------------------- ----------------------------------------
Result (cost = ?????? rows = 1 width = 32) (actual time = ?????? rows = 1 loops = 1)
Planning time: ??? ms
Execution time: ??? ms
(3 rows)
habr_10 = # EXPLAIN ANALYSE SELECT populate_meteo ();
QUERY PLAN
-------------------------------------------------- -------------------------------------
Result (cost = ?????? rows = 1 width = 32) (actual time = ?????? rows = 1 loops =
1)
Planning time: ??? ms
Execution time: ??? ms
(3 rows)

 

We see that in version 10 data is inserted twice faster.


 

Management of the entire family of sections at once


 

For operations with sections created using inheritance mechanisms, their explicit enumeration is required.


 
    habr_9_6 = # SELECT 'VACUUM ANALYZE' || relname AS operation
FROM pg_stat_user_tables WHERE relname LIKE 'meteo_%';
operation
------------------------------------
VACUUM ANALYZE meteo_moscow_201709
VACUUM ANALYZE meteo_moscow_201710
VACUUM ANALYZE meteo_sochi_201709
VACUUM ANALYZE meteo_sochi_201710
VACUUM ANALYZE meteo_paris_201709
VACUUM ANALYZE meteo_paris_201710
(6 rows)
habr_9_6 = # gexec
VACUUM
VACUUM
VACUUM
VACUUM
VACUUM
VACUUM

 

In version 10 in order to produce VACUUM and ANALYSE of all sections, it is enough to apply only to the parent table:


 
    habr_10 = # VACUUM ANALYZE meteo;
VACUUM
habr_10 = # SELECT now () AS date, relname, last_vacuum, last_analyze
FROM pg_stat_user_tables WHERE relname LIKE 'meteo_sochi%';
-[RECORD 1 ]+ ------------------------------
date | 2018-04-???: 38: ??? + 03
relname | meteo_sochi_201709
last_vacuum | 2018-04-???: 37: ??? + 03
last_analyze | 2018-04-???: 37: ??? + 03
-[RECORD 2 ]+ ------------------------------
date | 2018-04-???: 38: ??? + 03
relname | meteo_sochi_201710
last_vacuum | 2018-04-???: 37: ??? + 03
last_analyze | 2018-04-???: 37: ??? + 03

 


 

Limitations of version 10


 

Index


 

In version 1? creating an index on a partitioned table is still impossible:


 
    habr_10 = # CREATE INDEX meteo_hour_mesure_idx ON meteo (hour_mesure);
ERROR: can not create index on partitioned table "meteo"

 

It is impossible to create a primary key, a unique restriction, or an exclusion constraint, covering all sections.


 

Because primary keys are not supported in partitioned tables, foreign keys can not be referenced to partitioned tables.


 

At the same time, as in version 9.? you can create indexes on separate sections:


 
    habr_10 = # CREATE INDEX meteo_moscow_201710_hour_idx
ON meteo_moscow_201710 (hour_mesure);
CREATE INDEX

 

In version 1? you can limit the uniqueness of will work (in the case where the constraint is imposed on the column with the partition key). This is a great achievement, and it makes it possible to refer to a partitioned table from the foreign key of another table (FOREIGN KEY)


 

Updating the data


 

In version 10 impossible move the row from one section to another with the command. UPDATE , since the new value of the string will not satisfy the implicit constraint of the original section.


 
    habr_10 = # UPDATE meteo SET place = 'Sochi' WHERE place = 'Moscow';
ERROR: new row for relation "meteo_moscow_201709" violates partition constraint
DETAIL: Failing row contains (? Sochi, 2017-09-???:09:2? ???).

 

In version 1? you can safely update the value of the partition key: line will automatically move in the desired section.


 

Inserting data outside the boundaries of sections


 

With declarative partitioning of version 1? you can declare boundaries:


 
    CREATE TABLE meteo_moscow_ancienne PARTITION of meteo FOR VALUES
FROM ('Moscow', MINVALUE) TO ('Moscow', '2017-09-???:00:00');
CREATE TABLE meteo_sochi_ancienne PARTITION of meteo FOR VALUES
FROM ('Sochi', MINVALUE) TO ('Sochi', '2017-09-???:00:00');
CREATE TABLE meteo_paris_ancienne PARTITION of meteo FOR VALUES
FROM ('Magadan', MINVALUE) TO ('Magadan', '2017-09-???:00:00');

 

Other Limitations


 
  •  
  • The parent table can not contain data  
  • In sections, there can not be additional columns relative to the parent table of columns  
  • Multiple inheritance is not allowed  
  • NULL values ​​in sections are allowed only if they are allowed in the partitioned (parent) table  
  • Sections that do not belong to this database instance are not supported (but you can attach the section as FDW - ) CREATE FOREIGN TABLE PARTITION OF
    ).  
  • In case of joining section
     
    •  
    • it is necessary to check the integrity constraints (leading to the blocking of the entire section)  
    • an identical parent table with a constraint of is required. CHECK .
       
       
       
      These restrictions are also relevant for version 11.
       
      The appearance of the PostgreSQL version does not solve, of course, all the issues related to partitioning. We are still waiting for numerous articles and discussions on this topic.
       
      [postgres-слоны Натальи Лёвшиной]
+ 0 -

Add comment