Search code examples
postgresqlpostgresql-12

Why does subpartitions not increase insert speed performance in postgres?


I have installed postgres 12 and made a test with partitions (shown below). My question is, why there is no significant performance gain inserting data, even with ¼ billion rows in the table, when comparing 5 partitions vs 5 partitions with 5 subpartitions.

My goal is to import data fast in usually big tables with + ¼ billion rows, and I was of the impression, that having more partitions would reduce the size of the indexes and increase the speed of inserting data.

This was the setup for the test:

Machine: Local pc, 16 GB ram

Postgres version: 12

Partition test: Table a) 5 hash partitions with 5 hash subpartitions. Table b) 5 hash partitions

Tablesetup (example with only 5 partitions)

CREATE TABLE public.only_5_partitions
(
    id integer NOT NULL,
    title character varying COLLATE pg_catalog."default",
    project_id integer
) PARTITION BY HASH (id) ;

--INDEXES ON ALL 3 COLUMNS NOT ADDED IN THE CODE EXAMPLE, BUT THERE ARE INDEXES ON ALL 3.

--THE PARTITIONS

CREATE TABLE public.only_5_partitions_0 PARTITION OF public.only_5_partitions
    FOR VALUES WITH (modulus 5, remainder 0)
    PARTITION BY HASH (id);

CREATE TABLE public.only_5_partitions_1 PARTITION OF public.only_5_partitions
    FOR VALUES WITH (modulus 5, remainder 1)
    PARTITION BY HASH (id);

CREATE TABLE public.only_5_partitions_2 PARTITION OF public.only_5_partitions
    FOR VALUES WITH (modulus 5, remainder 2)
    PARTITION BY HASH (id);

CREATE TABLE public.only_5_partitions_3 PARTITION OF public.only_5_partitions
    FOR VALUES WITH (modulus 5, remainder 3)
    PARTITION BY HASH (id);

CREATE TABLE public.only_5_partitions_4 PARTITION OF public.only_5_partitions
    FOR VALUES WITH (modulus 5, remainder 4)
    PARTITION BY HASH (id);

Rows inserted: Generated rows with this sample code:

INSERT INTO tableb
SELECT generate_series(1,10000000), 'someting new', generate_series(1,10000000);

As you can see from the test, inserting data in table A (only 5 partitions) more or less are the same as for table B. In some runs the small number of partitions even performed better.

In the last insert, I increased the insert to 50 mio rows, to detect a performance change.

enter image description here

enter image description here


Solution

  • I was of the impression, that having more partitions would reduce the size of the indexes and increase the speed of inserting data.

    Why would this be faster? You seem to be hitting all partitions simultaneously. You have smaller indexes, but more of them, with about the same total size. If you were targeting your inserts to one partition at a time, you might get some caching benefit, but you are not doing that (nor is it easy to do that with hash partitioning).