Search code examples
postgresqlpartitioningdatabase-partitioning

Dynamic table partitioning by string identifier in Postgres


I would like to partition a table in Postgres by previously unknown value. In my scenario that value would be device_id which is a string.

This is current situation:

Table 'device_data' - stores sensor data which is sent from devices, defined by DDL:

CREATE TABLE warehouse.device_data (
  id INTEGER PRIMARY KEY NOT NULL DEFAULT nextval('device_data_id_seq'::regclass),
  device_id TEXT NOT NULL,
  device_data BYTEA NOT NULL,
--   contains additional fields which are omitted for brevity
  received_at TIMESTAMP WITHOUT TIME ZONE DEFAULT now()
);

Table currently holds millions of records and queries are taking huge amount of time. Most of queries contain WHERE device_id='something' clause.

Solution I have in mind is to create table partitions for each device_id.

Is it possible in Postgres to create table partitions for each device_id?

I went through Postgres documentation and couple of examples I found but all of them use fixed boundaries to create partitions. My solution would require:

  1. create new table partition on the fly when new device_id is first encountered
  2. store to an existing partition if the device_id is already known and partition for that device_id already exist

I would like this to be done using table partitions as it would allow querying across multiple device_ids.


Solution

  • I like the idea of dynamic partitioning. I do not know though how it will affect the performance as I have never used it.

    Change the type of id to int default 0 and manually create the sequence to avoid multiple nextval() calls on a single insert:

    create table device_data (
        id int primary key default 0,
        device_id text not null,
        device_data text not null, -- changed for tests
        received_at timestamp without time zone default now()
    );
    create sequence device_data_seq owned by device_data.id;
    

    Use dynamic sql in the trigger function:

    create or replace function before_insert_on_device_data()
    returns trigger language plpgsql as $$
    begin
        execute format(
            $f$
                create table if not exists %I (
                check (device_id = %L)
                ) inherits (device_data)
            $f$, 
            concat('device_data_', new.device_id), 
            new.device_id);
        execute format(
            $f$
                insert into %I
                values (nextval('device_data_seq'), %L, %L, default)
            $f$, 
            concat('device_data_', new.device_id), 
            new.device_id, 
            new.device_data);
        return null;
    end $$;
    
    create trigger before_insert_on_device_data
        before insert on device_data
        for each row execute procedure before_insert_on_device_data();
    

    Test:

    insert into device_data (device_id, device_data) values
        ('first', 'data 1'),
        ('second', 'data 1'),
        ('first', 'data 2'),
        ('second', 'data 2');
    
    select * from device_data_first;
    
     id | device_id | device_data |        received_at         
    ----+-----------+-------------+----------------------------
      1 | first     | data 1      | 2016-10-18 19:50:40.179955
      3 | first     | data 2      | 2016-10-18 19:50:40.179955
    (2 rows)
    
    select * from device_data_second;
    
     id | device_id | device_data |        received_at         
    ----+-----------+-------------+----------------------------
      2 | second    | data 1      | 2016-10-18 19:50:40.179955
      4 | second    | data 2      | 2016-10-18 19:50:40.179955
    (2 rows)