Search code examples
pythonpostgresqlindexingpsycopg2

Python Postgres order by multiple columns then create new index


I am using psycopg2 to connect to a postgres database.

the database has ~200 million rows and I am trying to create a primary key ordered in order of several columns ie

part_id, date, time
A001   2014-10-23  00:00:00
A001   2014-10-23  00:00:10
...
A002   2014-10-23  00:00:00

I can do the following:

ALTER TABLE data ADD COLUMN ID SERIAL PRIMARY KEY;

But how can I orderby part_id (which though is a varchar is part numeric as above) then date then time first and create the id column to the sorted table?


Solution

  • You could add a primary key on the tree columns:

    alter table data add primary key(part_id, date, time);
    

    However, many experienced SQL developers think that the primary key should only serve to identify rows and anything else. According to this approach, you should use a simple id primary key and create additional indexes if necessary, e.g.

    alter table data 
        add id serial primary key,
        add unique (part_id, date, time);
    

    This may seem suboptimal because it needs two indexes instead of a single one but in practice it is comfortable and avoids some unnecessary complications.

    I strongly advise against attempting to create one integer column that would reflect the order according to other columns.