Search code examples
postgresqlrdbmsb-tree

How to enable index-sequential files in postgres


I am writing an application backed by Postgres DB. The application is like a logging system, the main table is like this

create table if not exists logs
(
    user_id   bigint      not null,
    log       bytea       not null,
    timestamp timestamptz not null default clock_timestamp() at time zone 'UTC'
);

One of the main query is to fetch all log about a certain user_id, ordered by timestamp desc. It would be nice that under the hood Postgres DB stores all rows about the same user_id in one page or sequential pages, instead of scattering here and there on the disk.

As I recall from textbooks, is this the so-called "index-sequential files"? How can I guide Postgres to do that?


Solution

  • The simple thing to do is to create a B-tree index to speed up the search:

    CREATE INDEX logs_user_time_idx ON logs (user_id, timestamp);
    

    That would speed up the query, but take extra space on the disk and slow down all INSERT operations on the table (the index has to be maintained). There is no free lunch!

    I assume that you were talking about that when you mentioned "index-sequential files". But perhaps you meant what is called a clustered index or index-organized table, which essentially keeps the table itself in a certain order. That can speed up searches like that even more. However, PostgreSQL does not have that feature.

    The best you can do to make disk access more efficient in PostgreSQL is to run the CLUSTER command, which rewrites the table in index order:

    CLUSTER logs USING logs_user_time_idx;
    

    But be warned:

    • That statement rewrites the whole table, so it could take a long time. During that time, the table is inaccessible.

    • Subsequent INSERTs won't maintain the order in the table, so it “rots” over time, and after a while you will have to CLUSTER the table again.