Search code examples
sqlpostgresqlgreatest-n-per-grouppostgresql-performancedistinct-on

DISTINCT ON slow for 300000 rows


I have a table named assets. Here is the ddl:

create table assets (
    id            bigint                                 primary key,
    name          varchar(255)                           not null,
    value         double precision                       not null,
    business_time timestamp with time zone,
    insert_time   timestamp with time zone default now() not null
);

create index idx_assets_name on assets (name);

I need to extract the newest (based on insert_time) value for each asset name. This is the query that I initially used:

SELECT DISTINCT
    ON (a.name) *
FROM home.assets a
WHERE a.name IN (
                 'USD_RLS',
                 'EUR_RLS',
                 'SEKKEH_RLS',
                 'NIM_SEKKEH_RLS',
                 'ROB_SEKKEH_RLS',
                 'BAHAR_RLS',
                 'GOLD_18_RLS',
                 'GOLD_OUNCE_USD',
                 'SILVER_OUNCE_USD',
                 'PLATINUM_OUNCE_USD',
                 'GOLD_MESGHAL_RLS',
                 'GOLD_24_RLS',
                 'STOCK_IR',
                 'AED_RLS',
                 'GBP_RLS',
                 'CAD_RLS',
                 'CHF_RLS',
                 'TRY_RLS',
                 'AUD_RLS',
                 'JPY_RLS',
                 'CNY_RLS',
                 'RUB_RLS',
                 'BTC_USD'
    )
ORDER BY a.name,
         a.insert_time DESC;

I have around 300,000 rows in the assets table. On my VPS this query takes about 800 ms. this is causing a whole response time of about 1 second for a specific endpoint. This is a bit slow and considering the fact that the assets table is growing fast, this endpoint will be even slower in the near future. I also tried to avoid IN(...) using this query:

SELECT DISTINCT
    ON (a.name) *
FROM home.assets a
ORDER BY a.name,
         a.insert_time DESC;

But I didn't notice a significant difference. Any idea how I could optimize this query?


Solution

  • You may try adding the following index to your table:

    CREATE INDEX idx ON assets (name, insert_time DESC);
    

    If used, Postgres can simply scan this index to find the distinct record having the most recent insert_time for each name.