Search code examples
postgresqlmulti-index

PostgreSQL indexed columns choice


I have these two tables :

CREATE TABLE ref_dates(
    date_id SERIAL PRIMARY KEY,
    month int NOT NULL,
    year int NOT NULL,
    month_name CHAR(255)
);


CREATE TABLE util_kpi(
    kpi_id SERIAL PRIMARY KEY,
    kpi_description int NOT NULL,
    kpi_value float,
    date_id int NOT NULL,
    dInsertion timestamp default CURRENT_TIMESTAMP,
    CONSTRAINT fk_ref_kpi FOREIGN KEY (date_id) REFERENCES ref_dates(date_id)
);

Usually, the type of request i'd do is :

  • Selecting kpi_description and kpi_value for a specified month and year:
SELECT kpi_description, kpi_value FROM util_kpi u JOIN ref_dates r ON u.date_id = r.date_id WHERE month=X AND year=XXXX
  • Selecting kpi_description and kpi_value for a specified kpi_description, month and year:
SELECT kpi_description, kpi_value FROM util_kpi u JOIN ref_dates r ON u.date_id = r.date_id WHERE month=X AND year=XXXX AND kpi_description='XXXXXXXXXXX'

I tought about creating these indexes :

CREATE INDEX idx_ref_date_year_month ON ref_dates(year, month);
CREATE INDEX idx_util_kpi_date ON util_kpi(date_id);

First of all, i want to know if it's a good idea to create these indexes.

Second of all and finally, I was wondering if it's a good idea to add kpi_description to the indexes on util_kpi table.

Can you guys give me your opinion ?

Regards


Solution

  • It's not possible to give exact answer without looking on data.

    So it's only possible to give an opinion.

    A. ref_dates

    This table looks very similar to date dimension in ROLAP-schemas.

    So the first what I would do: is change date_id from SERIAL to:

    • DATE datatype
    • or even "smart integer": integer datatype but in form YYYYMMDD. E.g. 20210430. It may look strange but it's not uncommon to see such identificators in date dimensions

    The main point in using such form is that date_id in fact tables became informative even without joining to date dimension.

    B. util_kpi

    I suppose that:

    • ref_dates is a date dimension. So it's ~365 * number of years rows. It could be populated once for 20-30 years for future and it's still will not be really big
    • util_kpi is fact table. Which must be big like "really big" - millions and more records.

    For `util_kpi' I expected id of time dimension but did not found it. So no hourly stats are supposed yet.

    I see util_kpi.dInsertion - which I suppose is planned to be used as time dimension. I would think to extract it into time_id where put hours, minutes and seconds (if milliseconds are not needed).

    C.Indexing

    • ref_dates: it does not matters a lot how you index ref_dates because it's a relatively small table. Maybe unique index on date_id with INCLUDE options for all fields would be the best. Don't create individual index for fields with low selectivity like year or month - it will not make much sense but it will not harm a lot too.
    • util_kpi - you need an index on date_id (as for any foreign keys to other dimension tables that will appear in future).

    That's my thoughts that based on what I supposed.