Search code examples
postgresqlindexingjsonbmaterialized-views

Using materialized views to improve jsonb indexing and querying


Having an RDS Postgresql 12.5 and working on a table (app_events) with a JSONB column (metadata), the JSON data may vary based on the event name, see the structure below.

CREATE TABLE IF NOT EXISTS "public".app_events (
    id uuid DEFAULT uuid_generate_v4() NOT NULL,
    event_id text NOT NULL,
    name text NOT NULL,
    creation_time timestamp without time zone NOT NULL,
    creation_time_in_milliseconds bigint NOT NULL,
    metadata jsonb NOT NULL,
    PRIMARY KEY(id)
);

There are default indexes (btree) on event_id and name.

Based on the name column we create views to normalize the data from JSON into tabular format, an example is below.

CREATE OR REPLACE VIEW "public".charity_created AS
 SELECT app_events.id,
    app_events.event_id,
    app_events.name,
    app_events.creation_time,
    date(app_events.creation_time) AS created_date,
    (app_events.metadata ->> 'aggregateId'::text) AS user_id,
    (app_events.metadata ->> 'url'::text) AS url,
    (app_events.metadata ->> 'name'::text) AS charity_name,
    (app_events.metadata ->> 'about'::text) AS charity_about,
    (app_events.metadata ->> 'country'::text) AS country,
    (app_events.metadata ->> 'category'::text) AS category,
    (app_events.metadata ->> 'currencyCode'::text) AS currencycode,
    (app_events.metadata ->> 'isPayItForwardPartner'::text) AS is_pay_it_forward_partner,
    (app_events.metadata ->> 'isCampaignDonationPartner'::text) AS is_campaign_donation_partner
   FROM public.app_events
  WHERE (app_events.name = 'CharityCreated'::text)
  ORDER BY (date(app_events.creation_time)) DESC;

And now you can run queries like the below.

SELECT * 
FROM "public".charity_created
WHERE charity_name == 'some_charity_name'

In addition, we create joins or unions between views and started noticing latency on reads/queries up to an hour sometimes, no timeouts but returning data can be challenging, definitely a big hit in our reporting team.

The question and knowledge I am looking for are, where can I create (or should create) indexes to improve the read latency; Debating with two findings so far.

  1. On the actual JSONB column (metadata) doesn't make sense, so maybe a solution is to start creating indexes based on the specific schema per event name like this answer
  2. Create materialized views and pay the price to refresh the copy of data (every night) while creating indexes on the materialized views, similar to this answer
  3. Maybe keep the views and create indexes on these

Solution

  • You can try to use indexes with expressions like:

    CREATE INDEX app_events_charity_name_idx ON app_events USING(metadata ->> 'name'::text);
    

    This index might be used when the expression is used (exactly) in the query, e.g.:

    SELECT * FROM app_events WHERE metadata ->> 'name'::text = 'some charity';
    

    This should work with your view as well. But you should check that with EXPLAIN.

    In newer versions of Postgres there is also the possibility of generated columns. Compared to materialized views, these have the advantage that you do not have to worry about updating them.

    ALTER TABLE app_events ADD charity_name TEXT GENERATED ALWAYS AS (metadata ->> 'name'::text);
    

    You can create indexes for these columns as well.