Search code examples
performancepostgresqlquery-optimizationdatabase-performance

optimize an inner join between two multi-million row tables


I'm new to Postgres and even newer to understanding how explain works. I have a query below which is typical, I just replace the date:

explain
select account_id, 
   security_id, 
   market_value_date, 
   sum(market_value) market_value
from market_value_history mvh
inner join holding_cust hc on hc.id = mvh.owning_object_id
where
hc.account_id = 24766
and market_value_date = '2015-07-02'
and mvh.created_by = 'HoldingLoad'
group by account_id, security_id, market_value_date
order by security_id, market_value_date;

Attached is a screenshot of explain enter image description here The count for holding_cust table is 2 million rows and market_value_history table has 163 million rows

Below are the table definitions and indexes for market_value_history and holding_cust:

I'd appreciate any advice you may be able to give me on tuning this query.

CREATE TABLE public.market_value_history
(
  id integer NOT NULL DEFAULT nextval('market_value_id_seq'::regclass),
  market_value numeric(18,6) NOT NULL,
  market_value_date date,
  holding_type character varying(25) NOT NULL,
  owning_object_type character varying(25) NOT NULL,
  owning_object_id integer NOT NULL,
  created_by character varying(50) NOT NULL,
  created_dt timestamp without time zone NOT NULL,
  last_modified_dt timestamp without time zone NOT NULL,
  CONSTRAINT market_value_history_pkey PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE public.market_value_history
  OWNER TO postgres;

-- Index: public.ix_market_value_history_id

-- DROP INDEX public.ix_market_value_history_id;

CREATE INDEX ix_market_value_history_id
  ON public.market_value_history
  USING btree
  (owning_object_type COLLATE pg_catalog."default", owning_object_id);

-- Index: public.ix_market_value_history_object_type_date

-- DROP INDEX public.ix_market_value_history_object_type_date;

CREATE UNIQUE INDEX ix_market_value_history_object_type_date
  ON public.market_value_history
  USING btree
  (owning_object_type COLLATE pg_catalog."default", owning_object_id, holding_type COLLATE pg_catalog."default", market_value_date);




CREATE TABLE public.holding_cust
(
  id integer NOT NULL DEFAULT nextval('holding_cust_id_seq'::regclass),
  account_id integer NOT NULL,
  security_id integer NOT NULL,
  subaccount_type integer,
  trade_date date,
  purchase_date date,
  quantity numeric(18,6),
  net_cost numeric(18,2),
  adjusted_net_cost numeric(18,2),
  open_date date,
  close_date date,
  created_by character varying(50) NOT NULL,
  created_dt timestamp without time zone NOT NULL,
  last_modified_dt timestamp without time zone NOT NULL,
  CONSTRAINT holding_cust_pkey PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE public.holding_cust
  OWNER TO postgres;

-- Index: public.ix_holding_cust_account_id

-- DROP INDEX public.ix_holding_cust_account_id;

CREATE INDEX ix_holding_cust_account_id
  ON public.holding_cust
  USING btree
  (account_id);

-- Index: public.ix_holding_cust_acctid_secid_asofdt

-- DROP INDEX public.ix_holding_cust_acctid_secid_asofdt;

CREATE INDEX ix_holding_cust_acctid_secid_asofdt
  ON public.holding_cust
  USING btree
  (account_id, security_id, trade_date DESC);

-- Index: public.ix_holding_cust_security_id

-- DROP INDEX public.ix_holding_cust_security_id;

CREATE INDEX ix_holding_cust_security_id
  ON public.holding_cust
  USING btree
  (security_id);

-- Index: public.ix_holding_cust_trade_date

-- DROP INDEX public.ix_holding_cust_trade_date;

CREATE INDEX ix_holding_cust_trade_date
  ON public.holding_cust
  USING btree
  (trade_date);

Solution

  • Two things:

    1. As Dmitry pointed out, you should look at creating an Index on market_value_date field. Its possible that post that you have a completely different query plan, which may or may not bring up other bottlenecks, but it should certainly remove this seq-Scan.
    2. Minor (since I doubt if it affects performance), but secondly, if you aren't enforcing field length by design, you may want to change createdby field to TEXT. As can be seen in the query, its trying to cast all createdby fields to TEXT for this query.