I create the following query for selection data with overlapping periods (for campaigns, which have the same business identifier!):
select
campaign_instance_1.campaign_id,
campaign_instance_1.start_time
from campaign_instance as campaign_instance_1
inner join campaign_instance as campaign_instance_2
on campaign_instance_1.campaign_id = campaign_instance_2.campaign_id
and (
(campaign_instance_1.start_time between campaign_instance_2.start_time and campaign_instance_2.finish_time)
or (campaign_instance_1.finish_time between campaign_instance_2.start_time and campaign_instance_2.finish_time)
or (campaign_instance_1.start_time<campaign_instance_2.start_time and campaign_instance_1.finish_time>campaign_instance_2.finish_time)
or (campaign_instance_1.start_time>campaign_instance_2.start_time and campaign_instance_1.finish_time<campaign_instance_2.finish_time))
With index, created as:
CREATE INDEX IF NOT EXISTS camp_inst_idx_campaign_id_and_finish_time
ON public.campaign_instance_without_index USING btree
(campaign_id ASC NULLS LAST, finish_time DESC NULLS LAST)
TABLESPACE pg_default;
Already on 100 000 rows it runs very slowly - 43 seconds!
For optimization I tried to add index on start_time:
(campaign_id ASC NULLS LAST, finish_time DESC NULLS LAST, start_time DESC NULLS LAST)
But result is the same.
As I understand results of explain analyze, index "start_time" doesn't uses as a Index Condition:
I tried the query with this index either with 10 000 and 100 000 rows - so, as possible, it does not depends on sample size (at least on this scales).
Source table contains the following structure:
campaign_id bigint,
fire_time bigint,
start_time bigint,
finish_time bigint,
recap character varying,
details json
Why my index is not used, and what possible ways to improve the query?
Joining to campaign_instance (itself) doesn't really serve anything here other than making an "existence" check and probably your intention is not to get back duplicates for matching records. Thus you could simplify the query with EXISTS or LATERAL join. Also your join condition on time could be simplified, you seem to be looking for overlapping times:
select campaign_id,start_time
from campaign_instance c1
where exists( select * from campaign_instance c2
where c1.campaign_id = c2.campaign_id
and (c1.start_time <= c2.finish_time and c1.finish_time >= c2.start_time));
That time overlap probably would use < and > instead of <= and >= but I don't know your exact requirements, between is implicitly saying it is <= and >=.
EDIT: Ensure that the match is not the row itself: (This table should have a primary key to make things easier, but as it doesn't, I would assume that there is no duplication on campaign_id, start_time and finish_time and that could be used as a composite key)
select campaign_id,start_time
from campaign_instance c1
where exists( select * from campaign_instance c2
where c1.campaign_id = c2.campaign_id
and (c1.start_time != c2.start_time or c1.finish_time != c2.finish_time)
and (c1.start_time <= c2.finish_time and c1.finish_time >= c2.start_time));
This takes around 230-250 milliseconds on my system (iMac i5 7500, 3.4 Ghz, 64 Gb mem).