I have 2 tables both as the same form
table A : content | text
table B : SubContent | text
Table B containing substring Of Texts in Table A
for exampale:
Table a :
Table B :
what i want to achive is :
making south africa | zupta . . for **making south africa** worse than ever.
terkenang arwah ibu | zur eda **terkenang arwah ibu** . . eda terkenang arwah ibu . .
the query I used is very simple
select subcontent, content from A join B on content SIMILAR TO '%'||subcontent||'%'
the query explain :
Gather (cost=1000.00..6611032.19 rows=1678309 width=180)
Workers Planned: 2
-> Nested Loop (cost=0.00..6442201.29 rows=699295 width=180)
" Join Filter: (A.content ~ similar_escape((('%'::text || B.subcontent) || '%'::text), NULL::text))"
-> Parallel Seq Scan on A (cost=0.00..8675.79 rows=186479 width=91)
-> Seq Scan on B (cost=0.00..19.50 rows=750 width=89)
i try use Large Text Indexes follow by this blog (Andrew Kane done a great job! )
https://medium.com/@ankane/large-text-indexes-in-postgres-5d7f1677f89f
i try to add an index on both of the columns that didn't help the execution takes 40 minutes, any ideas?
explain after using Laurenz Albe answer
Nested Loop (cost=27.79..100352.93 rows=1678309 width=180)
-> Seq Scan on q2_distinct (cost=0.00..19.50 rows=750 width=89)
-> Bitmap Heap Scan on clean_distinct (cost=27.79..111.40 rows=2238 width=91)
" Recheck Cond: (A.content ~ similar_escape((('%'::text || q2_distinct.part) || '%'::text), NULL::text))"
-> Bitmap Index Scan on "clean_distinct_Post content_idx" (cost=0.00..27.23 rows=2238 width=0)
" Index Cond: (A.content ~ similar_escape((('%'::text || B.content) || '%'::text), NULL::text))"
Any join that does not have a join condition with an equality operator (=
) can only use a nested loop join.
The only index that can potentially help here is a trigram index on A
:
CREATE EXTENSION pg_trgm;
CREATE INDEX ON "A" USING gin (content gin_trgm_ops);
But you shouldn't expect to get good execution times with a query like that.