I am new to postgres and am having trouble speeding this tsvector query up. I already have indexes on all the tables but the query is taking absurdly long. Any help is greatly appreciated as I assume I'm doing something stupid.
This is my query
export const getAllListings = async (req: Request, res: Response, next: NextFunction): Promise<void> => {
await check('search').trim().isLength({ min: 1, max: 50 }).escape().run(req);
const errors = validationResult(req);
if (!errors.isEmpty()) {
return void res.status(400).json({ errors: errors.array() });
}
try {
const { search } = req.query;
await logSearch(search);
let listings;
if (search) {
const searchQuery = `
EXPLAIN ANALYZE
WITH search_items AS (
SELECT type, url, location, flag, score, overall_rating FROM countries
UNION ALL
SELECT type, url, location, NULL AS flag, score, overall_rating FROM cities
UNION ALL
SELECT type, url, location, NULL AS flag, score, overall_rating FROM landmarks
UNION ALL
SELECT type, url, location, NULL AS flag, score, overall_rating FROM outdoors
UNION ALL
SELECT type, url, location, NULL AS flag, score, overall_rating FROM restaurants
UNION ALL
SELECT type, url, location, NULL AS flag, score, overall_rating FROM activities
),
ranked_items AS (
SELECT
type,
url,
location,
COALESCE(flag, '') AS flag,
score,
overall_rating,
CASE
WHEN type = 'country' THEN 1
WHEN type = 'city' THEN 2
WHEN type = 'landmark' THEN 3
WHEN type = 'outdoor' THEN 4
WHEN type = 'restaurant' THEN 5
WHEN type = 'activity' THEN 6
ELSE 7
END AS type_order,
CASE
WHEN type = 'country' THEN 20
WHEN type IN ('city', 'activity') THEN 13
WHEN type = 'landmark' THEN 14
WHEN type = 'outdoor' THEN 8
WHEN type = 'restaurant' THEN 6
ELSE 1
END AS type_score,
ts_rank_cd(to_tsvector('english', location || ' ' || url || ' ' || COALESCE(flag, '') || ' ' || type),
to_tsquery('english', regexp_replace(trim($1), '\s+', ':* & ') || ':*')
) AS search_rank
FROM search_items
WHERE
to_tsvector('english', location || ' ' || url || ' ' || COALESCE(flag, '') || ' ' || type) @@
to_tsquery('english', regexp_replace(trim($1), '\s+', ':* & ') || ':*')
)
SELECT DISTINCT ON (type_order, lower(location))
type,
url,
location,
flag,
score,
overall_rating,
type_score,
search_rank
FROM ranked_items
ORDER BY
type_order,
lower(location),
(type_score * 100 + score + COALESCE(overall_rating, 0) * 0.8 + search_rank) DESC
LIMIT 6;
`;
listings = (await pool.query(searchQuery, [search])).rows;
} else {
const query = `
SELECT type, url, location, flag, score
FROM overall
WHERE potential = false OR potential IS NULL
ORDER BY id DESC
LIMIT 4;
`;
listings = (await pool.query(query)).rows;
}
console.log(listings)
return void res.status(200).json({
message: 'Fetched posts',
data: listings,
});
} catch (error) {
console.log(error);
}
};
This is my EXPLAIN ANALYZE result
Limit (cost=28941.11..28941.56 rows=6 width=236) (actual time=85851.041..85856.894 rows=6 loops=1)
-> Unique (cost=28941.11..28967.78 rows=356 width=236) (actual time=85850.980..85856.825 rows=6 loops=1)
-> Sort (cost=28941.11..28950.00 rows=3556 width=236) (actual time=85850.823..85856.624 rows=6 loops=1)
Sort Key: (CASE WHEN ((search_items.type)::text = 'country'::text) THEN 1 WHEN ((search_items.type)::text = 'city'::text) THEN 2 WHEN ((search_items.type)::text = 'landmark'::text) THEN 3 WHEN ((search_items.type)::text = 'outdoor'::text) THEN 4 WHEN ((search_items.type)::text = 'restaurant'::text) THEN 5 WHEN ((search_items.type)::text = 'activity'::text) THEN 6 ELSE 7 END), (lower((search_items.location)::text))
Sort Method: quicksort Memory: 315kB
-> Subquery Scan on search_items (cost=7.58..28731.38 rows=3556 width=236) (actual time=11.452..85261.766 rows=1851 loops=1)
-> Append (cost=7.58..27673.47 rows=3556 width=192) (actual time=10.519..82162.075 rows=1851 loops=1)
-> Bitmap Heap Scan on countries (cost=7.58..13.97 rows=5 width=161) (actual time=10.516..10.625 rows=1 loops=1)
Recheck Cond: (to_tsvector('english'::regconfig, (((((((location)::text || ' '::text) || (url)::text) || ' '::text) || (COALESCE(flag, ''::character varying))::text) || ' '::text) || (type)::text)) @@ '''cana'':*'::tsquery)
Heap Blocks: exact=1
-> Bitmap Index Scan on idx_countries_fts (cost=0.00..7.58 rows=5 width=0) (actual time=8.169..8.270 rows=1 loops=1)
Index Cond: (to_tsvector('english'::regconfig, (((((((location)::text || ' '::text) || (url)::text) || ' '::text) || (COALESCE(flag, ''::character varying))::text) || ' '::text) || (type)::text)) @@ '''cana'':*'::tsquery)
-> Subquery Scan on "*SELECT* 2" (cost=1000.42..27632.06 rows=3547 width=141) (actual time=251.128..81942.093 rows=1848 loops=1)
-> Gather (cost=1000.42..27587.72 rows=3547 width=113) (actual time=251.121..81851.590 rows=1848 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Index Only Scan using idx_cities_search_cols on cities (cost=0.42..26233.02 rows=1478 width=113) (actual time=172.105..83712.956 rows=616 loops=3)
Filter: (to_tsvector('english'::regconfig, (((((((location)::text || ' '::text) || (url)::text) || ' '::text) || ''::text) || ' '::text) || (type)::text)) @@ '''cana'':*'::tsquery)
Rows Removed by Filter: 58500
Heap Fetches: 0
-> Subquery Scan on "*SELECT* 3" (cost=0.00..5.51 rows=1 width=1160) (actual time=30.231..30.312 rows=0 loops=1)
-> Seq Scan on landmarks (cost=0.00..5.50 rows=1 width=1132) (actual time=30.219..30.219 rows=0 loops=1)
Filter: (to_tsvector('english'::regconfig, (((((((location)::text || ' '::text) || (url)::text) || ' '::text) || ''::text) || ' '::text) || (type)::text)) @@ '''cana'':*'::tsquery)
Rows Removed by Filter: 9
-> Subquery Scan on "*SELECT* 4" (cost=0.00..1.29 rows=1 width=1160) (actual time=15.920..15.921 rows=0 loops=1)
-> Seq Scan on outdoors (cost=0.00..1.28 rows=1 width=1132) (actual time=15.749..15.749 rows=0 loops=1)
Filter: (to_tsvector('english'::regconfig, (((((((location)::text || ' '::text) || (url)::text) || ' '::text) || ''::text) || ' '::text) || (type)::text)) @@ '''cana'':*'::tsquery)
Rows Removed by Filter: 1
-> Subquery Scan on "*SELECT* 5" (cost=0.00..1.29 rows=1 width=1160) (actual time=11.147..11.153 rows=1 loops=1)
-> Seq Scan on restaurants (cost=0.00..1.28 rows=1 width=1132) (actual time=11.063..11.067 rows=1 loops=1)
Filter: (to_tsvector('english'::regconfig, (((((((location)::text || ' '::text) || (url)::text) || ' '::text) || ''::text) || ' '::text) || (type)::text)) @@ '''cana'':*'::tsquery)
-> Subquery Scan on "*SELECT* 6" (cost=0.00..1.57 rows=1 width=1160) (actual time=12.723..12.728 rows=1 loops=1)
-> Seq Scan on activities (cost=0.00..1.56 rows=1 width=1132) (actual time=12.693..12.696 rows=1 loops=1)
Filter: (to_tsvector('english'::regconfig, (((((((location)::text || ' '::text) || (url)::text) || ' '::text) || ''::text) || ' '::text) || (type)::text)) @@ '''cana'':*'::tsquery)
Rows Removed by Filter: 1
Planning Time: 850.642 ms
Execution Time: ??? ms
I tried creating indexes and changing them up, I tried vacuum analyze on my table.
All the time is spent here:
-> Subquery Scan on "*SELECT* 2" (cost=1000.42..27632.06 rows=3547 width=141) (actual time=251.128..81942.093 rows=1848 loops=1)
-> Gather (cost=1000.42..27587.72 rows=3547 width=113) (actual time=251.121..81851.590 rows=1848 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Index Only Scan using idx_cities_search_cols on cities (cost=0.42..26233.02 rows=1478 width=113) (actual time=172.105..83712.956 rows=616 loops=3)
Filter: (to_tsvector('english'::regconfig, (((((((location)::text || ' '::text) || (url)::text) || ' '::text) || ''::text) || ' '::text) || (type)::text)) @@ '''cana'':*'::tsquery)
Rows Removed by Filter: 58500
Heap Fetches: 0
To speed that up you need an index:
CREATE INDEX ON cities USING gin (
to_tsvector('english', location || ' ' || url || ' ' || '' || ' ' || type)
);
Note that that index will only work if flag
is an empty string or NULL. For other values, you'd need a different index.