Search code examples
node.jspostgresqlsearchfull-text-searchtsvector

UNION ALL is extremely slow in postgres


I am new to postgres and am having trouble speeding this tsvector query up. I already have indexes on all the tables but the query is taking absurdly long. Any help is greatly appreciated as I assume I'm doing something stupid.

This is my query

export const getAllListings = async (req: Request, res: Response, next: NextFunction): Promise<void> => {
  await check('search').trim().isLength({ min: 1, max: 50 }).escape().run(req);

  const errors = validationResult(req);
  if (!errors.isEmpty()) {
    return void res.status(400).json({ errors: errors.array() });
  }
  try {
    const { search } = req.query;

    await logSearch(search);

    let listings;
    if (search) {
      const searchQuery = `
      EXPLAIN ANALYZE
      WITH search_items AS (
        SELECT type, url, location, flag, score, overall_rating FROM countries
        UNION ALL 
        SELECT type, url, location, NULL AS flag, score, overall_rating FROM cities
        UNION ALL 
        SELECT type, url, location, NULL AS flag, score, overall_rating FROM landmarks
        UNION ALL 
        SELECT type, url, location, NULL AS flag, score, overall_rating FROM outdoors
        UNION ALL 
        SELECT type, url, location, NULL AS flag, score, overall_rating FROM restaurants
        UNION ALL 
        SELECT type, url, location, NULL AS flag, score, overall_rating FROM activities
      ),
      ranked_items AS (
        SELECT
          type,
          url,
          location,
          COALESCE(flag, '') AS flag,
          score,
          overall_rating,
          CASE
            WHEN type = 'country' THEN 1
            WHEN type = 'city' THEN 2
            WHEN type = 'landmark' THEN 3
            WHEN type = 'outdoor' THEN 4
            WHEN type = 'restaurant' THEN 5
            WHEN type = 'activity' THEN 6
            ELSE 7
          END AS type_order,
          CASE
            WHEN type = 'country' THEN 20
            WHEN type IN ('city', 'activity') THEN 13
            WHEN type = 'landmark' THEN 14
            WHEN type = 'outdoor' THEN 8
            WHEN type = 'restaurant' THEN 6
            ELSE 1
          END AS type_score,
          ts_rank_cd(to_tsvector('english', location || ' ' || url || ' ' || COALESCE(flag, '') || ' ' || type), 
                     to_tsquery('english', regexp_replace(trim($1), '\s+', ':* & ') || ':*')
          ) AS search_rank
        FROM search_items
        WHERE 
             to_tsvector('english', location || ' ' || url || ' ' || COALESCE(flag, '') || ' ' || type) @@ 
             to_tsquery('english', regexp_replace(trim($1), '\s+', ':* & ') || ':*')
      )
      SELECT DISTINCT ON (type_order, lower(location))
        type,
        url,
        location,
        flag,
        score,
        overall_rating,
        type_score,
        search_rank
      FROM ranked_items
           ORDER BY 
        type_order,
        lower(location),
        (type_score * 100 + score + COALESCE(overall_rating, 0) * 0.8 + search_rank) DESC  
      LIMIT 6;
    `;

      listings = (await pool.query(searchQuery, [search])).rows;
    } else {
      const query = `
      SELECT type, url, location, flag, score
      FROM overall
      WHERE potential = false OR potential IS NULL
      ORDER BY id DESC
      LIMIT 4;
    `;
      listings = (await pool.query(query)).rows;
    }
    console.log(listings)

    return void res.status(200).json({
      message: 'Fetched posts',
      data: listings,
    });
  } catch (error) {
    console.log(error);
  }
};

This is my EXPLAIN ANALYZE result

Limit  (cost=28941.11..28941.56 rows=6 width=236) (actual time=85851.041..85856.894 rows=6 loops=1)
  ->  Unique  (cost=28941.11..28967.78 rows=356 width=236) (actual time=85850.980..85856.825 rows=6 loops=1)
        ->  Sort  (cost=28941.11..28950.00 rows=3556 width=236) (actual time=85850.823..85856.624 rows=6 loops=1)
              Sort Key: (CASE WHEN ((search_items.type)::text = 'country'::text) THEN 1 WHEN ((search_items.type)::text = 'city'::text) THEN 2 WHEN ((search_items.type)::text = 'landmark'::text) THEN 3 WHEN ((search_items.type)::text = 'outdoor'::text) THEN 4 WHEN ((search_items.type)::text = 'restaurant'::text) THEN 5 WHEN ((search_items.type)::text = 'activity'::text) THEN 6 ELSE 7 END), (lower((search_items.location)::text))
              Sort Method: quicksort  Memory: 315kB
              ->  Subquery Scan on search_items  (cost=7.58..28731.38 rows=3556 width=236) (actual time=11.452..85261.766 rows=1851 loops=1)
                    ->  Append  (cost=7.58..27673.47 rows=3556 width=192) (actual time=10.519..82162.075 rows=1851 loops=1)
                          ->  Bitmap Heap Scan on countries  (cost=7.58..13.97 rows=5 width=161) (actual time=10.516..10.625 rows=1 loops=1)
                                Recheck Cond: (to_tsvector('english'::regconfig, (((((((location)::text || ' '::text) || (url)::text) || ' '::text) || (COALESCE(flag, ''::character varying))::text) || ' '::text) || (type)::text)) @@ '''cana'':*'::tsquery)
                                Heap Blocks: exact=1
                                ->  Bitmap Index Scan on idx_countries_fts  (cost=0.00..7.58 rows=5 width=0) (actual time=8.169..8.270 rows=1 loops=1)
                                      Index Cond: (to_tsvector('english'::regconfig, (((((((location)::text || ' '::text) || (url)::text) || ' '::text) || (COALESCE(flag, ''::character varying))::text) || ' '::text) || (type)::text)) @@ '''cana'':*'::tsquery)
                          ->  Subquery Scan on "*SELECT* 2"  (cost=1000.42..27632.06 rows=3547 width=141) (actual time=251.128..81942.093 rows=1848 loops=1)
                                ->  Gather  (cost=1000.42..27587.72 rows=3547 width=113) (actual time=251.121..81851.590 rows=1848 loops=1)
                                      Workers Planned: 2
                                      Workers Launched: 2
                                      ->  Parallel Index Only Scan using idx_cities_search_cols on cities  (cost=0.42..26233.02 rows=1478 width=113) (actual time=172.105..83712.956 rows=616 loops=3)
                                            Filter: (to_tsvector('english'::regconfig, (((((((location)::text || ' '::text) || (url)::text) || ' '::text) || ''::text) || ' '::text) || (type)::text)) @@ '''cana'':*'::tsquery)
                                            Rows Removed by Filter: 58500
                                            Heap Fetches: 0
                          ->  Subquery Scan on "*SELECT* 3"  (cost=0.00..5.51 rows=1 width=1160) (actual time=30.231..30.312 rows=0 loops=1)
                                ->  Seq Scan on landmarks  (cost=0.00..5.50 rows=1 width=1132) (actual time=30.219..30.219 rows=0 loops=1)
                                      Filter: (to_tsvector('english'::regconfig, (((((((location)::text || ' '::text) || (url)::text) || ' '::text) || ''::text) || ' '::text) || (type)::text)) @@ '''cana'':*'::tsquery)
                                      Rows Removed by Filter: 9
                          ->  Subquery Scan on "*SELECT* 4"  (cost=0.00..1.29 rows=1 width=1160) (actual time=15.920..15.921 rows=0 loops=1)
                                ->  Seq Scan on outdoors  (cost=0.00..1.28 rows=1 width=1132) (actual time=15.749..15.749 rows=0 loops=1)
                                      Filter: (to_tsvector('english'::regconfig, (((((((location)::text || ' '::text) || (url)::text) || ' '::text) || ''::text) || ' '::text) || (type)::text)) @@ '''cana'':*'::tsquery)
                                      Rows Removed by Filter: 1
                          ->  Subquery Scan on "*SELECT* 5"  (cost=0.00..1.29 rows=1 width=1160) (actual time=11.147..11.153 rows=1 loops=1)
                                ->  Seq Scan on restaurants  (cost=0.00..1.28 rows=1 width=1132) (actual time=11.063..11.067 rows=1 loops=1)
                                      Filter: (to_tsvector('english'::regconfig, (((((((location)::text || ' '::text) || (url)::text) || ' '::text) || ''::text) || ' '::text) || (type)::text)) @@ '''cana'':*'::tsquery)
                          ->  Subquery Scan on "*SELECT* 6"  (cost=0.00..1.57 rows=1 width=1160) (actual time=12.723..12.728 rows=1 loops=1)
                                ->  Seq Scan on activities  (cost=0.00..1.56 rows=1 width=1132) (actual time=12.693..12.696 rows=1 loops=1)
                                      Filter: (to_tsvector('english'::regconfig, (((((((location)::text || ' '::text) || (url)::text) || ' '::text) || ''::text) || ' '::text) || (type)::text)) @@ '''cana'':*'::tsquery)
                                      Rows Removed by Filter: 1
Planning Time: 850.642 ms
Execution Time: ??? ms

I tried creating indexes and changing them up, I tried vacuum analyze on my table.


Solution

  • All the time is spent here:

    ->  Subquery Scan on "*SELECT* 2"  (cost=1000.42..27632.06 rows=3547 width=141) (actual time=251.128..81942.093 rows=1848 loops=1)
          ->  Gather  (cost=1000.42..27587.72 rows=3547 width=113) (actual time=251.121..81851.590 rows=1848 loops=1)
                Workers Planned: 2
                Workers Launched: 2
                ->  Parallel Index Only Scan using idx_cities_search_cols on cities  (cost=0.42..26233.02 rows=1478 width=113) (actual time=172.105..83712.956 rows=616 loops=3)
                    Filter: (to_tsvector('english'::regconfig, (((((((location)::text || ' '::text) || (url)::text) || ' '::text) || ''::text) || ' '::text) || (type)::text)) @@ '''cana'':*'::tsquery)
                    Rows Removed by Filter: 58500
                    Heap Fetches: 0
    

    To speed that up you need an index:

    CREATE INDEX ON cities USING gin (
       to_tsvector('english', location || ' ' || url || ' ' || '' || ' ' || type)
    );
    

    Note that that index will only work if flag is an empty string or NULL. For other values, you'd need a different index.