Search code examples
sqlpostgresqlfull-text-search

Can a View of multiple tables be used for Full-Text-Search?


I'm sorry to ask such a noob question, but the postgres documentation on views is sparse, and I had trouble finding a good answer.

I'm trying to implement Full-Text-Search on Postgres for three tables. Specifically, the user's search query would return matching 1) other usernames, 2) message, 3) topics.

I'm concerned that using a view for this might not scale well as it combines three tables into one. Is this a legitimate concern? If not, how else might I approach this?


Solution

  • What you request can be done. To have a practical example (with just two tables), you could have:

    CREATE TABLE users
    (
        user_id SERIAL PRIMARY KEY,
        username text
    ) ;
    
    -- Index to find usernames
    CREATE INDEX idx_users_username_full_text 
        ON users 
        USING GIN (to_tsvector('english', username)) ;        
    
    CREATE TABLE topics
    (
        topic_id SERIAL PRIMARY KEY,
        topic text
    ) ;
    
    -- Index to find topics
    CREATE INDEX idx_topics_topic_full_text 
        ON topics 
        USING GIN (to_tsvector('english', topic)) ;
    

    See PostgreSQL docs. on Controlling Text Search for an explanation of to_tsvector.

    ... populate the tables

    INSERT INTO users
       (username)
    VALUES
       ('Alice Cooper'),
       ('Boo Geldorf'),
       ('Carol Burnet'),
       ('Daniel Dafoe') ;
    
    INSERT INTO topics
       (topic)
    VALUES
       ('Full text search'),
       ('Fear of void'),
       ('Alice in Wonderland essays') ;
    

    ... create a view that combines values from both tables

    CREATE VIEW search_items AS
    SELECT 
        text 'users' AS origin_table, user_id AS id, to_tsvector('english', username) AS searchable_element
    FROM
        users
    UNION ALL
    SELECT 
        text 'topics' AS origin_table, topic_id AS id, to_tsvector('english', topic) AS searchable_element 
    FROM
        topics ;
    

    We search that view:

    SELECT 
        *
    FROM
        search_items
    WHERE
        plainto_tsquery('english', 'alice') @@ searchable_element
    

    ... and get the following response (you should mostly ignore the searchable_element). You're mostly interested in the origin_table and id.

    origin_table | id | searchable_element               
    :----------- | -: | :--------------------------------
    users        |  1 | 'alic':1 'cooper':2              
    topics       |  3 | 'alic':1 'essay':4 'wonderland':3
    

    See Parsing Queries for an explanation of plainto_tsquery function, and also @@ operator.


    To make sure indexes are used:

    EXPLAIN ANALYZE
    SELECT 
        *
    FROM
        search_items
    WHERE
        plainto_tsquery('english', 'alice') @@ searchable_element
    
    | QUERY PLAN                                                                                                                                 |
    | :----------------------------------------------------------------------------------------------------------------------------------------- |
    | Append  (cost=12.05..49.04 rows=12 width=68) (actual time=0.017..0.031 rows=2 loops=1)                                                     |
    |   ->  Bitmap Heap Scan on users  (cost=12.05..24.52 rows=6 width=68) (actual time=0.017..0.018 rows=1 loops=1)                             |
    |         Recheck Cond: ('''alic'''::tsquery @@ to_tsvector('english'::regconfig, username))                                                 |
    |         Heap Blocks: exact=1                                                                                                               |
    |         ->  Bitmap Index Scan on idx_users_username_full_text  (cost=0.00..12.05 rows=6 width=0) (actual time=0.005..0.005 rows=1 loops=1) |
    |               Index Cond: ('''alic'''::tsquery @@ to_tsvector('english'::regconfig, username))                                             |
    |   ->  Bitmap Heap Scan on topics  (cost=12.05..24.52 rows=6 width=68) (actual time=0.012..0.012 rows=1 loops=1)                            |
    |         Recheck Cond: ('''alic'''::tsquery @@ to_tsvector('english'::regconfig, topic))                                                    |
    |         Heap Blocks: exact=1                                                                                                               |
    |         ->  Bitmap Index Scan on idx_topics_topic_full_text  (cost=0.00..12.05 rows=6 width=0) (actual time=0.002..0.002 rows=1 loops=1)   |
    |               Index Cond: ('''alic'''::tsquery @@ to_tsvector('english'::regconfig, topic))                                                |
    | Planning time: 0.098 ms                                                                                                                    |
    | Execution time: 0.055 ms                                                                                                                   |
    

    Indexes are really used (see Bitmap Index Scan on idx_topics_topic_full_text and Bitmap Index Scan on idx_users_username_full_text).

    You can check everything at dbfiddle here


    NOTE: 'english' is the text search configuration chosen to index and query. Choose the proper one for your case. You can create your own if the existing ones don't fill your needs.