Search code examples
postgresqlfull-text-search

Can I configure PostgreSQL programmatically to not eliminate stop words in full-text search?


I'm using PostgreSQL full text search for a project where traditional stop words ('a', 'the', 'if' etc.) should be indexed and searchable, which is not the default behaviour. For example, I might want my users to find results for the query 'to be or not to be'.

The documentation indicates that I could achieve this by creating an empty stopwords dictionary in $SHAREDIR/tsearch_data/english.stop (for example), but this will complicate deployment; I want to be able to configure PostgreSQL's stop word handling with SQL. Is this possible? If so, can you provide a sample SQL statement?


Solution

  • As per your comment on the previous answer, you can easily switch between using no stop words and all stop words. You can acheive this with a custom search configuration:

    (1) You can create a custom dictionary without using the stop words file, for example:

    CREATE TEXT SEARCH DICTIONARY english_stem_nostop (
        Template = snowball
        , Language = english
    );
    

    Note, in the above I left out the StopWords parameter.

    (2) Then create a new configuration to use your new dictionary:

    CREATE TEXT SEARCH CONFIGURATION public.english_nostop ( COPY = pg_catalog.english );
    ALTER TEXT SEARCH CONFIGURATION public.english_nostop
       ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, hword, hword_part, word WITH english_stem_nostop;
    

    (3) Test new dictionary

    select to_tsvector('english_nostop','he her the a');
    

    Note that the response contains stop words. With ordinary dictionary response would be empty because second parameter contains only stop words.

    (4) Then, when searching specify the config you want use (alternatively you can change the default_text_search_config option each time), eg:

    SELECT
        title
    FROM
        articles
    WHERE
        to_tsvector('english_nostop', COALESCE(title,'') || ' ' || COALESCE(body,''))
        @@ to_tsquery('english_nostop', 'how & to');
    

    You can specify just 'english' in the above SQL to use the normal config.

    Note, in this example that using the standard configuration will result in notices because there are only stop words.


    Bear in mind the following, however:

    • If you are using indexes, you will need two - one for each configuration. (see these docs: tsearch tables and triggers).
    • Double check which parser tokens you want to use this mapping as per step #2, above (see Parsers).