Search code examples
postgresqln-gramtrigram

How to create n-gram in postgresql


I want search functionality in my application.

I used trigram for achieving it and it is working fine.

Problem is:

trigram is creating sequence of 3 character group of word. I want more than 3 character in single object. For Example:

select show_trgm('abcpqrs');

This will return: {" a"," ab","abc","bcp","cpq","pqr","qrs","rs "}

I want {" a"," ab","abc","bcp","cpq","pqr","qrs","rs ","abcd","bcpq","cpqr"...}

How can I achieve this?


Solution

  • Try this function.

    CREATE OR REPLACE FUNCTION myngram(mystr TEXT, n INT) RETURNS TEXT[]
    AS $$
    DECLARE
    str VARCHAR;
    arr TEXT[];
    BEGIN
    
      str := lpad(mystr, n - 1 + char_length(mystr), ' ');
      str := rpad(str, n - 1 + char_length(str), ' ');
    
      arr := array[]::TEXT[];
      FOR i IN 1 .. char_length(str) - n + 1 LOOP
        arr := arr || substring(str from i for n);
      END LOOP;
      RETURN arr;
    END
    $$
    LANGUAGE plpgsql;
    

    Results from my testing.

    testdb=# SELECT myngram('abcpqrs', 4);
                                 myngram                             
    -----------------------------------------------------------------
     {"   a","  ab"," abc",abcp,bcpq,cpqr,pqrs,"qrs ","rs  ","s   "}
    (1 row)
    
    testdb=# SELECT myngram('abcpqrs', 5);
                                           myngram                                       
    -------------------------------------------------------------------------------------
     {"    a","   ab","  abc"," abcp",abcpq,bcpqr,cpqrs,"pqrs ","qrs  ","rs   ","s    "}
    (1 row)