Search code examples
sqlpostgresqltrigrampg-trgm

Real number comparison for trigram similarity


I am implementing trigram similarity for word matching in column comum1. similarity() returns real. I have converted 0.01 to real and rounded to 2 decimal digits. Though there are rank values greater than 0.01, I get no results on screen. If I remove the WHERE condition, lots of results are available. Kindly guide me how to overcome this issue.

SELECT *,ROUND(similarity(comum1,"Search_word"),2) AS rank
FROM schema.table
WHERE rank >= round(0.01::real,2)

I have also converted both numbers to numeric and compared, but that also didn't work:

SELECT *,ROUND(similarity(comum1,"Search_word")::NUMERIC,2) AS rank
FROM schema.table
WHERE rank >= round(0.01::NUMERIC,2)
LIMIT 50;

Solution

  • The WHERE clause can only reference input column names, coming from the underlying table(s). rank in your example is the column alias for a result - an output column name.

    So your statement is illegal and should return with an error message - unless you have another column named rank in schema.table, in which case you shot yourself in the foot. I would think twice before introducing such a naming collision, while I am not completely firm with SQL syntax.

    And round() with a second parameter is not defined for real, you would need to cast to numeric like you tried. Another reason your first query is illegal.

    Also, the double-quotes around "Search_word" are highly suspicious. If that's supposed to be a string literal, you need single quotes: 'Search_word'.

    This should work:

    SELECT *, round(similarity(comum1,'Search_word')::numeric,2) AS rank
    FROM   schema.table
    WHERE  similarity(comum1, 'Search_word') > 0.01;
    

    But it's still pretty useless as it fails to make use of trigram indexes. Do this instead:

    SET pg_trgm.similarity_threshold = 0.01;  -- set once
    
    SELECT *
    FROM   schema.table
    WHERE  comum1 % 'Search_word';
    

    See:

    That said, a similarity of 0.01 is almost no similarity. Typically, you need a much higher threshold.