Search code examples
stringpostgresqlfuzzy-comparisonfuzzytrigram

multiple columns similarity comparison with trigram similarity operator %


I need to perform fuzzy match filtering (in WHERE clause) in PostgreSQL by using trigram similarity operator %. For comparing a field pair it is simply table1.field1 % table2.field2 and GIN or GIST indexes can be used to dramatically increase performance. However, I need to compare bunch of fields at once and final conclusion should be based on overall similarity. For that I could use concatenated expression like the following:

 (COALESCE (l.field1, ' ')
 || ' ' ||  COALESCE (l.field2,' ')
 || ' ' ||  COALESCE (l.field3,' ')
 || ' ' ||  COALESCE (l.field4,' ')
) % (COALESCE (r.field1, ' ')
 || ' ' ||  COALESCE (r.field2,' ')
 || ' ' ||  COALESCE (r.field3,' ')
 || ' ' ||  COALESCE (r.field4,' '))

This expression, however can't utilize GIN or GIST indexes on individual fields and performance is poor. Of course, an obvious solution is to create new concatenated field in both tables, create GIN indexes on them and then compare concatenated fields.

l.concatenated_field % r.concatenated_field

In this case, GIN or GIST indexes are being utilized. But, for various reasons I am not happy with this solution and I was wondering is there an alternative way to perform multicolumn similarity comparison with trigram operator (%), where individual fields GIN or GIST indexes would be utilized?


Solution

  • I have experimented a bit, and yes, GIN index can be built on such expression. It is also mentioned here: https://niallburkley.com/blog/index-columns-for-like-in-postgres/