How to do an accent-insensitive TrigramSimilarity search in django?


How can I add accent-insensitive search to the following snippet from the Django docs:

>>> from django.contrib.postgres.search import TrigramSimilarity
>>> Author.objects.create(name='Katy Stevens')
>>> Author.objects.create(name='Stephen Keats')
>>> test = 'Katie Stephens'
>>> Author.objects.annotate(
...     similarity=TrigramSimilarity('name', test),
... ).filter(similarity__gt=0.3).order_by('-similarity')
[<Author: Katy Stevens>, <Author: Stephen Keats>]

How could this match test = 'Kâtié Stéphèns'?
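A rough pure-Python approximation of pg_trgm's similarity() shows why the accented string misses: every accented letter produces trigrams that never occur in the unaccented names. This is a sketch, not the extension's code. It assumes pg_trgm's documented behaviour of lowercasing, ignoring non-alphanumeric characters, and padding each word with two leading spaces and one trailing space. It ignores locale details, and the helper names trigrams and similarity are hypothetical.

```python
from __future__ import annotations

import re


def trigrams(text: str) -> set[str]:
    """Approximate pg_trgm: lowercase, split into alphanumeric words and
    pad each word with two leading spaces and one trailing space."""
    grams: set[str] = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


print(similarity("Katie Stephens", "Katy Stevens"))  # 0.4
print(similarity("Kâtié Stéphèns", "Katy Stevens"))  # about 0.17, below the 0.3 cutoff
```

Under this approximation the unaccented spelling clears the 0.3 threshold used in the question and the accented one does not. Actual PostgreSQL scores may differ slightly.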


Solution

  • There is an unaccent lookup:

    The unaccent lookup allows you to perform accent-insensitive lookups using a dedicated PostgreSQL extension.

    Also, the aggregation section of the Django docs says:

    When specifying the field to be aggregated in an aggregate function, Django will allow you to use the same double underscore notation that is used when referring to related fields in filters. Django will then handle any table joins that are required to retrieve and aggregate the related value.


    Based on the above, you can combine the trigram_similar lookup with unaccent and then annotate the result:

    Author.objects.filter(
        name__unaccent__trigram_similar=test
    ).annotate(
        similarity=TrigramSimilarity('name__unaccent', test),
    ).filter(similarity__gt=0.3).order_by('-similarity')
    

    Or, to stay as close as possible to the original sample (and avoid running one potentially slow filter after another):

    Author.objects.annotate(
        similarity=TrigramSimilarity('name__unaccent', test),
    ).filter(similarity__gt=0.3).order_by('-similarity')
    

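    The unaccent lookup requires Django >= 1.10, and referring to a transform such as 'name__unaccent' inside an expression like TrigramSimilarity requires Django >= 3.2. Note also that 'name__unaccent' strips accents only from the column: the search term test is passed through unchanged, so an accented term such as 'Kâtié Stéphèns' can still score below the 0.3 threshold.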


    EDIT:

    Although the above should work, @Private reports the following error:

    Cannot resolve keyword 'unaccent' into a field. Join on 'unaccented' not permitted.
    

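    This most likely happens because Django versions before 3.2 cannot resolve a transform such as unaccent inside an expression. The following code avoids the error. Since TrigramSimilarity, unlike the unaccent lookup, does not strip accents on its own, both of its arguments are wrapped in Unaccent so that the annotated similarity is accent-insensitive as well:

    from django.contrib.postgres.lookups import Unaccent
    from django.contrib.postgres.search import TrigramSimilarity
    from django.db.models import Value

    Author.objects.filter(
        name__unaccent__trigram_similar=test  # the lookup unaccents both sides
    ).annotate(
        similarity=TrigramSimilarity(Unaccent('name'), Unaccent(Value(test))),
    ).filter(similarity__gt=0.3).order_by('-similarity')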
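To check the idea of unaccenting both the stored value and the search term before comparing trigrams, here is an in-memory sketch of the annotate/filter/order_by chain. It is an approximation, not what PostgreSQL runs: unaccent is imitated with Unicode NFKD decomposition plus dropping combining marks (the real extension uses a rules file), the trigram logic follows pg_trgm's documented padding rules, and the helper names unaccent, trigrams, similarity and search are hypothetical.

```python
from __future__ import annotations

import re
import unicodedata


def unaccent(text: str) -> str:
    """Rough stand-in for PostgreSQL's unaccent(): decompose (NFKD) and
    drop combining marks such as acute or circumflex accents."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def trigrams(text: str) -> set[str]:
    """Approximate pg_trgm: lowercase, split into alphanumeric words and
    pad each word with two leading spaces and one trailing space."""
    grams: set[str] = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def search(names: list[str], term: str, threshold: float = 0.3) -> list[tuple[str, float]]:
    """Hypothetical in-memory analogue of the queryset: unaccent both sides,
    keep scores above the threshold, best match first."""
    scored = [(name, similarity(unaccent(name), unaccent(term))) for name in names]
    hits = [(name, score) for name, score in scored if score > threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


authors = ["Katy Stevens", "Stephen Keats"]
print([name for name, _ in search(authors, "Kâtié Stéphèns")])
# ['Katy Stevens', 'Stephen Keats']
```

Here the accented query returns both authors in the same order as the unaccented query in the Django docs example, whereas comparing the raw accented term would drop both below 0.3.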