Creating a Gin Index with Trigram (gin_trgm_ops) in Django model


The new TrigramSimilarity feature of django.contrib.postgres was great for a problem I had. I use it in a search bar to find hard-to-spell Latin names. The problem is that there are over 2 million names, and the search takes longer than I want.

I'd like to create an index on the trigrams as described in the PostgreSQL documentation.

But I am not sure how to do this in a way that the Django API would make use of it. For PostgreSQL full-text search there is a description of how to create an index, but not for trigram similarity.
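
For context, here is a rough pure-Python sketch of what the trigram similarity score measures. It follows the rules given in the pg_trgm documentation (lowercase the text, split on non-alphanumeric characters, pad each word with two spaces in front and one behind) but ignores locale details; `trigrams` and `similarity` are illustrative helpers, not the extension's actual implementation.

```python
import re


def trigrams(text):
    """Return the set of trigrams of text, approximating pg_trgm's rules."""
    grams = set()
    # Lowercase, then treat every run of alphanumeric characters as one word.
    for word in re.findall(r"[^\W_]+", text.lower()):
        # Each word is padded with two leading spaces and one trailing space.
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# 'word' and 'two words' share 4 trigrams out of 11 distinct ones.
print(similarity("word", "two words"))
```

A trigram index stores these three-character fragments per row, so the database can find candidate rows that share trigrams with the search term without scanning every name.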

This is what I have right now:

from django.contrib.postgres.indexes import GinIndex
from django.db import models

class NCBI_names(models.Model):
    tax_id = models.ForeignKey(NCBI_nodes, on_delete=models.CASCADE, default=0)
    name_txt = models.CharField(max_length=255, default='')
    name_class = models.CharField(max_length=32, db_index=True, default='')

    class Meta:
        # GIN index without an operator class (the current attempt)
        indexes = [GinIndex(fields=['name_txt'])]

In the view's get_queryset method:

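from django.contrib.postgres.search import TrigramSimilarity
from django.shortcuts import redirect
from django.views.generic import ListView

class TaxonSearchListView(ListView):
    #form_class=TaxonSearchForm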
    template_name='collectie/taxon_list.html'
    paginate_by=20
    model=NCBI_names
    context_object_name = 'taxon_list'

    def dispatch(self, request, *args, **kwargs):
        query = request.GET.get('q')
        if query:
            try:
                tax_id = self.model.objects.get(name_txt__iexact=query).tax_id.tax_id
                return redirect('collectie:taxon_detail', tax_id)
            except (self.model.DoesNotExist, self.model.MultipleObjectsReturned) as e:
                return super(TaxonSearchListView, self).dispatch(request, *args, **kwargs)
        else:
            return super(TaxonSearchListView, self).dispatch(request, *args, **kwargs)
    
    def get_queryset(self):
        result = super(TaxonSearchListView, self).get_queryset()
        #
        query = self.request.GET.get('q')
        if query:            
            result = result.exclude(name_txt__icontains = 'sp.')
            result = result.annotate(similarity=TrigramSimilarity('name_txt', query)).filter(similarity__gt=0.3).order_by('-similarity')
        return result

Solution

  • I found a December 2020 article that does this with a recent version of the Django ORM:

    class Author(models.Model):
        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)
    
        class Meta:
            indexes = [
                GinIndex(
                    name='review_author_ln_gin_idx', 
                    fields=['last_name'], 
                    opclasses=['gin_trgm_ops'],
                )
            ]
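
    One caveat, as far as I know: PostgreSQL only consults a trigram index for the operators the operator class supports (such as `%`, LIKE and ILIKE), not for a comparison on the result of the `similarity()` function, which is what filtering on a TrigramSimilarity annotation produces. A minimal sketch of a query that filters through Django's `trigram_similar` lookup (which compiles to `%`) and only then ranks by similarity; `search_authors` is a hypothetical helper, and the cutoff is whatever `pg_trgm.similarity_threshold` is set to (0.3 by default):

```python
from django.contrib.postgres.search import TrigramSimilarity


def search_authors(queryset, query):
    """Filter with the index-assisted % operator, then rank the matches."""
    return (
        queryset
        # Compiles to: "last_name" % 'query', which a gin_trgm_ops index
        # on last_name can serve.
        .filter(last_name__trigram_similar=query)
        # The similarity score is now only computed for the matching rows.
        .annotate(similarity=TrigramSimilarity('last_name', query))
        .order_by('-similarity')
    )
```

    It would be called as `search_authors(Author.objects.all(), 'some name')`; checking the query plan with `.explain()` is the way to confirm the index is actually used.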
    

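    If, like the original poster, you were looking to create an index that works with icontains, you'll have to index the UPPER() of the column, because on PostgreSQL Django compiles icontains to UPPER(column) LIKE UPPER(pattern). Indexing an expression with an operator class requires wrapping it in OpClass (available from Django 3.2), and the index must be named:

    from django.contrib.postgres.indexes import GinIndex, OpClass
    from django.db import models
    from django.db.models.functions import Upper
    
    class Author(models.Model):
        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)
    
        class Meta:
            indexes = [
                GinIndex(
                    OpClass(Upper('last_name'), name='gin_trgm_ops'),
                    name='review_author_ln_gin_idx',
                )
            ]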
    

    To use it, you need to add 'django.contrib.postgres' to your INSTALLED_APPS.
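
    The gin_trgm_ops operator class also only exists once the pg_trgm extension is installed in the database. A minimal migration sketch, assuming an app labelled 'review' with an existing '0001_initial' migration (both hypothetical); it has to run before the migration that creates the index:

```python
from django.contrib.postgres.operations import TrigramExtension
from django.db import migrations


class Migration(migrations.Migration):
    # Hypothetical dependency; point this at your app's latest migration.
    dependencies = [
        ('review', '0001_initial'),
    ]

    operations = [
        # Installs the pg_trgm extension if it is not already present.
        TrigramExtension(),
    ]
```

    Depending on the PostgreSQL version and setup, the database user running migrations may need elevated privileges to create the extension.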


  • Inspired by an older article on this subject, I found a more recent one that gives the GistIndex solution shown further down, after the updates.

    Update: From Django 1.11, things seem to be simpler, as this answer and the Django docs suggest:

    from django.contrib.postgres.indexes import GinIndex
    
    class MyModel(models.Model):
        the_field = models.CharField(max_length=512, db_index=True)
    
        class Meta:
            indexes = [GinIndex(fields=['the_field'])]
    

    From Django 2.2, Index accepts an opclasses argument for this purpose: Index(fields=(), name=None, db_tablespace=None, opclasses=()). An index that sets opclasses must also be given a name.


    from django.contrib.postgres.indexes import GistIndex
    
    class GistIndexTrgrmOps(GistIndex):
        def create_sql(self, model, schema_editor):
            # - this Statement is instantiated by the _create_index_sql()
            #   method of django.db.backends.base.schema.BaseDatabaseSchemaEditor.
            #   using sql_create_index template from
            #   django.db.backends.postgresql.schema.DatabaseSchemaEditor
            # - the template has original value:
            #   "CREATE INDEX %(name)s ON %(table)s%(using)s (%(columns)s)%(extra)s"
            statement = super().create_sql(model, schema_editor)
            # - however, we want to use a GIST index to accelerate trigram
            #   matching, so we want to add the gist_trgm_ops index operator
            #   class
            # - so we replace the template with:
            #   "CREATE INDEX %(name)s ON %(table)s%(using)s (%(columns)s gist_trgm_ops)%(extra)s"
            statement.template =\
                "CREATE INDEX %(name)s ON %(table)s%(using)s (%(columns)s gist_trgm_ops)%(extra)s"
    
            return statement
    

    Which you can then use in your model class like this:

    class YourModel(models.Model):
        some_field = models.TextField(...)
    
        class Meta:
            indexes = [
                GistIndexTrgrmOps(fields=['some_field'])
            ]
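
    On Django 2.2 or later, the opclasses argument mentioned in the update should make the subclass unnecessary. A minimal sketch of the same GiST index declared directly, where the index name is a hypothetical choice (a name is required whenever opclasses is set):

```python
from django.contrib.postgres.indexes import GistIndex
from django.db import models


class YourModel(models.Model):
    some_field = models.TextField()

    class Meta:
        indexes = [
            GistIndex(
                name='yourmodel_some_field_gist',  # hypothetical index name
                fields=['some_field'],
                # One operator class per entry in fields.
                opclasses=['gist_trgm_ops'],
            )
        ]
```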