Search code examples
djangopostgresqlfull-text-searchfuzzy-search

Inexact full-text search in PostgreSQL and Django


I'm new to PostgreSQL, and I'm not sure how to go about doing an inexact full-text search. Not that it matters too much, but I'm using Django. In other words, I'm looking for something like the following:

q = 'hello world'
queryset = Entry.objects.extra(
    where=['body_tsv @@ plainto_tsquery(%s)'], 
    params=[q])
for entry in queryset:
    print entry.title

where I the list of entries should contain either exactly 'hello world', or something similar. The listings should then be ordered according to how far away their value is from the specified string. For instance, I would like the query to include entries containing "Hello World", "hEllo world", "helloworld", "hell world", etc., with some sort of ranking indicating how far away each item is from the perfect, unchanged query string.

How would you go about doing this?


Solution

  • Your best bet is to use Django raw querysets, I use it with MySQL to perform full text matching. If the data is all in the database and Postgres provides the matching capability then it makes sense to use it. Plus Postgres offers some really useful things in terms of stemming etc with full text queries.

    Basically it lets you write the actual query you want yet returns models (as long as you are querying a model table obviously).

    The advantage this gives you is that you can test the exact query you will be using first in Postgres, the documentation covers full text queries pretty well.

    The main gotcha with raw querysets at the moment is they don't support count. So if you will be returning lots of data and have memory constraints on your application you might need to do something clever.


    "Inexact" matching however isn't really part of the full text searching capabilities. Instead you want the postgres fuzzystrmatch contrib module. It's use is described here with indexes.