Elasticsearch - use a "tags" index to discover all tags in a given string


I have an Elasticsearch v2.x cluster with a "tags" index that contains about 5000 tags, each of the form {tagName, tagID}. Given a string, is it possible to query the tags index to get all tags that are found in that string? I want exact matches, but I also want to allow fuzzy matches without being too generous: a tag should only match if all of its tokens are found within a certain proximity of each other (say, 5 words).

For example, given the string:

Model 22340 Sound Spectrum Analyzer

The following tags should match:

  • sound analyzer
  • sound spectrum analyzer

BUT NOT

  • sound meter
  • light spectrum
  • chemical analyzer


Solution

  • I don't think it's possible to write a single, accurate Elasticsearch query against the tags index that will auto-tag an arbitrary string; what you are asking for is essentially a reverse query. The most accurate way to match a tag to a document is to construct a query for the tag and run it against the document. Done naively, this is terribly inefficient, because you would have to run one query per tag for every document you want to auto-tag.

    To run this kind of reverse query efficiently, use the Elasticsearch Percolator API:

    https://www.elastic.co/guide/en/elasticsearch/reference/current/search-percolate.html

    The API is very flexible and lets you register fairly complex queries against documents with multiple fields.

    The basic concept is this (assuming your tags have an app-specific ID field):

    1. For each tag, create a query for it, and register the query with the percolator (using the tag's ID field).

    2. To auto-tag a string, pass your string (as a document) to the Percolator, which will match it against all registered queries.

    3. Iterate over the matches. Each match includes the _id of the query. Use the _id to reference the tag.
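
    The three steps above can be sketched roughly as follows. This is a minimal sketch, not a tested setup: it only builds the HTTP requests in the Elasticsearch 2.x style (queries registered under the `.percolator` type, documents sent to `_percolate`) and does not send them. The index name `tags`, the document type `doc`, the field name `text`, the tag IDs, and the choice of `match_phrase` with `slop` as the proximity rule are all assumptions made for illustration.

```python
# Sketch only: builds the requests for steps 1-3; nothing is sent over the network.
# Assumed, not from the answer: index "tags", type "doc", field "text",
# hypothetical tag IDs, and match_phrase + slop as the proximity rule.


def tag_query(tag_name, field="text", slop=5):
    """Query body for one tag: a phrase query that tolerates up to
    `slop` position moves between the tag's terms."""
    return {"query": {"match_phrase": {field: {"query": tag_name, "slop": slop}}}}


def register_request(index, tag, field="text", slop=5):
    """Step 1: register the tag's query, using the tag's ID as the query _id."""
    path = "/{}/.percolator/{}".format(index, tag["tagID"])
    return "PUT", path, tag_query(tag["tagName"], field, slop)


def percolate_request(index, doc_type, text, field="text"):
    """Step 2: wrap the input string as a document and percolate it."""
    path = "/{}/{}/_percolate".format(index, doc_type)
    return "GET", path, {"doc": {field: text}}


def matched_tag_ids(response):
    """Step 3: every match carries the _id of a registered query (the tag ID)."""
    return [match["_id"] for match in response.get("matches", [])]


# Hypothetical tags; in practice these come from the "tags" index.
tags = [
    {"tagName": "sound analyzer", "tagID": "17"},
    {"tagName": "sound meter", "tagID": "18"},
]
registrations = [register_request("tags", tag) for tag in tags]
lookup = percolate_request("tags", "doc", "Model 22340 Sound Spectrum Analyzer")
```

    Each `(method, path, body)` tuple can be sent with whatever HTTP client you already use. Note that `match_phrase` with `slop` only covers the proximity requirement, not misspelled terms, and that in 2.x the field used in the registered queries (`text` here) should, as far as I know, already exist in the index mapping before the queries are registered.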

    This is also a good article to read: https://www.elastic.co/blog/percolator-redesign-blog-post