elasticsearch elasticsearch-analyzers

Extract keywords from fields


I want to write a query that analyzes one or more fields of a document.

That is, the current analyzers require text as input; instead of passing text, I want to pass a field value.

If I have a document like this:

{
    "desc": "A document description",
    "name": "This name is not original",
    "amount": 3000
}

I would like to get back something like this:

{
    "desc": ["document", "description"],
    "name": ["name", "original"],
    "amount": 3000
}

Solution

  • You can use Term Vectors or Multi Term Vectors to achieve what you're looking for:

    https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-multi-termvectors.html

    You specify the IDs of the documents you want, as well as the fields, and it returns the analyzed terms for each field of each document, along with some other information (such as field statistics, positions, and offsets) that you can easily disable.

    GET /exampleindex/_doc/_mtermvectors
    {
      "ids": [
        "1","2"
      ],
      "parameters": {
        "fields": [
          "*"
        ]
      }
    }
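
To get from the term vectors response to the per-field keyword lists asked for in the question, the `terms` objects still have to be flattened on the client side. Below is a minimal Python sketch of that step. The function name `terms_by_field` is hypothetical, the sample response is a hand-trimmed copy of the response shape this answer shows, and no request is actually sent. Note that the terms reflect whatever analyzer the field is mapped with, so stop words such as "an" are still present unless that analyzer removes them.

```python
from typing import Any, Dict, List


def terms_by_field(mtv_response: Dict[str, Any]) -> Dict[str, Dict[str, List[str]]]:
    """Map each found document ID to {field: [terms in position order]}."""
    result: Dict[str, Dict[str, List[str]]] = {}
    for doc in mtv_response.get("docs", []):
        if not doc.get("found"):
            continue  # skip IDs that do not exist in the index
        fields: Dict[str, List[str]] = {}
        for field, vector in doc.get("term_vectors", {}).items():
            positioned = []
            for term, info in vector.get("terms", {}).items():
                # A term occurring several times has one token entry per
                # occurrence; if positions were disabled, fall back to 0.
                for token in info.get("tokens", [{}]):
                    positioned.append((token.get("position", 0), term))
            positioned.sort()
            fields[field] = [term for _, term in positioned]
        result[doc["_id"]] = fields
    return result


# Hand-written sample, trimmed from the response shape shown in this answer.
sample = {
    "docs": [
        {
            "_id": "1",
            "found": True,
            "term_vectors": {
                "desc": {
                    "terms": {
                        "amazing": {
                            "term_freq": 1,
                            "tokens": [{"position": 1, "start_offset": 3, "end_offset": 10}],
                        },
                        "an": {
                            "term_freq": 1,
                            "tokens": [{"position": 0, "start_offset": 0, "end_offset": 2}],
                        },
                    }
                }
            },
        }
    ]
}

print(terms_by_field(sample))  # {'1': {'desc': ['an', 'amazing']}}
```

In practice, the dictionary passed to the function would be the parsed JSON body of the `_mtermvectors` call, obtained with whichever HTTP or Elasticsearch client you already use.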
    

    The `_mtermvectors` request returns something along the lines of:

    "docs": [
        {
          "_index": "exampleindex",
          "_type": "_doc",
          "_id": "1",
          "_version": 2,
          "found": true,
          "took": 0,
          "term_vectors": {
            "desc": {
              "field_statistics": {
                "sum_doc_freq": 5,
                "doc_count": 2,
                "sum_ttf": 5
              },
              "terms": {
                "amazing": {
                  "term_freq": 1,
                  "tokens": [
                    {
                      "position": 1,
                      "start_offset": 3,
                      "end_offset": 10
                    }
                  ]
                },
                "an": {
                  "term_freq": 1,
                  "tokens": [
                    {
                      "position": 0,
                      "start_offset": 0,
                      "end_offset": 2
                    }
                  ]
                }