Search code examples
elasticsearchelasticsearch-dslelasticsearch-query

Elasticsearch: full-text search and filtering by nested array of objects


There is a task to make a GUI table that is built based on data from N-join tables in PostgreSQL. This GUI table implies sorting and filtering with full-text search capability.

I want to use elastic for this purpose. Prepared this data-structure for elasticsearch:

{
  did_user_read: true,
  view_info: {
      total: 1,
      users: [
          { name: 'John Smith', read_at: '2020-02-04 11:00:01', is_current_user: false },
          { name: 'Samuel Jackson', read_at: '2020-02-04 11:00:01', is_current_user: true },
      ],
  },
  is_favorite: true,
  has_attachments: true,
  from: { 
      short_name: 'You',  
      full_name: 'Chuck Norris',
      email: 'ch.norris@example.com', 
      is_current_user: true 
  },
  subject: 'The secret of the appearance of navel lints',
  received_at: '2020-02-04 11:00:01'
}

Please advise how to index this structure correctly so that you can filter and search by nested objects and by nested arrays of objects?

For example, I want to get all the records with these criteria:

is_favorite IS false

AND

FULL_TEXT_SEARCH("sam jackson") 
   BY FIELDS 
    users.name,        -- inside of array(!) 
    from.full_name,
    from.short_name

AND

users.is_current_user IS NOT false

AND

ORDER BY received_at DESC

Solution

  • Your mapping of elasticsearch index for the above data-structure should be :

    Mapping

    {
        "mappings": {
            "properties": {
                "did_user_read": {
                    "type": "boolean"
                },
                "view_info": {
                    "properties": {
                        "total": {
                            "type": "integer"
                        },
                        "users": {
                            "properties": {
                                "name": {
                                    "type": "text"
                                },
                                "read_at": {
                                    "type": "date",
                                    "format": "date_hour_minute_second"
                                },
                                "is_current_user": {
                                    "type": "boolean"
                                }
                            }
                        }
                    }
                },
                "is_favorite": {
                    "type": "boolean"
                },
                "has_attachments": {
                    "type": "boolean"
                },
                "from": {
                    "properties": {
                        "short_name": {
                            "type": "text"
                        },
                        "full_name": {
                            "type": "text"
                        },
                        "email": {
                            "type": "keyword"
                        },
                        "is_current_user": {
                            "type": "boolean"
                        }
                    }
                },
                "subject": {
                    "type": "text"
                },
                "received_at": {
                    "type": "date",
                    "format": "date_hour_minute_second"
                }
            }
        }
    }
    

    Now I have indexed few of the documents in the same format that you have given in your example.

    Search query based on asked criteria should be :

    Search Query:

    {
        "query": {
            "bool": {
                "filter": [
                    {
                        "term": {
                            "is_favorite": false
                        }
                    },
                    {
                        "term": {
                            "view_info.users.is_current_user": true  
                        }
                    }
                ],
                "must": {
                    "multi_match": {
                        "query": "sam jackson",
                        "fields": [
                            "view_info.users.name",
                            "from.full_name",
                            "from.short_name"
                        ]
                    }
                }
    
    
            }
    
        },
        "sort": [
        {
          "received_at": {
            "order": "desc"
          }
        }
      ]
    }
    

    Output

    "hits": [
          {
            "_index": "topics",
            "_type": "_doc",
            "_id": "3",
            "_score": null,
            "_source": {
              "did_user_read": true,
              "view_info": {
                "total": 1,
                "users": [
                  {
                    "name": "John Smith",
                    "read_at": "2020-02-04T11:00:01",
                    "is_current_user": false
                  },
                  {
                    "name": "Samuel Jackson",
                    "read_at": "2020-02-04T11:00:01",
                    "is_current_user": true
                  }
                ]
              },
              "is_favorite": false,
              "has_attachments": true,
              "from": {
                "short_name": "You",
                "full_name": "Chuck Norris",
                "email": "ch.norris@example.com",
                "is_current_user": true
              },
              "subject": "The secret of the appearance of navel lints",
              "received_at": "2020-02-04T11:00:03"
            },
            "sort": [
              1580814003000
            ]
          },
          {
            "_index": "topics",
            "_type": "_doc",
            "_id": "2",
            "_score": null,
            "_source": {
              "did_user_read": true,
              "view_info": {
                "total": 1,
                "users": [
                  {
                    "name": "John Smith",
                    "read_at": "2020-02-04T11:00:01",
                    "is_current_user": false
                  },
                  {
                    "name": "Samuel Jackson",
                    "read_at": "2020-02-04T11:00:01",
                    "is_current_user": true
                  }
                ]
              },
              "is_favorite": false,
              "has_attachments": true,
              "from": {
                "short_name": "You",
                "full_name": "Chuck Norris",
                "email": "ch.norris@example.com",
                "is_current_user": true
              },
              "subject": "The secret of the appearance of navel lints",
              "received_at": "2020-02-04T11:00:01"
            },
            "sort": [
              1580814001000
            ]
          }
        ]
    

    Explaination :

    Based on your query this is how search query is constructed :

    • is_favorite IS false and users.is_current_user IS NOT false

      This is done with the help of filter query . Filter is used when we want our documents to meet some conditions but they do not contribute in calculation of score of searched documents . Now since both the query fields are Boolean they wont contribute to calculation to score since there answer is either yes or no.

    • FULL_TEXT_SEARCH("sam jackson") BY FIELDS users.name, -- inside of array(!) from.full_name, from.short_name

      Here we want to search sam jackson and they should be in all 3 fields so match_phrase is used.

    These three condition are kept in bool filter since there is AND condition that joins them

    • ORDER BY received_at DESC

      For this sort query is used

    NOTE : You have to change your data where datetime is present like at read_at, received_at . Currently you are taking format as 2020-02-04 11:00:01 . You just need to change a little bit so that it takes format 2020-02-04T11:00:01 (instead of space use T) while indexing documents in elasticsearch , since elasticsearch accepts only set of date time formats. You can refer about the formats accepted by date time here https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html