Search code examples
phpelasticsearchelasticsearch-6

Elasticseach or query for comma separated values


I am saving id's in the database as comma separated and indexing the same to ElasticSearch. Now I need to retrieve if the user_id matches with the value.

For example it it saving like this in the indexing for the column user_ids (database type is varchar(500) in elasticsearch it is text)

8938,8936,8937

$userId = 8936; // For example expecting to return that row
$whereCondition = [];
$whereCondition[]  = [
                "query_string" => [
                    "query"=> $userId,
                    "default_field" => "user_ids",
                    "default_operator" => "OR"
                ]
            ];

$searchParams = [
    'query' => [
        'bool' => [
            'must' => [
                $whereCondition
            ],
            'must_not' => [
                ['exists' => ['field' => 'deleted_at']]
            ]
        ]
    ],
    "size" => 10000
];

User::search($searchParams);

Json Query

{
    "query": {
        "bool": {
            "must": [
                [{
                    "query_string": {
                        "query": 8936,
                        "default_field": "user_ids",
                        "default_operator": "OR"
                    }
                }]
            ],
            "must_not": [
                [{
                    "exists": {
                        "field": "deleted_at"
                    }
                }]
            ]
        }
    },
    "size": 10000
}

Mapping details

{
    "user_details_index": {
        "aliases": {},
        "mappings": {
            "test_type": {
                "properties": {
                    "created_at": {
                        "type": "date",
                        "format": "yyyy-MM-dd HH:mm:ss"
                    },
                    "deleted_at": {
                        "type": "date",
                        "format": "yyyy-MM-dd HH:mm:ss"
                    },
                    "updated_at": {
                        "type": "date",
                        "format": "yyyy-MM-dd HH:mm:ss"
                    },
                    "user_ids": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    }
                }
            }
        },
        "settings": {
            "index": {
                "creation_date": "1546404165500",
                "number_of_shards": "5",
                "number_of_replicas": "1",
                "uuid": "krpph26NTv2ykt6xE05klQ",
                "version": {
                    "created": "6020299"
                },
                "provided_name": "user_details_index"
            }
        }
    }
}

I am trying with above logic, but not unable to retrieve. Can someone help on this.


Solution

  • Since the field user_ids is of type text any no analyzer is specified for it by default it will use standard analyzer which won't break 8938,8936,8937 into terms 8938, 8936 and 8937 and hence the id can't match.

    To solve this I would suggest you to store array of ids to user_ids field instead of csv. So while indexing you json input should look as below:

    {
       ...
    
       "user_ids": [
          8938,
          8936,
          8937
       ]
    
       ...
    }
    

    Since user ids are integer values following changes should be done in mapping:

    {
       "user_ids": {
          "type": "integer"
       }
    }
    

    The query will be now as follow:

    {
      "query": {
        "bool": {
          "filter": [
            [
              {
                "terms": {
                  "userIds": [
                    8936
                  ]
                }
              }
            ]
          ],
          "must_not": [
            [
              {
                "exists": {
                  "field": "deleted_at"
                }
              }
            ]
          ]
        }
      },
      "size": 10000
    }