I'm running a query on elasticsearch using function_score where the value of a nested field is used for the calculation (price in this case). Which of the following is a better way to index the data?
{
"name": "John",
"age": 50,
"country": "US",
"subscription": {
"Plan1": {
"price": 100,
"date": "June 5th"
},
"Plan2": {
"price": 50,
"date": "June 6th"
}
}
}
OR
{
"name": "John",
"age": 50,
"country": "US",
"subscription": [
{
"name": "Plan1",
"price": 100,
"date": "June 5th"
},
{
"name": "Plan2"
"price": 50,
"date": "June 6th"
}
]
}
The query would be filtering on the "plan name" and "price", and "price" will be used for score calculation. The number of plans maybe upwards of 20.
Edit 1: Sample query for approach 2
{
"query": {
"function_score": {
"query": {
"bool": {
"filter": [
{
"range": {
"createdatutc": {
"gte": "2022-11-01T00:00:00.000Z",
"lt": "2023-05-06T00:00:00.000Z",
"format": "strict_date_optional_time"
}
}
},
{
"terms": {
"country": [
"US"
]
}
},
{
"term": {
"subscription.name": {
"value": "Plan1"
}
}
}
]
}
},
"functions": [
{
"filter": {
"query_string": {
"default_field": "name",
"query": "\"john\""
}
},
"script_score": {
"script": {
"lang": "painless",
"source": "for (item in params._source.subscription) {if (item.name == 'Plan1') {return item.price}}"
}
}
}
],
"score_mode": "sum",
"boost_mode": "replace"
}
}
}
It depends on how many plans you have. If it's just two or a handful then the first option is better, otherwise you need to make subscription
a nested object and nested object are less optimal in terms of query performance.
With the first option, filtering on the plan name and price can be done with a single condition on subscription.Plan1.price: 100
, while with the second option, you need two conditions (hence why subscription
needs to be nested
), one on subscription.name: Plan1
and another on subscription.price: 100
UPDATE 1: using option 1
{
"query": {
"function_score": {
"query": {
"bool": {
"filter": [
{
"range": {
"createdatutc": {
"gte": "2022-11-01T00:00:00.000Z",
"lt": "2023-05-06T00:00:00.000Z",
"format": "strict_date_optional_time"
}
}
},
{
"terms": {
"country": [
"US"
]
}
},
{
"exists": {
"field": "subscription.Plan1.price"
}
}
]
}
},
"functions": [
{
"filter": {
"query_string": {
"default_field": "name",
"query": "\"john\""
}
},
"field_value_factor": {
"field": "subscription.Plan1.price",
"factor": 1.2
}
}
],
"score_mode": "sum",
"boost_mode": "replace"
}
}
}