My question is about performance. I use filtered queries a lot, and I am not sure what the proper way to query by type is.
So first, let's have a look at the mappings:
{
  "my_index": {
    "mappings": {
      "type_Light_Yellow": {
        "properties": {
          "color_type": {
            "properties": {
              "color": {
                "type": "string",
                "index": "not_analyzed"
              },
              "brightness": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          },
          "details": {
            "properties": {
              "FirstName": {
                "type": "string",
                "index": "not_analyzed"
              },
              "LastName": {
                "type": "string",
                "index": "not_analyzed"
              },
              .
              .
              .
            }
          }
        }
      }
    }
  }
}
Above is an example mapping for the type type_Light_Yellow. There are many more mappings for other types (colors, e.g. dark yellow, light brown, and so on).
Please note the sub-fields of color_type. For type type_Light_Yellow, the values are always "color": "Yellow", "brightness": "Light", and likewise for all other types.
And now, my performance question: I wonder whether there is a preferred method for querying my index.
For example, let's search for all documents where "details.FirstName": "John" and "details.LastName": "Doe" under type type_Light_Yellow.
The method I'm currently using:
curl -XPOST 'http://somedomain.com:1234/my_index/_search' -d '{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "color_type.color": "Yellow"
              }
            },
            {
              "term": {
                "color_type.brightness": "Light"
              }
            },
            {
              "term": {
                "details.FirstName": "John"
              }
            },
            {
              "term": {
                "details.LastName": "Doe"
              }
            }
          ]
        }
      }
    }
  }
}'
As can be seen above, by specifying "color_type.color": "Yellow" and "color_type.brightness": "Light", I am querying the whole index and treating type type_Light_Yellow as if it were just another field in the documents I'm searching.
The alternative method is to query directly under the type:
curl -XPOST 'http://somedomain.com:1234/my_index/type_Light_Yellow/_search' -d '{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "details.FirstName": "John"
              }
            },
            {
              "term": {
                "details.LastName": "Doe"
              }
            }
          ]
        }
      }
    }
  }
}'
Please notice the first line: my_index/type_Light_Yellow/_search.
Types in Elasticsearch work by adding a _type attribute to each document, and every time you search a specific type, the search is automatically filtered on that _type attribute. So, performance-wise, there shouldn't be much of a difference. Types are an abstraction, not actual data. What I mean is that fields across multiple document types are flattened out over the entire index, i.e. the fields of one type also take up space in documents of other types, even though they are not indexed there (think of it the same way as a null occupying space).
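To make that concrete, here is a minimal Python sketch, assuming the same `filtered` query DSL used in the question. The helper names `term` and `filtered_query` are made up for illustration. It builds the body you would send to the type endpoint, and a roughly equivalent body for the whole index in which the type restriction is written out as an explicit `type` filter:

```python
import json


def term(field, value):
    """Build a single term filter clause."""
    return {"term": {field: value}}


def filtered_query(clauses):
    """Wrap filter clauses in a filtered/bool/must query body."""
    return {"query": {"filtered": {"filter": {"bool": {"must": list(clauses)}}}}}


name_clauses = [
    term("details.FirstName", "John"),
    term("details.LastName", "Doe"),
]

# Body for my_index/type_Light_Yellow/_search: the type comes from the URL.
typed_endpoint_body = filtered_query(name_clauses)

# Roughly equivalent body for my_index/_search: the type restriction is
# spelled out as a filter on the _type meta field.
explicit_type_body = filtered_query(
    [{"type": {"value": "type_Light_Yellow"}}] + name_clauses
)

print(json.dumps(explicit_type_body, indent=2))
```

Either way, the type is just one more filter applied to the index, which is why the two approaches should perform about the same.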
But it's important to keep in mind that the order of filtering impacts performance. You should aim to exclude as many documents as possible in one go. So, if you think it is better not to filter by type first, the first approach is preferable. Otherwise, I don't think there would be much of a difference if the ordering is the same.
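As a rough illustration of the ordering idea, the Python sketch below sorts the `must` clauses so that the filter expected to match the fewest documents comes first. The helper `order_by_selectivity` is hypothetical, and the match counts are invented purely for the example; on a real index you would have to measure them yourself.

```python
def order_by_selectivity(clauses_with_counts):
    """Return filter clauses sorted so the one matching the fewest documents comes first."""
    ranked = sorted(clauses_with_counts, key=lambda pair: pair[1])
    return [clause for clause, _count in ranked]


# Hypothetical match counts, for illustration only (not measured on any real index).
candidates = [
    ({"term": {"color_type.color": "Yellow"}}, 500000),
    ({"term": {"color_type.brightness": "Light"}}, 800000),
    ({"term": {"details.FirstName": "John"}}, 12000),
    ({"term": {"details.LastName": "Doe"}}, 300),
]

must = order_by_selectivity(candidates)
body = {"query": {"filtered": {"filter": {"bool": {"must": must}}}}}

for clause in must:
    print(clause)
```

With these made-up numbers the last-name filter runs first and the brightness filter last, so the broad color filters only have to look at what the name filters left over.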
Since the Python API also queries over HTTP in its default settings, using Python shouldn't impact performance.
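To show that it is the same HTTP call either way, here is a small sketch that builds the request from the question's curl example using only the Python standard library. The function name `build_search_request` is made up, and the request is only constructed, not sent, so the snippet runs without a cluster:

```python
import json
import urllib.request


def build_search_request(base_url, index, doc_type, body):
    """Build (but do not send) a POST request to the _search endpoint of a type."""
    url = "{}/{}/{}/_search".format(base_url.rstrip("/"), index, doc_type)
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


body = {
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    "must": [
                        {"term": {"details.FirstName": "John"}},
                        {"term": {"details.LastName": "Doe"}},
                    ]
                }
            }
        }
    }
}

request = build_search_request(
    "http://somedomain.com:1234", "my_index", "type_Light_Yellow", body
)

# urllib.request.urlopen(request) would actually send it; left out on purpose.
print(request.get_method(), request.full_url)
```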
In your case there is a certain degree of data duplication, though, as the color is captured both in the _type meta field and in the color field.