I want to run an aggregation on a field that has values like:
doc1_field: "A"
doc2_field: "A, B"
doc3_field: "A, B, C"
Which mappings or settings can I use so that aggregating on this field gives results like:
key: "A" count: 3
key: "B" count: 2
key: "C" count: 1
I have tried adding a comma-separated analyzer on the above field, but it does not give the desired results. I still get buckets like:
key: "A" count: 1
key: "A, B, C" count: 1
which is not what I want.
Could you please help me aggregate on this field?
A good way is to split the field at ingest time, using an ingest pipeline with a split processor.
Simulating the following pipeline with documents like the ones you provided (the third one uses semicolons to show that the separator regex handles both delimiters) yields this:
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "split": {
          "field": "field_name",
          "separator": "\\s*[,;]\\s*"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "field_name": "A"
      }
    },
    {
      "_source": {
        "field_name": "A, B"
      }
    },
    {
      "_source": {
        "field_name": "A; B; C"
      }
    }
  ]
}
Results:
{
  "docs": [
    {
      "doc": {
        "_source": {
          "field_name": [
            "A"
          ]
        }
      }
    },
    {
      "doc": {
        "_source": {
          "field_name": [
            "A",
            "B"
          ]
        }
      }
    },
    {
      "doc": {
        "_source": {
          "field_name": [
            "A",
            "B",
            "C"
          ]
        }
      }
    }
  ]
}
This means that instead of indexing "A, B, C"
you'd index an array containing three separate values: ["A", "B", "C"]
which you can then easily aggregate on and get the results you want.
If you want to go this way, just create the pipeline:
PUT _ingest/pipeline/splitter
{
  "processors": [
    {
      "split": {
        "field": "field_name",
        "separator": "\\s*,\\s*"
      }
    }
  ]
}
And then index your documents by referencing it:
PUT index/_doc/1?pipeline=splitter
{
  "field_name": "A, B, C"
}
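Once the documents have gone through the pipeline, a plain terms aggregation should return one bucket per value. Here is a minimal sketch, assuming the index is called index as above and that field_name was dynamically mapped, so that a field_name.keyword sub-field exists (if you mapped field_name as keyword yourself, aggregate on field_name directly). The aggregation name values is arbitrary.

```
# Assumption: default dynamic mapping, hence the .keyword sub-field
GET index/_search
{
  "size": 0,
  "aggs": {
    "values": {
      "terms": {
        "field": "field_name.keyword"
      }
    }
  }
}
```

If all three sample documents were indexed this way, the buckets should come back as A with a doc_count of 3, B with 2 and C with 1.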