I am indexing a large data set that has multiple name fields for a particular entity. I have defined the name field as an array and am adding around four names to it. Some of the names contain spaces, and they are getting tokenized. Can I avoid that?
I know that for strings Elasticsearch offers both the text and keyword types, but how do I set the type to keyword when my data is an array? By default, all the array fields are mapped as text. I want them treated as keyword so they are not tokenized at indexing time.
Expected: if I store "Hello World" in an array, I should be able to search for "Hello World".
Current behavior: it stores hello and world as separate tokens because the value gets tokenized.
There is no dedicated array data type in Elasticsearch. Whenever you send an array as the value of a field of type x, that field becomes an array accepting only values of type x.
So, for example, say you create a field as below:
{
"tagIds": {
"type": "integer"
}
}
and index a document with these values:
{
"tagIds": [124, 452, 234]
}
then tagIds automatically becomes an array of integers.
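Each element of such an array is indexed and searchable on its own. As a quick sketch, assuming the tagIds mapping above lives in an index called tags_example (an index name chosen purely for illustration), a term query on a single element would find the document:
GET tags_example/_search
{
  "query": {
    "term": {
      "tagIds": 452
    }
  }
}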
For your case, all you need to do is create a field, say name, with type keyword. And make sure you always pass an array to this field, even when it holds a single value, so the field is consistently an array. Below is what you need:
Mapping:
PUT test
{
"mappings": {
"_doc": {
"properties": {
"name": {
"type": "keyword"
}
}
}
}
}
Indexing a document:
PUT test/_doc/1
{
"name" : ["name one"]
}
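With the keyword mapping, the whole string is stored as a single token, so an exact-match term query for the full value should now return the document, which is the behavior asked for in the question:
GET test/_search
{
  "query": {
    "term": {
      "name": "name one"
    }
  }
}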