I have been reading a lot about mappings in Elasticsearch, and here's something interesting that I found:
Field names with the same name across types are highly recommended to have
the same type and same mapping characteristics (analysis settings for
example). There is an effort to allow to explicitly "choose" which field to
use by using type prefix (my_type.my_field), but it’s not complete, and there
are places where it will never work (like faceting on the field).
I found the above quote in the documentation here.
Now my use case is exactly that. Here's an example. Suppose that some field
in tenant1 has to have the following mapping (for a given entity, user):
{
    "tenantId1_user": {
        "properties": {
            "someField": {
                "type": "string",
                "index": "analyzed"
            }
        }
    }
}
Now, for the same field in a different tenant (for the same entity type, let's say user), the type has to change like this:
{
    "tenantId2_user": {
        "properties": {
            "someField": {
                "type": "integer",
                "index": "analyzed"
            }
        }
    }
}
Now, from what I understand from the above quote, this means that even though I can technically provide this mapping, it is not recommended, because deep down Lucene handles both fields in the same way.
My questions are:
1) How can I handle my use case? Should I just put each tenant in a separate index so I don't have to worry about this mapping?
2) Is there any other workaround, considering that having too many tenants means I will have too many indices?
3) What's the recommended way to handle this use case?
All help appreciated!
In your scenario, you should use an index per tenant.
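As a rough illustration of that layout, here is a minimal sketch that builds one index name and one mapping body per tenant. The naming scheme (tenant_<id>), the TENANT_FIELDS table and both helper functions are assumptions made for this example, not anything Elasticsearch prescribes; the field definitions are adapted from the question. The sketch only constructs the JSON bodies and does not talk to a cluster.

```python
import json

# Hypothetical per-tenant field definitions, adapted from the question's example.
TENANT_FIELDS = {
    "tenantId1": {"someField": {"type": "string", "index": "analyzed"}},
    "tenantId2": {"someField": {"type": "integer"}},
}


def index_name(tenant_id):
    # Assumed naming convention; Elasticsearch index names must be lowercase.
    return "tenant_{}".format(tenant_id.lower())


def user_mapping(tenant_id):
    # Mapping body for the "user" type inside that tenant's own index.
    return {"user": {"properties": TENANT_FIELDS[tenant_id]}}


if __name__ == "__main__":
    for tenant in sorted(TENANT_FIELDS):
        print(index_name(tenant))
        print(json.dumps(user_mapping(tenant), indent=2))
```

Because each tenant's "user" type lives in its own index, someField can be a string for one tenant and an integer for another without the two definitions ever meeting in the same index.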
AFAIK, there's no limit on the number of indices in a cluster, only a "natural" limit based on the available physical resources.
Moreover, having unique indices per tenant will give each tenant less "astonishing" search results. If they are in the same index, the TF-IDF scoring will be biased by the frequency of occurrence of the search term in the documents of all the other tenants.
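To make the scoring bias concrete, here is a small worked example using the inverse document frequency formula from Lucene's classic TF-IDF similarity, idf = 1 + ln(numDocs / (docFreq + 1)). The document counts are made up for illustration, and the sketch simplifies things by treating each index as a single pool of documents (by default Elasticsearch gathers these statistics per shard).

```python
import math


def classic_idf(doc_freq, num_docs):
    # Lucene's classic TF-IDF similarity: idf = 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1.0))


# Made-up numbers: the tenant has 1,000 documents, 10 of which contain "invoice";
# a shared index holds 100,000 documents, 50,000 of which contain "invoice".
own_index = classic_idf(doc_freq=10, num_docs=1000)
shared_index = classic_idf(doc_freq=50000, num_docs=100000)

if __name__ == "__main__":
    print("idf in the tenant's own index: {:.2f}".format(own_index))
    print("idf in the shared index:       {:.2f}".format(shared_index))
```

With these numbers the term is rare within the tenant's own documents but common across the shared index, so it carries much less weight in the shared index even though nothing about the tenant's own data has changed.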
Side notes (based on additional questions asked in IRC): any node that receives an indexing request or a search request has the cluster metadata specifying which nodes hold shards for which indices, and hence only forwards the request to the appropriate nodes. Also, don't be concerned about having shards of every index on every node. In and of itself, that model doesn't contribute anything useful to your deployment.
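The routing idea can be pictured with a toy model. This is not how Elasticsearch is implemented internally; CLUSTER_STATE, the node names and nodes_for_index are all hypothetical, and the sketch only shows why a request for one tenant's index never needs to touch nodes that hold no shards of it.

```python
# Toy cluster metadata: index name -> {shard number: node holding that shard}.
# All names here are hypothetical.
CLUSTER_STATE = {
    "tenant_a": {0: "node1", 1: "node2"},
    "tenant_b": {0: "node3"},
}


def nodes_for_index(cluster_state, index):
    # Return only the nodes that hold a shard of the given index.
    return sorted(set(cluster_state[index].values()))


if __name__ == "__main__":
    for name in sorted(CLUSTER_STATE):
        print(name, "->", nodes_for_index(CLUSTER_STATE, name))
```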