I have a field carName which uses a custom analyzer:
@Field(type = FieldType.Text, searchAnalyzer = "myAnalyzer", analyzer = "myAnalyzer")
private String carName;
The myAnalyzer analyzer looks like this:
{
  "index": {
    "analysis": {
      "filter": {
        "myStopwords": {
          "ignore_case": "true",
          "type": "stop",
          "stopwords": [
            "word1",
            "word2"
          ]
        }
      },
      "char_filter": {
        "myTrimmer": {
          "flags": "CASE_INSENSITIVE",
          "pattern": "somepatter",
          "replacement": "somrereplacement",
          "type": "pattern_replace"
        }
      },
      "analyzer": {
        "myAnalyzer": {
          "filter": [
            "lowercase",
            "unique",
            "myStopwords"
          ],
          "char_filter": [
            "myTrimmer"
          ],
          "type": "custom",
          "tokenizer": "whitespace"
        }
      }
    }
  }
}
Now the myStopwords list will grow or shrink over time. In my database I have a CAR entity, and whenever someone adds a new car it is indexed in ES as a document. What do I have to do when someone changes the stopwords list? Is it possible to refresh the data purely on the Elasticsearch side, without reading it from my database again? Or could some of the data indexed for carName have been lost during indexing because of the stopwords list, for example the words that were stopwords at that time? In that case, unfortunately, would I need to read the cars from the database and index them again?
As I understand it, the analyzer (myAnalyzer in my case) is used by ES during indexing, so at first sight it seems that if I change the stopwords list (which in this case is an analyzer change), I should reindex my cars. But maybe I am wrong? If a car was named 'Ford King Taurus' and King was not in the stopwords list, what happens if I add King to the list? And if King was in the stopwords list when some documents were indexed and is now removed from it, what happens with search then? Would searching still work fine after such changes?
I read about the UpdateByQuery method, which I think can be used in similar cases, for example to update part of a document. But could it be used here? I mean, how could I tell Elasticsearch, if necessary, to refresh all carNames after a stopwords list change?
If you're using the same analyzer at index time and search time and you update your stop words list, both the index-time and the search-time analyzer will use the new list as soon as the updated settings take effect. However, anything that is already indexed will not be updated; you'll need to run _update_by_query on your index for the new stop words to be applied to existing documents.
A quick example:
If you index Ford King Taurus and the stop words list doesn't contain King, then the following tokens will be indexed: ford, king and taurus (lowercased by the lowercase filter). At search time, you can find the document using any of these three terms.
Then you add King to the stop words list, and close and reopen your index in order to refresh your analyzers. At this point, the former document with Ford King Taurus will not be searchable with King anymore, since the search analyzer now ignores King even though the token king is still indexed. You could still find the document by searching for king with the standard search analyzer, though, since the king token is still indexed.
However, if you index a new document, say, Seat King, then only Seat will be indexed and searching for King will yield nothing.
If you want your former document to pick up the new stop word King, you need to either reindex the document or simply update your index in place using _update_by_query, so that the source documents get reindexed upon themselves, but with the index-time analyzer that has the new stop words list including King.
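To make the index-time versus search-time mismatch concrete, here is a small Python simulation. It is only a rough stand-in for myAnalyzer (whitespace tokenizer, lowercase, unique, stop filter) and not Elasticsearch itself; the char filter is left out because its pattern is a placeholder, and the names analyze, search and index are made up for this sketch.

```python
def analyze(text, stopwords):
    """Approximate myAnalyzer: whitespace tokenizer, lowercase, unique, stop filter."""
    seen, tokens = set(), []
    for raw in text.split():
        token = raw.lower()
        if token in seen:          # "unique" filter: drop repeated tokens
            continue
        seen.add(token)
        if token not in stopwords:  # "stop" filter
            tokens.append(token)
    return tokens


def search(index, query, stopwords):
    """Return ids of documents whose stored tokens contain any analyzed query term."""
    terms = analyze(query, stopwords)
    return sorted(doc_id for doc_id, tokens in index.items()
                  if any(term in tokens for term in terms))


old_stop = {"word1", "word2"}
new_stop = old_stop | {"king"}

# Document 1 is indexed while "king" is not a stop word.
index = {1: analyze("Ford King Taurus", old_stop)}
tokens_before = list(index[1])
before = search(index, "King", old_stop)

# The stop list changes: the stored tokens stay, but the query term is now dropped.
after_change = search(index, "King", new_stop)

# A new document is analyzed with the new list.
index[2] = analyze("Seat King", new_stop)

# Re-analyzing document 1 from its source is the analogue of _update_by_query.
index[1] = analyze("Ford King Taurus", new_stop)

print(tokens_before, before, after_change, index)
```

The point of the sketch is that the stored token list for document 1 only changes when the source text is analyzed again; changing the stop list alone only affects how the query is analyzed.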
Here is a quick summary of all the above explanations:
# 1. You create your index like normal
PUT test2
{
  "settings": {...},
  "mappings": {...}
}
# 2. You index "Ford King Taurus"
POST test2/_doc/1
{
  "carName": "Ford King Taurus"
}
# 3. You can find it searching for "king"
POST test2/_search
{
  "query": {
    "match": {
      "carName": "king"
    }
  }
}
# 4. You close the index, add "king" as a new stop word and reopen the index
POST test2/_close
PUT test2/_settings
{
  "index": {
    "analysis": {
      "filter": {
        "myStopwords": {
          "ignore_case": "true",
          "type": "stop",
          "stopwords": [
            "word1",
            "word2",
            "king"
          ]
        }
      },
      "analyzer": {
        "myAnalyzer": {
          "filter": [
            "lowercase",
            "unique",
            "myStopwords"
          ],
          "type": "custom",
          "tokenizer": "whitespace"
        }
      }
    }
  }
}
POST test2/_open
# 5. You cannot find the document searching for "king"
POST test2/_search
{
  "query": {
    "match": {
      "carName": {
        "query": "king"
      }
    }
  }
}
=> No results
# 6. But you can still find it using the standard search analyzer
POST test2/_search
{
  "query": {
    "match": {
      "carName": {
        "query": "king",
        "analyzer": "standard"
      }
    }
  }
}
=> 1 result
# 7. You update your index in place
POST test2/_update_by_query
# 8. None of the search queries will find anything with "king"
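If the stop words list is changed from application code rather than from the console, the same sequence can be generated programmatically. The following Python sketch only builds the requests (method, path, body) and does not send them. The helper name stopword_change_plan is hypothetical, and the sketch assumes that resending only the filter definition is enough, since myAnalyzer refers to the filter by name. The conflicts=proceed parameter is an optional addition, not part of the steps above.

```python
import json


def stopword_change_plan(index, filter_name, stopwords):
    """Build the (method, path, body) requests for changing a stop filter in place."""
    settings = {
        "index": {
            "analysis": {
                "filter": {
                    filter_name: {
                        "type": "stop",
                        "ignore_case": "true",
                        "stopwords": sorted(stopwords),
                    }
                }
            }
        }
    }
    return [
        ("POST", f"/{index}/_close", None),                    # analyzers can only change on a closed index
        ("PUT", f"/{index}/_settings", json.dumps(settings)),  # new stop words list
        ("POST", f"/{index}/_open", None),                     # analyzers are rebuilt on open
        ("POST", f"/{index}/_update_by_query?conflicts=proceed", None),  # re-analyze existing docs
    ]


plan = stopword_change_plan("test2", "myStopwords", {"word1", "word2", "king"})
for method, path, body in plan:
    print(method, path, body or "")
```

Each tuple can then be handed to whatever HTTP client the application already uses. Since _update_by_query re-analyzes documents from their stored _source, nothing has to be read from the database again, as long as _source is enabled on the index.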