spring-boot, elasticsearch, spring-data, spring-data-elasticsearch

Elasticsearch analyzer working when created through Spring Data but failing when created straight from Postman/curl


Goal: create an Elasticsearch index meant to be loaded with 10 million simple documents. Each document is basically "Elasticsearch id", "some company id" and "name". Provide a search-as-you-type feature.

I could successfully create an index and an analyzer either straight from Postman (curl or any other tool not relying on Spring Data) or during Spring Boot initialization. Nevertheless, when I try to use the analyzer, it seems to be ignored for the index created straight from Postman.

So my main question is: is Spring Data adding some setting I am missing when I create the index straight from posting the JSON settings? A secondary question is: is there some way to make Spring Data print the commands it auto-generates and executes (similar to Hibernate, which lets you see the generated statements)? If so, I can visually debug and check what is different.
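
As a note on the secondary question: when Spring Data talks to Elasticsearch through the REST client, request logging can usually be switched on purely through logger levels, for example in application.properties. The exact logger names depend on the client and the Spring Data Elasticsearch version in use, so treat this as a sketch:

# application.properties
# Low-level Elasticsearch REST client: logs every request in a curl-reproducible format
logging.level.tracer=TRACE
# Newer Spring Data Elasticsearch versions also expose a dedicated wire logger
logging.level.org.springframework.data.elasticsearch.client.WIRE=TRACE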

This is the way the index and analyzer are created from Spring Boot/Spring Data.

Main class used to boot the application:

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.data.elasticsearch.repository.config.EnableElasticsearchRepositories;

@EnableElasticsearchRepositories
@SpringBootApplication
public class SearchApplication {

    public static void main(String[] args) {
        SpringApplication.run(SearchApplication.class, args);
    }

}

my model

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.annotations.Setting;

import lombok.Getter;
import lombok.Setter;

@Document(indexName = "correntistas")
@Setting(settingPath = "data/es-config/elastic-setting.json")
@Getter
@Setter
public class Correntista {
    @Id
    private String id;
    private String conta;
    private String sobrenome;

    @Field(type = FieldType.Text, analyzer = "autocomplete_index", searchAnalyzer = "autocomplete_search")
    private String nome;
}
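
The repository used for the search is not shown here; a minimal sketch of what such a repository could look like (interface and method names are just an example) is:

import java.util.List;

import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

// Example repository: Spring Data derives the query from the method name,
// so a search on "nome" goes through the analyzers declared on the entity field.
public interface CorrentistaRepository extends ElasticsearchRepository<Correntista, String> {
    List<Correntista> findByNome(String nome);
}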

src/main/resources/data/es-config/elastic-setting.json (note: this is exactly the same settings JSON I am posting from Postman):

{
  "analysis": {
    "filter": {
      "autocomplete_filter": {
        "type": "edge_ngram",
        "min_gram": 1,
        "max_gram": 20
      }
    },
    "analyzer": {
      "autocomplete_search": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": [
          "lowercase"
        ]
      },
      "autocomplete_index": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": [
          "lowercase",
          "autocomplete_filter"
        ]
      }
    }
  }
}
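
As a side note, a quick way to see what these analyzers produce is the _analyze API, once the index exists. With the edge_ngram filter above, the text "jim" should come back as the tokens j, ji and jim:

POST http://localhost:9200/correntistas/_analyze

{
  "analyzer": "autocomplete_index",
  "text": "jim"
}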

Checking that it was created successfully, I see:

GET http://localhost:9200/correntistas/_settings

{
    "correntistas": {
        "settings": {
            "index": {
                "number_of_shards": "5",
                "provided_name": "correntistas",
                "creation_date": "1586615323459",
                "analysis": {
                    "filter": {
                        "autocomplete_filter": {
                            "type": "edge_ngram",
                            "min_gram": "1",
                            "max_gram": "20"
                        }
                    },
                    "analyzer": {
                        "autocomplete_index": {
                            "filter": [
                                "lowercase",
                                "autocomplete_filter"
                            ],
                            "type": "custom",
                            "tokenizer": "standard"
                        },
                        "autocomplete_search": {
                            "filter": [
                                "lowercase"
                            ],
                            "type": "custom",
                            "tokenizer": "standard"
                        }
                    }
                },
                "number_of_replicas": "1",
                "uuid": "xtN-NOX3RQWJjeRdyC8CVA",
                "version": {
                    "created": "6080499"
                }
            }
        }
    }
}

So far so good.

Now I delete the index with curl -XDELETE localhost:9200/correntistas and do the same thing, but this time creating the index and the analyzer at once from Postman:

PUT http://localhost:9200/correntistas with the exact same analysis settings posted above:

[screenshot: creating the index and analyzer at once in Postman]

Then if I check the settings, I see exactly the same result as when the index was created above from Spring Data.
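
The settings match, but Spring Data also derives a field mapping from the @Field annotation, so a useful extra comparison (not shown above) is the mapping of the two indices:

GET http://localhost:9200/correntistas/_mapping

For the index created by Spring Data, the nome field should list autocomplete_index and autocomplete_search; for the index created with the PUT above, it presumably will not, since no mapping was sent.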

Am I missing some extra step that Spring Data does for free, hidden from my eyes?

To sum up: when the index is created from Spring Data, searching with a few letters works, but when I create it from Postman, it only returns data when I search with the whole word.

*** Thanks to the friendly and smart help from Opster Elasticsearch Ninja, I can add an extra trick I learned while posting from Postman: somehow a header enabled in my Postman was causing the request to fail with "... Root mapping definition has unsupported parameters... mapper_parsing_exception..." while I was trying the solution answered below. I guess it can be useful for future readers.

[screenshot: Postman headers that were getting in the way]


Solution

  • As you have not provided the search query you are using in Postman, nor the mapping, it is hard to tell whether you are using the right analyzer on the fields referenced in your search query. Adding sample documents and your actual and expected search results always helps as well.

    Never mind, I added your mapping and am showing below how, using Postman as well, you will get the correct results.

    Index definition, exactly the same as yours

    {
        "settings": {
            "analysis": {
                "filter": {
                    "autocomplete_filter": {
                        "type": "edge_ngram",
                        "min_gram": 1,
                        "max_gram": 20
                    }
                },
                "analyzer": {
                    "autocomplete_search": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": [
                            "lowercase"
                        ]
                    },
                    "autocomplete_index": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": [
                            "lowercase",
                            "autocomplete_filter"
                        ]
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                "name": {
                    "type": "text",
                    "analyzer": "autocomplete_index",
                    "search_analyzer": "autocomplete_search"
                }
            }
        }
    }
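
    One note: the mappings block above is typeless, which is fine on Elasticsearch 7.x. The index in the question reports version 6.8.4, and on 6.x the create-index API still expects a mapping type by default (this version mismatch is one common cause of the "Root mapping definition has unsupported parameters" error mentioned in the question). On 6.x the mappings section of the same request would need a type wrapper, for example as in the sketch below (keeping the settings block unchanged, or alternatively passing include_type_name=false on 6.7+):

    {
        "mappings": {
            "_doc": {
                "properties": {
                    "name": {
                        "type": "text",
                        "analyzer": "autocomplete_index",
                        "search_analyzer": "autocomplete_search"
                    }
                }
            }
        }
    }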
    

    Index sample docs

    {
        "name" : "opster"
    }
    
    {
        "name" : "jim c"
    }
    
    {
        "name" : "jimc"
    }
    
    {
        "name" : "foo"
    }
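
    The exact indexing calls are not shown above; they would just be standard index requests against the same index, for example (document ids 1 and 2 here match the hits shown in the result below):

    PUT http://localhost:9200/61158504/_doc/1
    {
        "name" : "jim c"
    }

    PUT http://localhost:9200/61158504/_doc/2
    {
        "name" : "jimc"
    }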
    

    Searching for a partial word like ji brings back both the jim c and jimc docs

    {
        "query": {
            "match": {
                "name": {
                    "query": "ji"
                }
            }
        }
    }
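
    The query is sent to the search endpoint of that index, e.g.:

    POST http://localhost:9200/61158504/_search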
    

    Result

      "hits": [
                {
                    "_index": "61158504",
                    "_type": "_doc",
                    "_id": "2",
                    "_score": 0.69263697,
                    "_source": {
                        "name": "jimc"
                    }
                },
                {
                    "_index": "61158504",
                    "_type": "_doc",
                    "_id": "1",
                    "_score": 0.6133945,
                    "_source": {
                        "name": "jim c"
                    }
                }
            ]