azure, azure-cognitive-search

Azure Search - Using the Microsoft English analyzer increases the size of the index


Earlier, my index was using the Lucene analyzer. I changed it to the Microsoft analyzer, and now the size of the index has increased considerably. Why does the size increase so much? P.S. See the attachment.


Solution

  • The difference in index size is expected. For each word in your documents, a Microsoft analyzer produces both the original word and the base form of that word. For example, if your document contains the word running, Azure Search will index two terms: running and run. See my answer to the following post for more details: Azure Search: Searching for singular version of a word, but still include plural version in results

    Lucene analyzers stem words, which results in fewer unique terms in the index. You can learn more about the differences here: https://learn.microsoft.com/en-us/rest/api/searchservice/Language-support?redirectedfrom=MSDN

    The impact on the index size varies with the analyzer and the language. You can test the behavior of the analyzer you are using with the Analyze API: https://learn.microsoft.com/en-us/rest/api/searchservice/test-analyzer.

    That being said, the difference you are seeing is more than I would expect. Please reach out to me at janusz.lembicz at microsoft to discuss the details of your scenario.
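
To compare the two analyzers on your own text, the Analyze API mentioned above can be called from a small script. The following is a minimal sketch, not an official client. It assumes the `2020-06-30` API version and the analyzer names `en.microsoft` and `en.lucene`. The service name, index name, and admin key are placeholders. The two sample responses are hypothetical: they only mirror the running/run example from the answer, so check the real output against your own index.

```python
import json
import urllib.request


def build_analyze_request(service, index, text, analyzer, api_version="2020-06-30"):
    """Return (url, body) for an Analyze API call; nothing is sent here."""
    url = (
        f"https://{service}.search.windows.net/indexes/{index}"
        f"/analyze?api-version={api_version}"
    )
    body = {"text": text, "analyzer": analyzer}
    return url, body


def analyze(service, index, text, analyzer, admin_key):
    """Send the request and return the parsed JSON response (needs network access)."""
    url, body = build_analyze_request(service, index, text, analyzer)
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": admin_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def unique_terms(analyze_response):
    """Collect the distinct tokens from an Analyze API response."""
    return {entry["token"] for entry in analyze_response.get("tokens", [])}


# Hypothetical responses, shaped like the API output and based only on the
# running/run example in the answer. They are not captured from a real service.
microsoft_sample = {"tokens": [
    {"token": "running", "startOffset": 0, "endOffset": 7, "position": 0},
    {"token": "run", "startOffset": 0, "endOffset": 7, "position": 0},
]}
lucene_sample = {"tokens": [
    {"token": "run", "startOffset": 0, "endOffset": 7, "position": 0},
]}

print(len(unique_terms(microsoft_sample)), len(unique_terms(lucene_sample)))

# Against a real service (placeholder names, requires an admin key):
# response = analyze("my-service", "my-index", "running", "en.microsoft", "<admin-key>")
# print(unique_terms(response))
```

Running the same text through both analyzers and comparing the number of distinct terms gives a rough sense of how much larger the term dictionary becomes for your content.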