azure azure-cognitive-search

Where are the indices stored behind the Azure Cognitive Search service?


Please check this tweet chain. I am working on a PoC using the Azure Cognitive Search service and comparing it with AWS. AWS seems to use MongoDB Atlas to store the indices, and its search function is basically built on Mongo's default search capability, which is itself built on Apache Lucene. I am trying to find out how the inverted indices are stored behind the scenes in Azure Cognitive Search. It uses Apache Lucene as the search engine that searches the index.


Solution

  • Disclaimer

    This answer should be considered accurate only as of July 2020, because implementation details do change. This information isn't material to which service is "better" for any particular purpose; it is just interesting for the sake of curiosity.

    Also, do not take my answer to be any kind of API contract or promise of future functionality or performance. We encapsulate the storage details so that you don't have to worry about them, and also so that we have the freedom to change them if needed.

    Answer

    Azure Cognitive Search uses Apache Lucene under the hood, which manages the inverted indexes. As of the time of this writing, those indexes are stored on Azure virtual machine disks, which are backed by page blobs. The exact SKU of disks used depends on pricing tier and other factors; I won't get into the details here (because they do change). Those disks are attached to Azure virtual machines, which for pricing tiers other than Free map to the "search units" you pay for.
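
For readers unfamiliar with the term, here is a minimal sketch of what an inverted index is conceptually: a mapping from each term to the documents that contain it. This is a toy illustration only. The function names (`tokenize`, `build_inverted_index`, `search`) and the sample documents are made up for this example, and it says nothing about Lucene's actual on-disk segment format or about how Azure Cognitive Search lays out its files on disk.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a deliberately naive analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    """Return IDs of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical sample documents, used only for illustration.
docs = {
    1: "Azure Cognitive Search uses Apache Lucene",
    2: "Lucene manages the inverted indexes",
    3: "Indexes are stored on virtual machine disks",
}
index = build_inverted_index(docs)

print(search(index, "lucene"))          # [1, 2]
print(search(index, "indexes stored"))  # [3]
```

A real engine such as Lucene persists structures like this in files on disk rather than in an in-memory dictionary, which is why the question of where those files live (here, on virtual machine disks backed by page blobs) arises at all.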