DocumentDB and Azure Search: Document removed from DocumentDB isn't updated in Azure Search index


When I remove a document from DocumentDB, it isn't removed from the Azure Search index. The index does update if I change something in a document. I'm not quite sure how I should use the "SoftDeleteColumnDeletionDetectionPolicy" in the data source.

My datasource is as follows:

{
    "name": "mydocdbdatasource",
    "type": "documentdb",
    "credentials": {
        "connectionString": "AccountEndpoint=https://myDocDbEndpoint.documents.azure.com;AccountKey=myDocDbAuthKey;Database=myDocDbDatabaseId"
    },
    "container": {
        "name": "myDocDbCollectionId",
        "query": "SELECT s.id, s.Title, s.Abstract, s._ts FROM Sessions s WHERE s._ts > @HighWaterMark"
    },
    "dataChangeDetectionPolicy": {
        "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
        "highWaterMarkColumnName": "_ts"
    },
    "dataDeletionDetectionPolicy": {
        "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
        "softDeleteColumnName": "isDeleted",
        "softDeleteMarkerValue": "true"
    }
}

I have followed this guide: https://azure.microsoft.com/en-us/documentation/articles/documentdb-search-indexer/

What am I doing wrong? Am I missing something?


Solution

  • Here is how I understand SoftDeleteColumnDeletionDetectionPolicy in a data source. As the name suggests, it is a soft-delete policy, not a hard-delete policy. In other words, the document is still there in your data source, but it is marked as deleted.

    Essentially, the Search service periodically queries the data source and looks for entries that are marked as deleted by checking the value of the attribute named in the SoftDeleteColumnDeletionDetectionPolicy. In your case, it queries the DocumentDB collection, finds the documents whose isDeleted attribute is true, and removes the matching documents from the index.

    It isn't working for you because you are actually deleting the documents rather than changing the value of isDeleted from false to true. The Search service never finds a matching value, so nothing is removed from the index.

    One thing you could do is soft-delete documents in your DocumentDB collection instead of hard-deleting them. The next time the Search service indexes your data, it will see that the document is soft-deleted in the source and remove it from the index. To save storage costs on the DocumentDB side, a background process can then hard-delete those documents some time later.
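
To make the flow concrete, here is a minimal Python sketch of the soft-delete approach. It is only an in-memory model: the collection and the index are plain dicts, and soft_delete, run_indexer and purge_soft_deleted are hypothetical helper names, not part of any DocumentDB or Azure Search SDK. The matching rule (comparing the string form of the field against the marker, ignoring case) is my assumption about how the policy behaves, so treat it as an illustration rather than the service's actual implementation.

```python
import time

SOFT_DELETE_COLUMN = "isDeleted"  # mirrors softDeleteColumnName in the data source
SOFT_DELETE_MARKER = "true"       # mirrors softDeleteMarkerValue


def soft_delete(collection, doc_id, now=None):
    """Mark a document as deleted instead of removing it from the collection."""
    doc = collection[doc_id]
    doc[SOFT_DELETE_COLUMN] = True
    # Bumping _ts stands in for DocumentDB updating the timestamp on write,
    # which is what lets the high-water-mark policy notice the change.
    doc["_ts"] = int(now if now is not None else time.time())


def run_indexer(collection, index, high_water_mark):
    """Toy indexer pass: upsert changed documents, drop soft-deleted ones."""
    new_mark = high_water_mark
    for doc_id, doc in collection.items():
        if doc["_ts"] <= high_water_mark:
            continue  # unchanged since the previous run
        # Assumed matching rule: string form of the field equals the marker.
        if str(doc.get(SOFT_DELETE_COLUMN, "")).lower() == SOFT_DELETE_MARKER:
            index.pop(doc_id, None)
        else:
            index[doc_id] = {"id": doc_id, "Title": doc["Title"]}
        new_mark = max(new_mark, doc["_ts"])
    return new_mark


def purge_soft_deleted(collection, older_than):
    """Background cleanup: hard-delete documents soft-deleted before older_than."""
    stale = [doc_id for doc_id, doc in collection.items()
             if doc.get(SOFT_DELETE_COLUMN) is True and doc["_ts"] < older_than]
    for doc_id in stale:
        del collection[doc_id]
    return stale
```

The intended order is: call soft_delete, let the indexer run at least once so the entry leaves the index, and only then purge. If a document is hard-deleted straight away (a plain del on the collection in this model), run_indexer never sees it again and the stale index entry stays, which is the behaviour described in the question. One more thing worth checking in the data source above: the custom query only returns id, Title, Abstract and _ts. As far as I know, the soft-delete column also has to be returned by the query for the policy to see it, so isDeleted would probably need to be added to the SELECT list.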