Search code examples
azureazure-cosmosdbazure-data-factoryazure-table-storagebackup-strategies

How to archive old CosmosDB data to Azure Table using Azure Data Factory when CosmosDB collection documents have different properties?


I'm trying to archive old data from CosmosDB into Azure Tables but I'm very new to Azure Data Factory and I'm not sure what would be a good approach to do this. At first, I thought that this could be done with a Copy Activity but because the properties from my documents stored in the CosmosDB source vary, I'm getting mapping issues. Any idea on what would be a good approach to tackle this archiving process?

Basically, the way I want to store the data is to copy the document root properties as they are, and store the nested JSON as a serialized string.

For example, if I wanted to archive these 2 documents :

[
  {
    "identifier": "1st Guid here",
    "Contact": {
      "Name":  "John Doe",
      "Age": 99
    }
  },
  { 
    "identifier": "2nd Guid here",
    "Distributor": {
       "Name": "Jane Doe",
       "Phone": {
         "Number": "12345",
         "IsVerified": true
       }
    }
  }
]

I'd like these documents to be stored in Azure Table like this:

identifier      | Contact                                   | Distributor 
"Ist Guid here" | "{ \"Name\": \"John Doe\", \"Age\": 99 }" | null
"2nd Guid here" |  null                                     | "{\"Name\":\"Jane Doe\",\"Phone\":{\"Number\":\"12345\",\"IsVerified\":true}}"

Is this possible with the Copy Activity?

I tried using the mapping tab inside the CopyActivity, but when I try to run it I get an error saying that the dataType for one of the Nested JSON columns that are not present in the first row cannot be inferred.


Solution

  • Please follow my configuration in Mapping Tag.

    enter image description here

    Test output with your sample data:

    enter image description here