Search code examples
azureazure-data-factoryazure-table-storage

ADF Copy activity skips Table Storage rows which have different values than the inferred schema


I want to copy a table from one storage account to another with the Copy activity in ADF. I just need a simple copy of my table, preferably 100% identical to the source table.

There is a field in the source table called ExpectedArrivalDate which can change its datatype between rows/entities. On some entities the field is DateTime, on other entities it is String.

  1. DateTime
    DateTime
  2. String String

The reason for this is that the entities are manipulated from C# code where the datatype of this field is DateTime? (i.e. nullable DateTime).

When I execute the Copy activity with no explicit mapping defined, ADF decides automatically that the datatype of the ExpectedArrivalDate column is DateTimeOffset and this causes all the entities where ExpectedArrivalDate is of type String (example value: ‘DateTime.Null’) to be skipped during the copy operation. Based on the error log these entities are skipped because in the source table, instead of a valid DateTime the string value ‘DateTime.Null’ is found and ADF considers this to be incompatible with the schema.

Example of error log generated:

Level OperationName Message
Warning TabularRowSkip Column 'ExpectedArrivalDate' contains an invalid value 'DateTime.Null'.

Trying to add explicit datatype mapping

If I try to define explicit mappings in ADF by using the Import schemas button, I see the following mapping generated: Mapping generated by Import schemas

The data type cannot be changed from the UI, so I go to the JSON view of the pipeline and change it there manually to String for both source and sink.

Edited mapping in JSON

Edited mapping in UI

Running the pipeline again, the same rows are still skipped and my changed datatype mapping is ignored.

Workaround that I found

  1. Go to Table Storage and edit the first entity in the source table, changing the datatype of the ExpectedArrivalDate field from DateTime to String.
  2. Delete all explicit mappings on the Copy activity in ADF.
  3. Run the pipeline.

Result: All rows get copied to the sink, the datatype of ExpectedArrivalDate column is converted to String on all entities, even if it was DateTime on some of them in the source table.

Question

Why cannot the ADF Copy activity handle rows where the schema is not the same as the one inferred from the first entity of the table? Is there a way to make this work without manually changing data in the source table?


Solution

  • I contacted Microsoft Support and they told me my use case scenario is not supported by ADF. As a workaround they suggested to break up my copy activity in two pieces.

    1. Copy from source table into a CSV file in blob storage.
    2. Copy from blob storage into the destination table.

    This way all the data will be copied, no matter if the data type of a column changes between the entities. Unfortunately this results in additional execution time (2 copy activities instead of 1) and as a side effect all the data in all the columns gets converted to String.