azure-data-factory parquet

org.apache.parquet.schema.InvalidSchemaException:Cannot write a schema with an empty group


I am extracting responses from an API in ADF. I use a Copy activity with a REST API source and an ADLS Gen2 Parquet sink. When I debug the Copy activity, it fails with the InvalidSchemaException shown in the title.

Here is a sample of the JSON response that the API returns:

{
    "Regions": [
        {
            "RCode": "a",
            "Name": "b",
            "DbCode": "c",
            "DisplayOrder": 1,
            "HasAccess": false
        },
        {
            "RCode": "e",
            "Name": "f",
            "DbCode": "g",
            "DisplayOrder": 2,
            "HasAccess": false
        }
    ]
}

However, a Web activity calling the same API (without saving to ADLS) completes successfully. Can someone help me understand what the issue is?


Solution

  • You need to define the mapping manually in the Copy activity. Otherwise, you will get this error.

    • I got the same error when no mapping was given:

    ErrorCode=ParquetJavaInvocationException,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error occurred when invoking java, message: org.apache.parquet.schema.InvalidSchemaException:Cannot write a schema with an empty group: message adms_schema

    • To overcome this error, open the Mapping tab of the Copy activity and click Import schemas. Switch on the Advanced editor, select the Regions array as the Collection reference, and map each input column to an output column.

    • When the pipeline is run with this mapping, it executes without any error.
    • Similarly, for a delimited (CSV) sink, all records are copied when the mapping is given. Otherwise, zero rows are copied to the CSV sink.
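
For reference, the mapping set up in the UI is stored as a translator block in the Copy activity's JSON definition. The fragment below is only a sketch for the sample response in the question: it assumes the sink column names are kept identical to the source field names, which is not confirmed by the original post, and the definition generated in your pipeline may contain additional properties such as column types.

```json
{
    "translator": {
        "type": "TabularTranslator",
        "collectionReference": "$['Regions']",
        "mappings": [
            { "source": { "path": "['RCode']" },        "sink": { "name": "RCode" } },
            { "source": { "path": "['Name']" },         "sink": { "name": "Name" } },
            { "source": { "path": "['DbCode']" },       "sink": { "name": "DbCode" } },
            { "source": { "path": "['DisplayOrder']" }, "sink": { "name": "DisplayOrder" } },
            { "source": { "path": "['HasAccess']" },    "sink": { "name": "HasAccess" } }
        ]
    }
}
```

The collectionReference points at the Regions array, so each array element becomes one output row. Paths without a leading $ are resolved relative to that element.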

    Reference: Schema and data type mapping in copy activity - Azure Data Factory & Azure Synapse | Microsoft Learn