Tags: postgresql, azure, azure-data-factory, npgsql, azure-postgresql

Azure Data Factory Copy Pipeline failing if more than 6 columns


I'm trying to copy data from a view in either an on-premises PostgreSQL or MSSQL database to a table in Azure PostgreSQL. The copy fails whenever I map more than 6 columns from source to sink.

I suspected that one of the source columns was the issue, so I varied which columns were mapped, but every combination succeeds as long as fewer than 6 columns are copied in total. I then tried different source and sink tables, which produced the same results.

If I copy through an intermediate CSV file, I can import/export any number of columns successfully.

The error output consistently shown in the ADF console is: "Operation on target Copy_7dp failed: Type=Npgsql.PostgresException,Message=08P01: invalid message format,Source=Npgsql,'"


Solution

  • The error output consistently shown in the ADF console is: "Operation on target Copy_7dp failed: Type=Npgsql.PostgresException,Message=08P01: invalid message format,Source=Npgsql,'"

    You may find some clues in this old case, which is similar to your error trace; see the solution in the linked comment: https://github.com/npgsql/npgsql/issues/189#issuecomment-67001166

    Basically, if the string value Npgsql tries to persist was derived from MemoryStream.GetBuffer(), the write will blow up, but if it came from MemoryStream.ToArray() it will work fine.

    This is explained in the official documentation:

    Note that the buffer contains allocated bytes which might be unused. For example, if the string "test" is written into the MemoryStream object, the length of the buffer returned from GetBuffer is 256, not 4, with 252 bytes unused. To obtain only the data in the buffer, use the ToArray method; however, ToArray creates a copy of the data in memory.
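
    To see the quoted behaviour concretely, here is a minimal C# sketch (plain .NET, not ADF or Npgsql code) showing how GetBuffer() returns padded data while ToArray() returns only what was actually written:

    ```csharp
    using System;
    using System.IO;
    using System.Text;

    class GetBufferVsToArray
    {
        static void Main()
        {
            var ms = new MemoryStream();
            byte[] payload = Encoding.UTF8.GetBytes("test");
            ms.Write(payload, 0, payload.Length);

            // GetBuffer() exposes the whole internal buffer, including
            // allocated-but-unused capacity.
            Console.WriteLine(ms.GetBuffer().Length); // 256 (4 data bytes + 252 unused)

            // ToArray() copies exactly the bytes that were written.
            Console.WriteLine(ms.ToArray().Length);   // 4
        }
    }
    ```

    If a padded GetBuffer() result were written into a protocol message, the declared field length would no longer match the actual payload, which is the kind of mismatch PostgreSQL reports as 08P01: invalid message format.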

    However, I suspect this is a defect in the ADF PostgreSQL connector, and the transfer source code cannot be adjusted at the user level. (You could, of course, submit feedback to the Azure ADF team to get an official statement.)

    For now, as a workaround, you could use CSV files as an intermediary: transfer the data from the on-premises database to CSV files in Azure Blob Storage, then transfer it from there into the destination PostgreSQL database. A sketch of such a pipeline is below.
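
    As a rough sketch, the two-hop pipeline could look like the following in ADF pipeline JSON. All activity and dataset names here are placeholders invented for illustration, and the source/sink types assume an on-premises PostgreSQL source with a delimited-text staging dataset in Blob Storage:

    ```json
    {
      "name": "CopyViaCsvStaging",
      "properties": {
        "activities": [
          {
            "name": "SourceToStagingCsv",
            "type": "Copy",
            "inputs": [ { "referenceName": "OnPremSourceView", "type": "DatasetReference" } ],
            "outputs": [ { "referenceName": "BlobStagingCsv", "type": "DatasetReference" } ],
            "typeProperties": {
              "source": { "type": "PostgreSqlSource" },
              "sink": { "type": "DelimitedTextSink" }
            }
          },
          {
            "name": "StagingCsvToAzurePostgres",
            "type": "Copy",
            "dependsOn": [
              { "activity": "SourceToStagingCsv", "dependencyConditions": [ "Succeeded" ] }
            ],
            "inputs": [ { "referenceName": "BlobStagingCsv", "type": "DatasetReference" } ],
            "outputs": [ { "referenceName": "AzurePostgresTable", "type": "DatasetReference" } ],
            "typeProperties": {
              "source": { "type": "DelimitedTextSource" },
              "sink": { "type": "AzurePostgreSqlSink" }
            }
          }
        ]
      }
    }
    ```

    Since you report that copying via CSV already succeeds with any number of columns, this staged approach should keep the pipeline working until the connector defect is addressed.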