Search code examples
sql-serverentity-frameworkentity-framework-migrationsbcp

Using bcp.exe to successfully import data, even if the column order in the format file differs to that of the table


I've been creating a system for my team which allows us to automatically generate large amounts of test data so that we can return our database to a consistent state when we need it. This needs to be fully automated, but I currently have one issue that's preventing that from happening.

Our database is created using Entity Framework, and we're also using migrations. The problem is that, as column orders can change because of a migration, it means test data that was exported prior to a migration may no longer import correctly when using bcp.

To illustrate this more clearly, here are the steps to reproduce the behaviour:

Create a simple table:

dbo.Person - original column order

Create the format file:

C:\bcp [BCPTest].[dbo].[Person] format nul -f person.xml -x -N -T -S .\SQLExpress

Create the binary output file:

C:\bcp [BCPTest].[dbo].[Person] out person.dat -N -T -S .\SQLExpress

Reorder the columns and save:

dbo.Person - migration order

Try to import:

C:\bcp [BCPTest].[dbo].[Person] in person.dat -f person.xml -N -T -S .\SQLExpress

This will lead to lots of different errors. This can currently be fixed, by reordering the columns in the tables that failed, manually, saving and rerunning the import, but, as I said earlier, this needs to be fully automated.

Does anyone have any suggestions as to how I can achieve this?


Solution

  • Try removing the -N option from the BCP in command. The -N option effectively overrides the format file field ordinals. The format file specifies the native field formats anyway so the -N specification is redundant.

    EDIT:

    Below is an T-SQL example that changes the column mapping order of the original format file xml to match the current table. This can be extended to handle dropped columns and detections of conditions that must be handled manually, like new not-NULL columns.

    I suggest you invoke this from a Powershell script that iterates over your list of format files and calls the proc passing in the original format XML and table name for each.

    CREATE PROC dbo.usp_GetNewFormatFileXml
          @TableName nvarchar(261)
        , @OriginalFormatFileXml xml
        , @NewFormatFileXml xml OUTPUT
    AS
    
    DECLARE @NewFormatFileColumns xml;
    
    --genrate new field/column mappings based in current column ordinal
    WITH
        XMLNAMESPACES (
              'http://www.w3.org/2001/XMLSchema-instance' AS xsi
            ,  DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/bulkload/format'
            )
    SELECT @NewFormatFileColumns = (
        SELECT
              RecordField.ID AS [@SOURCE]
            , NewTableColumn.Name AS [@NAME]
            , OldTableColumn.type AS [@xsi:type]
        FROM (
            SELECT 
                Field.value('@ID', 'int') AS ID
            FROM @OriginalFormatFileXml.nodes('/BCPFORMAT/RECORD/FIELD') AS Record(Field)) AS RecordField
        JOIN (
            SELECT 
                  TableColumn.value('@SOURCE', 'int') AS FieldId
                , TableColumn.value('@NAME', 'sysname') AS Name
                , TableColumn.value('@xsi:type', 'sysname') AS type
            FROM @OriginalFormatFileXml.nodes('/BCPFORMAT/ROW/COLUMN') AS TableRow(TableColumn)) AS OldTableColumn ON
            OldTableColumn.FieldId = RecordField.ID
        JOIN (
            SELECT
                  Name
                , column_id
            FROM sys.columns
            WHERE 
                object_id = OBJECT_ID(@TableName)
            ) AS NewTableColumn ON
                NewTableColumn.Name = OldTableColumn.Name
        ORDER BY NewTableColumn.column_id
    FOR XML PATH('COLUMN'), ROOT('ROW'), TYPE);
    
    SET @NewFormatFileXml = @OriginalFormatFileXml;
    
    --remove old column mapping
    SET @NewFormatFileXml.modify('
        declare default element namespace "http://schemas.microsoft.com/sqlserver/2004/bulkload/format";
        delete
            (/BCPFORMAT/ROW[1] )[1] 
            ') ;
    
    --add new column mapping
    SET @NewFormatFileXml.modify('
        declare default element namespace "http://schemas.microsoft.com/sqlserver/2004/bulkload/format";
        insert sql:variable("@NewFormatFileColumns")
            into (/BCPFORMAT )[1] 
            ') ;
    GO
    
    --example usage
    DECLARE
        @TableName nvarchar(261) = 'dbo.Person'
        , @OriginalFormatFileXml xml = '<?xml version="1.0"?>
    <BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
     <RECORD>
      <FIELD ID="1" xsi:type="NativeFixed" LENGTH="8"/>
      <FIELD ID="2" xsi:type="NCharPrefix" PREFIX_LENGTH="2" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
     </RECORD>
     <ROW>
      <COLUMN SOURCE="1" NAME="ID" xsi:type="SQLBIGINT"/>
      <COLUMN SOURCE="2" NAME="Name" xsi:type="SQLNVARCHAR"/>
     </ROW>
    </BCPFORMAT>'
        , @NewFormatFileXml xml;
    
    EXEC dbo.usp_GetNewFormatFileXml
          @TableName = @TableName
        , @OriginalFormatFileXml = @OriginalFormatFileXml
        , @NewFormatFileXml = @NewFormatFileXml OUTPUT;
    
    SELECT @NewFormatFileXml;