Search code examples
sql-servert-sqlssisbiml

Problem in importing CSV file with column containig NULL values or comma


I have a CSV file with the two columns :

Employee_Name,EmpID
"Harry, Watson",1
"Amy, Black",2
"O'hare, Lynn",3
Jeremy Prater,4
,,
,,

I am using biml to generate my package :

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<FileFormats>
        <FlatFileFormat Name="FlatFile" CodePage="1252" TextQualifer="&quot;"
        ColumnNamesInFirstDataRow="true" IsUnicode="false">            
            <Columns>
                <Column Name="Employee_Name" DataType="AnsiString"  Length="255" Delimiter="Comma" />
                <Column Name="EmpID" DataType="Int64" Delimiter="Comma" />


            </Columns>
        </FlatFileFormat>
</FileFormats>

<Connections>
    <FlatFileConnection Name="importexcel"
        FilePath="HR.csv"
        FileFormat="FlatFile" />
    <Connection Name="AppSTG" 
        ConnectionString="XXXX"></Connection>
</Connections>


<Databases>

    <Database Name="STG_App" ConnectionName="AppSTG"></Database>

</Databases>


<Schemas>
    <Schema Name="HR" DatabaseName="STG_App"></Schema>
</Schemas>    

<Tables>
    <Table Name="Employee" SchemaName="STG_App.HR">
            <Columns>
               <Column Name="Employee_Name" DataType="AnsiString" Length="255" />
                <Column Name="EmpID" DataType="Int64"  />
        </Columns>
        </Table>

</Tables>    
<Packages>
        <Package Name="Load Flat File Data" >
            <Tasks>
                <Dataflow Name="Load Flat File Data">
                    <Transformations>
                        <FlatFileSource ConnectionName="importexcel" Name="FlatFile"/>
                        <OleDbDestination Name="Target" ConnectionName="AppSTG">
                            <TableOutput TableName="STG_App.HR.Employee" />
                        </OleDbDestination>
                    </Transformations>
                </Dataflow>
            </Tasks>
        </Package>
    </Packages>




</Biml>

To create the following data flow :

enter image description here

When I try to execute the package, I have the following error :

« Data conversion failed. The data conversion for column "EmpID" returned status value 2 and status text "The value could not be converted because of a potential loss of data.".  »


Solution

  • You have defined your EmpID field as being Int64 which will work great when you have a digits there but in the case of no data (but a row still being present), SSIS is going to try to convert the empty string to a number and that will fail.

    If you add an error pathway from the Flat File Source for truncation/error/etc you'd see rows 5+ going down that path. For this data, I'd define everything as string as you need to get the data into the pipeline and then you'll need to take action on it based on whatever business rules make sense (no name/id, trash it).

    As @alex points out in the comment, the final rows indicate there are three columns of data whereas you've defined two so when the flat file source gets to that, you'll blow up. SSIS won't be able to handle inconsistent file formats like that.