Search code examples
vb.netcsvdatatableoledbdataadapter

Jet.OLEDB.4.0 Truncating String When importing from CSV


I have an windows form app that takes CSV/Excel files, lets users import them to a SQL DB.

Recently, I have had a weird bug where users upload a CSV file and it will truncate the text in a column.

Here is the source file:

Source File

Here is the DatagridView in my app, displaying the same data after it is converted to a datatable:

After Conversion

Notice some values are completely blank and highlighted in red, while other seem to be truncated. However, this only happens with CSV files, not excel. This leads me to believe it may be a driver issue.

Here is the code that converts the flat file data to a datatable:

Private Function ConvertCSVToDataTable(ByVal path As String) As DataTable
    Using con As OleDb.OleDbConnection = New OleDb.OleDbConnection()
        Try
            If System.IO.Path.GetExtension(path) = ".csv" Then
                con.ConnectionString = String.Format("Provider={0};Data Source={1};Extended Properties=""Text;HDR=YES;FMT=Delimited""", "Microsoft.Jet.OLEDB.4.0", IO.Path.GetDirectoryName(path))
                Using cmd As OleDb.OleDbCommand = New OleDb.OleDbCommand("SELECT * FROM [" & IO.Path.GetFileName(path) & "]", con)
                    Using da As OleDb.OleDbDataAdapter = New OleDb.OleDbDataAdapter(cmd)
                        con.Open()
                        da.Fill(dt)
                        con.Close()
                    End Using
                End Using
            ElseIf System.IO.Path.GetExtension(path) = ".xlsx" Then
                con.ConnectionString = String.Format("Provider={0};Data Source={1};Extended Properties=""Excel 12.0 Xml;HDR=Yes;IMEX=1""", "Microsoft.ACE.OLEDB.12.0", path)
                con.Open()
                Dim dbSchema As DataTable = con.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, Nothing)
                Dim firstSheetname As String = dbSchema.Rows(0)("TABLE_NAME").ToString
                Using cmd As OleDb.OleDbCommand = New OleDb.OleDbCommand("SELECT * FROM [" & firstSheetname & "]", con)
                    Using da As OleDb.OleDbDataAdapter = New OleDb.OleDbDataAdapter(cmd)
                        'con.Open()
                        da.Fill(dt)
                        con.Close()
                    End Using
                End Using
            End If

        Catch ex As Exception
            MessageBox.Show(ex.ToString(), "Conversion Error", MessageBoxButtons.OK, MessageBoxIcon.Asterisk)
        Finally
            If con IsNot Nothing AndAlso con.State = ConnectionState.Open Then
                con.Close()
            End If
        End Try
    End Using
    Return dt
End Function

Any idea whats causing this? I should also note that some of my users can only import CSV and not Excel. I figured out users with a 32 bit Access driver could import excel files, while 64 bit users could not. I had them download the driver here:

https://www.microsoft.com/en-us/download/details.aspx?id=23734

Now those users can import Excel, but they still have the issue with the strings being truncated. Which leads me to believe it may still be a driver issue.

Test Data:

Sales Order #  
US00123  
US00123  
US00123  
SG0000123 
SG0000123 
S00123
S00123
S00123
S00123  
S00123  

Solution

  • I have been able to reproduce your issue.

    The problem is that for some reason, the Text File Driver used by the Jet provider is interpreting/parsing "S00123" as a numeric value. I had to configure the registry with MaxScanRows=0 and ImportMixedTypes="Majority Type".

    I do not know why this is happening, but I just attribute it to another one of the joys of using a poorly documented technology. It appears that any field that starts with a "S" followed by digits is interpreted as a number.

    If you insist on using this technolgy to accomplish a task for which there are far better options, then you have to live with its many shortcommings and quirks.

    Solution 1:

    Configure the provider to import with registry value ImportMixedTypes set to "Text". Now if you make this change to allow your program to work, you will also be responsible for breaking any other code that is reliant on the existing configuration.

    Location of the registry values on a 64 bit OS.

    For Jet Provider:

    Computer\HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\Jet\4.0\Engines\Text

    Computer\HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\Jet\4.0\Engines\Excel

    For ACE Provider:

    Computer\HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\Office\12.0\Access Connectivity Engine\Engines\Text

    Computer\HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\Office\12.0\Access Connectivity Engine\Engines\Excel

    Solution 2:

    Use a [Schema.ini]file https://learn.microsoft.com/en-us/sql/odbc/microsoft/schema-ini-file-text-file-driver) to define how the text file is to be interpreted with defined columns.

    This is the preferred method for using this technology. Be advised that encoding of the file is very important; it must either in the .Net encoding System.Text.Encoding.ASCII or System.Text.Encoding.Unicode. If you use another encoding such as UTF-8, the file will not be read and the settings in the registry will be used.