Search code examples
vb.netvisual-studio-2010csvoledb

OLEDB Import of CSV to VB.NET datatable reading '-' as 0


Greetings helpful delightful people,

I have a problem with reading CSV files and converting to datatables in VB.Net. If the CSV file contains a column full of '-' then on import into the datatable they appear as '0' and the entire column is formatted in a numeric format.

The code I have written is:

Public Function LoadCsvFile(filePath As String) As DataTable
    Try

        DTReadCSV = New DataTable

        Dim connection As System.Data.OleDb.OleDbConnection = New System.Data.OleDb.OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & Microsoft.VisualBasic.Left(filePath, InStrRev(filePath, "\")) & ";Extended Properties=""text;HDR=Yes;FMT=Delimited""")

        Dim adapter As New OleDb.OleDbDataAdapter("SELECT * FROM [" + Microsoft.VisualBasic.Mid(filePath, InStrRev(filePath, "\") + 1) + "]", connection)
        'Dim table As New DataTable()
        adapter.Fill(DTReadCSV)

        'now thats its nicely in a datatable 
        IntAmountRows = DTReadCSV.Rows.Count
        IntAmountColumns = DTReadCSV.Columns.Count


        'System.Diagnostics.Debug.Print(DTReadCSV.Rows.Item(1)(1).ToString)
        Return DTReadCSV

        Exit Function
    Catch ex As Exception
        MsgBox(ex.ToString())

        MsgBox(Err.Number & " " & Chr(13) & Err.Description)
    End Try
End Function

Please can someone smarter figure out how to combat this issue besides modifying the CSV file by taking out the '-' as blanks, which at the moment seems the only long winded way of importing these CSV files.

Many thanks


Solution

  • Using a Schema.INI you can describe to OleDB what the CSV looks like in detail including resulting column names and data types. For instance, given this data:

    "Country","Capital City","Population", "Fake"
    "France","Paris","2.25","-----"
    "Canada","Toronto","2.5","-----"
    "Italy","Rome","2.8","-----"

    Create a Schema.ini file in the same folder; it can have multiple sections to define various CSVs in that folder. If there is a Schema.INI in the same folder as the CSV and it has an entry for your CSV, OleDB will automatically use it (nothing special to do).

    The Schema.INI entry for the above data may look like this:

    [Capitals.CSV]
    ColNameHeader=True
    CharacterSet=1252
    Format=CSVDelimited
    TextDelimiter="
    Col1="Country" Text Width 254
    Col2="Capital City" Text Width 254
    Col3="Population" Single
    Col4="Fake" Text Width 254

    OleDb will use these definitions when reading that file, resulting in a string of dashes for the "Fake" column in the output:

    enter image description here

    One additional benefit (among many) of using a Schema.INI is that you can name the columns there rather than using or aliasing F1, F2, F3 etc. The SQL for that output was just "SELECT * FROM Capitals.CSV"

    Tip (for others to come): to specify UTF8 as the character set, use CharacterSet=65001 in the schema.

    For more information see
    - Schema.ini Text File Driver
    - Code Page Identifiers