Search code examples
vbams-accessado

ADO in VBA to read txt/csv file but get lesser columns than expected


I'm using ADO txt/csv connection string below in vba to get data without me actually opening excel.

Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & FolderPath & ";Extended Properties="text;HDR=Yes;FMT=Delimited(,)";  

Dim rs As New ADODB.Recordset
query = "Select * from  [" & CStr(FileNameArr(i)) & "]"
rs.Open query, conn, adOpenKeyset, adLockOptimistic

I've successfully opened the connection and recordset, but the recordset only contained two fields which is not correct.(Should be 103). The strangest thing is that if I open file and save it as csv or txt without doing anything, and run vba again, it successfully gets all columns! The file is not actually a txt/csv file, it's a .msr file. But I can open it with Notepad or excel without any problem. Before executing vba, I deleted extension of .msr and change it to .csv, then execute VBA using ADO csv/txt connection string to pull out data, some columns will be missing as above statement.

The file content is not a strictly proper table, it has lesser header columns than that of data body. If I open it in csv, data in the first row only lives in Column A, but data in other rows contains at most 103 columns.

I also tried HDR=No but it didn't work.

I've tried several workarounds:
<1>.Try to save file via ADO:
Because I know manually open file and save it will solve the problem, I tried to add some text in first row then save file when recordset is opening.
I tried rs.field(2)="New Text" then rs.Update,
but an error message occurred at rs.field(2)="New Text" and error message popped up:
run time error 3251 current recordset does not support updating by this ISAM
I thought it's because the header only contains 2 columns, i.e field(0) and field(1), so I can't update something that is originally not existed there. So, I changed to rs.field(1)="New Text", but still same error message.
If I skip to rs.Update, it's ok. But I still got only two columns.
Thus, I tried rs.save Original file full path, adPersistADTG
This gives runtime error 58: file already exists.
If I cahnge to a non-existed file name, same error occurred.

<2>.try specify 3rd and 4th argument of rs.open
By googling, I found some said that if I want to change file content via ADO, i need to use

rs.Open query, conn, adOpenKeyset, adLockOptimistic

Some said rs.Open query should change to table_name
in this thread answered Marcelo Garzzola:
run time error 3251 current recordset does not support updating

rs.Open table_name, conn, adOpenKeyset, adLockOptimistic

But none of this work for me.

<3>.Try to open it in xls, xlsx, xlsm file extension
This approach failed early while opening connection, it showed:
Database error 0x80004005: External table is not in the expected format.
I use a sub-procedure in vba to open connection and recordset to the file

Sub OpenConnection(wb As Workbook, addr As String, FileExtensionName As String, IsUseNumberSequenceAsSheetName As Boolean, FullPathArr As Variant, FolderPath As String, FileNameArr As Variant)  
Dim conn As New ADODB.Connection
Dim sht As Worksheet
For i = 1 To UBound(FileNameArr) - LBound(FileNameArr) + 1
    Dim connStr As String: connStr = ADOstr(CStr(FullPathArr(i)), FolderPath, FileExtensionName)
    conn.Open connStr
    Dim query As String
    query = "Select * from  [" & CStr(FileNameArr(i)) & "]"
    Dim rs As New ADODB.Recordset
    rs.Open query, conn, adOpenKeyset, adLockOptimistic
    Dim asht As Worksheet: Set asht = wb.Worksheets.Add(, After:=ActiveSheet)
    If IsUseNumberSequenceAsSheetName Then
        asht.Name = i
    End If
    ReDim Header(0 To rs.Fields.Count - 1)
    With asht
        For h = 0 To rs.Fields.Count - 1
            Header(h) = rs.Fields(h).Name
        Next
        .Range(addr).Resize(1, rs.Fields.Count) = Header
        .Range(addr).Offset(1, 0).CopyFromRecordset rs
    End With
    rs.Close
    conn.Close
Next
End Sub

Above sub-procedure will call a ADOstr function:

Function ADOstr(FullPath As String, FolderPath As String, FileExtension As String) As String
Select Case True
    Case LCase(FileExtension) = "csv" Or LCase(FileExtension) = "txt"
        ADOstr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & FolderPath & ";Persist Security Info=False;Extended Properties=""text;HDR=Yes;IMEX=0;FMT=Delimited(;)"";"
    Case LCase(FileExtension) = "xls"
        ADOstr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & FullPath & ";Extended Properties=""Excel 8.0;HDR=No;IMEX=0"";"
    Case LCase(FileExtension) = "xlsx" 
        ADOstr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & FullPath & ";Extended Properties=""Excel 12.0 Xml;HDR=YES;IMEX=1"";"
    Case LCase(FileExtension) = "xlsb"
        ADOstr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & FullPath & "Extended Properties=""Excel 12.0;HDR=YES"";"
    Case LCase(FileExtension) = "xlsm" 
        ADOstr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & FullPath & ";Extended Properties=""Excel 12.0 Macro;HDR=YES;IMEX=1"";"
End Select
End Function

I will call OpenConnection sub-procedure in my main sub-procedure and define all parameters it needs, including file extension so that ADOstr function can return corresponding connection strings.

<4>.Try to replicate error in small csv file
I created a small data set in a csv file which contains header row that only A1 has value
and data body that have 5 columns.
It turns out ADO recordset exactly return correct 5 fields! So I guess maybe there's something strange in this .msr file which I can't see.
Due to proprietary concern, I can't upload the file.
But I could say that header row contains >ver MF01 00.00 in A1 cell
After reading from recordset, it becomes: >ver MF01 00#00 in A1 Cell, F2 in B1 Cell

<5> Try to open connection using strings with all kinds of delimiters
I've tried to change FMT=Delimited(;) to FMT=Delimited(,) to FMT=Delimited(@) to FMT=Delimited( ), I still get only two columns.
Google (below link) said in csv/txt format, I need a schema.ini file in the same folder, where stores the delimiters that program will actually use. And I tried, but the same. It still returns only two columns/fields no matter what delimiters I used.
ADODB.Connection: delimiter semicolon does not work for csv text files
But schema.ini file did work well with my small csv file I've created for debugging. It did separate columns by delimiter in schema.ini.

I would be very much appreciate if anyone could guide me
how to save a file via ADO or
how to open csv file without any delimiters such that all data live in ColumnA so recordset could retreive all data
or other workarounds.


Solution

  • First see this, I've encountered a very, very similar issue: Missing column of Dates in CSV file using Schema.ini file

    You want a standardized file format like CSV. If you do a SELECT * on a free text file it has no structure or schema.

    Try again on a CSV file with these properties set:

    Extended Properties="text;HDR=Yes;FMT=Delimited(Tab)";
    

    Note: if the size of the MSR file as a CSV is larger don't worry too much as using the OleDB Driver is one of the fastest for .Net and it can read a 128 MB file in 13.5 seconds: https://stackoverflow.com/a/46418403/495455