I'd like to get fresh advice on the infamous issue caused by the Byte Order Mark ( or EF BB BF in hex) when trying to read UTF-8 encoded CSVs with VBA (Excel). Note that I'd like to avoid opening the CSV with Workbooks.Open or FileSystemObject. Actually, I'd rather use an adodb.RecordSet as I need to execute some kind of SQL queries.
After having read many (many!) things, I believe the 4 best solutions to deal with this specific issue are:
The last solution (using a stream) seems quite fine but doing the following returns a string:
Sub loadCsv()
Const adModeReadWrite As Integer = 3
With CreateObject("ADODB.Stream")
.Charset = "utf-8"
.Mode = adModeReadWrite
.Open
.LoadFromFile ("C:\atestpath\test.csv")
Debug.Print .readtext
End With
End Sub
Do you know any trick that could help using the string returned by .readtext as the Data Source of an ADODB.RecordSet or ADODB.Connection (apart from looping to manually populate the fields of my recordset)?
EDIT: I found that loading the CSV with a querytable object (see this good example) or through a WorkbookQuery object (introduced in Excel 2016) are the easiest and probably most reliable ways to proceed (see an example from the documentation here).
OLD ANSWER:
Talking with @Profex encouraged me to further investigate the issue. Turns out there are 2 problems: the BOM and the delimiter used for the CSV. The ADO connection string I need to use is :
strCon = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Users\test\;Extended Properties='text;HDR=YES;CharacterSet=65001;FMT=Delimited(;)'"
But FMT does not work with a semicolon (FMT=Delimited(;)
), at least with Microsoft.ACE.OLEDB.12.0 on a x64 system (Excel x64). Thus, @Profex was quite right to state:
even though the first field name has a ? in front of it, it doesn't look like it actually matters
given that he was using FMT=Delimited
on a CSV delimited by a simple comma (",").
Some people suggest to edit the registry so that the semicolon delimiter is accepted. I'd like to avoid that. Also, I'd rather not create a schema.ini file (even if that may be the best solution for complex CSVs). Thus, the only solutions remaining require to edit the CSV before creating the ADODB.Connection.
I know my CSV will always have the problematical BOM as well as the same basic structure (something like "date";"count"). Thus I decided to go with this code:
Dim arrByte() As Byte
Dim strFilename As String
Dim iFile As Integer
Dim strBuffer As String
strFilename = "C:\Users\test\t1.csv"
If Dir(strFilename) <> "" Then 'check if the file exists, because if not, it would be created when it is opened for Binary mode.
iFile = FreeFile
Open strFilename For Binary Access Read Write As #iFile
strBuffer = String(3, " ") 'We know the BOM has a length of 3
Get #iFile, , strBuffer
If strBuffer = "" 'Check if the BOM is there
strBuffer = String(LOF(iFile) - 3, " ")
Get #iFile, , strBuffer 'the current read position is ok because we already used a Get. We store the whole content of the file without the BOM in strBuffer
arrByte = Replace(strBuffer, ";", ",") 'We replace every semicolon by a colon
Put #iFile, 1, arrByte
End If
Close #iFile
End If
(note: one might use arrByte = StrConv(Replace(strBuffer, ";", ","), vbFromUnicode) because the bytes array is in ANSI format).