Search code examples
vbaexcelvb6

Performance issues with select query for a large CSV file


I'm trying to search for data in a very large CSV file (one million records) using ADO SELECT query and I have a few WHERE clauses in that query.

I cannot transfer this data to any DB (MySql or SQL Server or MS Access) because it is generated daily and I can't transfer it daily to the DB.

I do not have any row id in this .csv file. If there is a row id generated for every .csv by default then please let me know.

Here is the CSV file data sample (first field is date, second is time, third is a value):

CSV FILE SAMPLE DATA
====================
20130714,170056,1.30764
20130714,170122,1.30743
20130714,170132,1.30744
20130714,170205,1.30743
20130714,170214,1.30744
20130714,170216,1.30743
20130714,170244,1.30744
20130714,170325,1.30744
20130714,170325,1.30743
20130714,170325,1.30743
20130714,170325,1.30742
20130714,170504,1.30741
20130714,170519,1.30741
20130714,170519,1.30739
20130714,170522,1.30739
20130714,170522,1.30732
20130714,170522,1.30722

All the CSV records are in order by date and time.

I'm using ADO connection from Excel to CSV file with this source code:

strsql = "SELECT * FROM " & sItem & ".csv WHERE F3>=" & trigPrice & " AND (F1 in (SELECT distinct TOP " & trigWin & "  f1 FROM " & sItem & ".csv WHERE (F1>=" & sDay & ")) AND f2>=" & sTime & ")"
Set rs = cn.Execute(strsql)

This one query takes about 10 minutes to execute. How do I reduce the execution time?


Solution

  • The reason that database queries can be fast is that the data is already indexed - that is, it will have quick lookups on some of the fields. When you run a "query" on raw CSV files, the ADO engine must first parse the text into records into a set of records, then search through them row by row to find records that match your search criteria. If you are planning on doing much more than a few queries on the data, you might as well import it into an indexed database table and avoid the duplication of parsing the CSV multiple times.

    UPDATE

    To import the CSV file from VBA, you can use the 'DoCmd.TransferText' function. For example, to import a CSV file to a table (with the correct layout) called "tblData" and from a comma seperated CSV with headers, you could do the following:

    DoCmd.TransferText acImportDelim, , "tblData", "C:\Path\OF\THE.csv", True
    

    This is the same method used by the Access Import Wizard.