I have a .txt with around 6gb of data. Fields delimited by semicolons.
I need to check one of the fields line by line against a prebuild dictionary and if there is a match copy all the fields of the respective line into a 2 dimension array.
Currently this is the relevant part of code (declarations and functions ommitted. Not at the scope of this question):
Set hbDict = dict_HB(hb) ''--this returns a dictionary from a function for comparison
Set FSO = CreateObject("scripting.filesystemobject")
Set myFile = FSO.OpenTextFile(sPath & sFilename, ForReading)
'--This counts how many matches occur between txt and dictionary to redim the array:
Do While myFile.AtEndOfStream <> True
textline = myFile.ReadLine
arrLine = Split(textline, ";")
If hbDict.exists(arrLine(3)) Then
arrLimit = arrLimit + 1
End If
Loop
Redim MyArray(1 to arrLimit, 1 to 31)
'--Loop again through the file, now actually adding to the redimmed array:
L = 1
Do While myFile.AtEndOfStream <> True
textline = myFile.ReadLine
arrLine = Split(textline, ";")
If hbDict.exists(arrLine(3)) Then
For c = 1 to 31
MyArray(L,C) = arrLine(c-1)
Next
L = L + 1
End If
Loop
myFile.Close
set FSO = nothing
'code continues...
First loop takes around 19 minutes. second a little more.
Already tried to open for append, but it crashes, maybe because i'm running at 4gb of RAM. Any way of loading the entire file at once, seems to crash the machine. Open for input does not read the entire file, so data is lost. The use of a collection in the first loop to avoid relooping the txt would be great if it could handle more than 256 entries... And off course, dinamically redim array inside the loop is out of question since it is a performance killer.
Is there any way to do it faster than this?
Change the first loop to
Dim colLines As Collection
Set colLines = New Collection
Do While Not myFile.AtEndOfStream
textline = myFile.ReadLine
arrLine = Split(textline, ";")
If hbDict.exists(arrLine(3)) Then
'arrLimit = arrLimit + 1
colLines.Add textline
End If
Loop
And the second loop
Dim i As Long
ReDim MyArray(1 To colLines.Count, 1 To 31)
For i = 1 To colLines.Count
textline = colLines(i)
arrLine = Split(textline, ";")
If hbDict.exists(arrLine(3)) Then
For c = 1 To 31
MyArray(L, c) = arrLine(c - 1)
Next
L = L + 1
End If
Next i
In such a way you only need to read the text file once. Because it's so big you won't be able to read the file completely into memory.