Search code examples
vbaexceldictionarymultidimensional-arrayfso

Faster way to dump immense text into array


I have a .txt with around 6gb of data. Fields delimited by semicolons.

I need to check one of the fields line by line against a prebuild dictionary and if there is a match copy all the fields of the respective line into a 2 dimension array.

Currently this is the relevant part of code (declarations and functions ommitted. Not at the scope of this question):

Set hbDict = dict_HB(hb) ''--this returns a dictionary from a function for comparison

Set FSO = CreateObject("scripting.filesystemobject")
Set myFile = FSO.OpenTextFile(sPath & sFilename, ForReading)

'--This counts how many matches occur between txt and dictionary to redim the array:
Do While myFile.AtEndOfStream <> True
    textline = myFile.ReadLine
    arrLine = Split(textline, ";")
    If hbDict.exists(arrLine(3)) Then
        arrLimit = arrLimit + 1
    End If
Loop

Redim MyArray(1 to arrLimit, 1 to 31)

'--Loop again through the file, now actually adding to the redimmed array:
L = 1
Do While myFile.AtEndOfStream <> True
    textline = myFile.ReadLine
    arrLine = Split(textline, ";")
    If hbDict.exists(arrLine(3)) Then
        For c = 1 to 31
            MyArray(L,C) = arrLine(c-1)
        Next
        L = L + 1
    End If
Loop
myFile.Close
set FSO = nothing

'code continues...

First loop takes around 19 minutes. second a little more.

Already tried to open for append, but it crashes, maybe because i'm running at 4gb of RAM. Any way of loading the entire file at once, seems to crash the machine. Open for input does not read the entire file, so data is lost. The use of a collection in the first loop to avoid relooping the txt would be great if it could handle more than 256 entries... And off course, dinamically redim array inside the loop is out of question since it is a performance killer.

Is there any way to do it faster than this?


Solution

  • Change the first loop to

    Dim colLines As Collection
    Set colLines = New Collection
        Do While Not myFile.AtEndOfStream
            textline = myFile.ReadLine
            arrLine = Split(textline, ";")
            If hbDict.exists(arrLine(3)) Then
                'arrLimit = arrLimit + 1
                colLines.Add textline
            End If
        Loop
    

    And the second loop

    Dim i As Long
    ReDim MyArray(1 To colLines.Count, 1 To 31)
    
    For i = 1 To colLines.Count
        textline = colLines(i)
        arrLine = Split(textline, ";")
        If hbDict.exists(arrLine(3)) Then
            For c = 1 To 31
                MyArray(L, c) = arrLine(c - 1)
            Next
            L = L + 1
        End If
    Next i
    

    In such a way you only need to read the text file once. Because it's so big you won't be able to read the file completely into memory.