Search code examples
excelvbatextsplitexport-to-excel

3 million lines of port data in Excel from txt


I need from a text file with the 3 million lines of port data in Excel, every 1 million lines portation to start a new list. What's wrong with the code, tell me?

    Sub ReadFile()
  Dim I, J
  I = 1
  J = 1
  File = "\\bla.bla.bla\Desktop\vsr.txt"
  Open File For Input As #1
  Do While Not EOF(1)
    Line Input #1, MyString
    Sheets("LIST" & J).Cells(I, 1) = MyString
    If I = 1000000 Then
      I = 1
      J = J + 1
    Else
      I = I + 1
    End If
  Loop
  Close #1
End Sub

Solution

  • @ashleedawg already wrote most of the important stuff, hover ever I would make another correction to the code.

    The way you handle the data now means you look up line in the vsr.txt file, then copy it to a string and finally write it into a cell. This takes up a lot of processing power and will be quite slow (especially the part where you access the cells. It would be way more efficient to copy all the text into a array and then paste it all at once.

    The WorksheetFunction.Transpose() is necessary since arrays in excel use Columns as the first dimensions.
    Inserting the Code every 10'000 Lines prevents the Transpose function from running into Issues. I assume since it is part of the WorksheetFunction Code it may relay on some older code or even be specifically made this way to run lagacy code (lenght of the return value must be array of lenght Integer or something).
    Using myWb is there to make the Range modifiers fully qualified, not really necessary but can save you a lot of trouble down the line.
    With GROUPSIZE you can adjust the size of the Array-Block you want to import at once, a bigger number may give you faster processing speeds.

    Option Explicit
    
    Sub ReadFile()
    
        Const GROUPSIZE As Long = 10000 'ENTRIESPERPAGE must be divisible by this without rest to fit exactly 
        Const ENTRIESPERPAGE As Long = 1000000 
        Const fName As String = "\\bla.bla.bla\Desktop\vsr.txt"
    
        Dim rowNum As Long
        Dim rowOffset As Long
        Dim shtNum As Long
        Dim myString(1 To GROUPSIZE) As String
        Dim myWB As Workbook
        Dim Range2Fill As String
    
        Set myWB = ThisWorkbook
    
        Open fName For Input As #1
        shtNum = 1
        rowNum = 0
        rowOffset = 0
        Do While Not EOF(1)
            rowNum = rowNum + 1
            Line Input #1, myString(rowNum)
    
            If rowNum = GROUPSIZE Then
                Range2Fill = "A" & rowOffset + 1 & ":A" & rowOffset + rowNum
                myWB.Worksheets(shtNum).Range(Range2Fill) = WorksheetFunction.Transpose(myString)
                Debug.Print "Sheet: " & shtNum, "Offset: " & rowOffset
    
                If rowOffset >= ENTRIESPERPAGE - rowNum Then
                    rowOffset = 0
                    shtNum = shtNum + 1
                Else
                    rowOffset = rowOffset + GROUPSIZE
                End If
    
                rowNum = 0
            End If
        Loop
    
        'writes the last set of data in case there are not an exact multiple of 1M values
        Range2Fill = "A" & rowOffset + 1 & ":A" & rowOffset + rowNum
        myWB.Worksheets(shtNum).Range(Range2Fill) = WorksheetFunction.Transpose(myString)
        Close #1
    
    End Sub
    

    Additional notes:

    • Excel supports up to 1,048,576 Rows per worksheet, but expect performance to drop drastically on so many datapoints, especially if you start calculating them or using them in graphs.
    • There are better options for working on huge datasets like this, Origin, MatLab, and DIAdem are a few that I know of and have used so far, a free phyton based alternative would be Spyder.