Search code examples
excelsortingdelete-rowvba

Deleting entire row on criteria cannot handle 400,000 rows


I have this macro to delete the entire rows for those that are not "chr9". I have a total of 401,094 rows. It seems to compile fine, but my Excel freezes and I have to Force Quit.

I think it may be an inefficient algorithm or maybe some error in the code?

Sub deleteNonChr9()
    Dim lastrow As Long
    Dim firstrow As Long
    Dim i As Long

    lastrow = 401094
    firstrow = 0

    ' Increment bottom of sheet to upwards
    For i = lastrow To firstrow Step -1
        If (Range("C1").Offset(i, 0) <> "chr9") Then
            Range("C1").Offset(i, 0).EntireRow.Delete
        End If
    Next i

End Sub

Solution

  • The fastest way to conditionally delete rows is to have them all at the bottom of the data block. Sorting them into that position and deleting is faster than individual looping or even compiling a discontiguous Union of rows to delete.

    When any group or cells is contiguous (i.e. all together) Excel does not have to work as hard to get rid of them. If they are at the bottom of the Worksheet.UsedRange property, Excel doesn't have to calculate what to fill the empty space with.

    Your original code did not allow for a column header text label in row 1 but I will account for that. Modify to suit if you do not have one.

    These will turn off the three primary parasites of computing power. Two have already been addressed in the comments and answers, the third Application.EnableEvents property can also make a valid contribution to Sub procedure efficiency whether you have event driven routines or not. See the helper Sub procedure at the bottom for details.

    Sample data²: 500K rows of random data in A:Z. ~33% Chr9 in column C:C. Approximately 333K randomly discontiguous rows to delete.

    chr9_before

    Union and delete

    Option Explicit
    
    Sub deleteByUnion()
        Dim rw As Long, dels As Range
    
        On Error GoTo bm_Safe_Exit
        appTGGL bTGGL:=False          'disable parasitic environment
    
        With Worksheets("Sheet1")
            Set dels = .Cells(.Rows.Count, "C").End(xlUp).Offset(1)
            For rw = .Cells(.Rows.Count, "C").End(xlUp).Row To 2 Step -1
                If LCase$(.Cells(rw, "C").Value2) <> "chr9" Then
                    Set dels = Union(dels, .Cells(rw, "C"))
                End If
            Next rw
            If Not dels Is Nothing Then
                dels.EntireRow.Delete
            End If
        End With
    
    bm_Safe_Exit:
        appTGGL
    
    End Sub
    

    Elapsed time: <It has been 20 minutes... I'll update this when it finishes...>

    Bulk load from worksheet to variant array, change, load back, sort and delete

    Sub deleteByArrayAndSort()
        Dim v As Long, vals As Variant
    
        On Error GoTo bm_Safe_Exit
        appTGGL bTGGL:=False          'disable parasitic environment
    
        With Worksheets("Sheet1")
            With .Cells(1, 1).CurrentRegion
                .EntireRow.Hidden = False
                With .Resize(.Rows.Count - 1, .Columns.Count).Offset(1, 0)
                   'bulk load column C values
                    vals = .Columns(3).Value2
    
                   'change non-Chr9 values into vbNullStrings
                    For v = LBound(vals, 1) To UBound(vals, 1)
                        If LCase$(vals(v, 1)) <> "chr9" Then _
                          vals(v, 1) = vbNullString
                    Next v
    
                End With
    
               'dump revised array back into column C
                .Cells(2, "C").Resize(UBound(vals, 1), UBound(vals, 2)) = vals
    
                'sort all of blank C's to the bottom
                .Cells.Sort Key1:=.Columns(3), Order1:=xlAscending, _
                                   Orientation:=xlTopToBottom, Header:=xlYes
    
                'delete non-Chr9 contiguous rows at bottom of currentregion
                .Range(.Cells(.Rows.Count, "C").End(xlUp), .Cells(.Rows.Count, "C")).EntireRow.Delete
    
            End With
            .UsedRange   'reset the last_cell property
        End With
    
    bm_Safe_Exit:
        appTGGL
    
    End Sub
    

    Elapsed time: 11.61 seconds¹
           (166,262 rows of data remaining²)

    Original code

    Elapsed time: <still waiting...>

    Summary

    There are obvious advantages to working within a variant array as well as deleting contiguous ranges. My sample data had ~66% of the rows to delete so it was a harsh task master. If there were 5 or 20 rows to delete, using an array to parse data for a sort may not be the best solution. You will have to make your own decisions based on your own data.

    chr9 after

    appTGGL helper Sub procedure

    Public Sub appTGGL(Optional bTGGL As Boolean = True)
        With Application
            .ScreenUpdating = bTGGL
            .EnableEvents = bTGGL
            .Calculation = IIf(bTGGL, xlCalculationAutomatic, xlCalculationManual)
        End With
        Debug.Print Timer
    End Sub
    

    ¹ Environment: old business class laptop with a mobile i5 and 8gbs of DRAM running WIN7 and Office 2013 (version 15.0.4805.1001 MSO 15.0.4815.1000 32-bit) - typical of the low end of the scale for performing this level of procedure.

    ² Sample data temporarily available at Deleting entire row cannot handle 400,000 rows.xlsb.