Search code examples
excelvbatextnumbersextract

Extract 2 numbers from text (bulk list)


Background info:

I have a listing of 5000+ error messages in this format:

"999999 16 901 F SMITH, Smith FT 1 1.0 Additional Leave hours -4.0000 exceed entitlement plus pro-rata -4.0000"

I have been able to categorise them using a macro, so "Additional Leave hours exceed entitlement plus pro-rata" for example.

From there I'm trying to extract the two numbers.

I can do it manually using these formulas:

=MID(J3,SEARCH("hours ",J3)+5,SEARCH("exceed",J3)-SEARCH("hours ",J3)-6)
   
=TRIM(RIGHT(SUBSTITUTE(J3," ",REPT(" ",LEN(J3))),LEN((J3))))

But this is where i'm stuck, incorporating that logic in the macro and having it loop through the full list.

This was my first attempt:

If InStr(myString, "Additional Leave hours ") > 0 And InStr(myString, "exceed entitlement plus pro-rata") Then

'set category
Cells(x, 6).Value = "Additional Leave hours exceed entitlement plus pro-rata"

'first number
Cells(x, 8).ForumlaR1C1 = "=MID(RC[2],SEARCH(""hours "",RC[2])+5,SEARCH(""exceed"",RC[2])-SEARCH(""hours "",RC[2])-6"

'second number
Cells(x, 9).FormulaR1C1 = "=TRIM(RIGHT(SUBSTITUTE(RC[2],"" "",REPT("" "",LEN(RC[2]))),LEN((RC[2]))))"

'first minus second
Cells(x, 7).FormulaR1C1 = "=SUM(RC[2]-RC[1]"
    
End If

From there I have been able to use .Select & .Active cell, and it works but not efficiently:

'first number
Cells(x, 8).Select
        
ActiveCell.FormulaR1C1 = "=MID(RC[2],SEARCH(""hours"",RC[2])+5,SEARCH(""exceed"",RC[2])SEARCH(""hours "",RC[2])-6)"

Any help would be appreciated, thanks in advance.


Solution

  • Idea is to process all the strings in an array (so that it's faster, compared to writing/reading cells 1 by 1), use RegExp to extract the 2 numbers into an array which will be used to paste into the previous 2 columns. Finally insert the SUM formula into the column before:

    Sub Test()
        Const inputStartRow As Long = 1
        Const inputCol As String = "J"
        Const regexPattern As String = "Additional Leave hours ([-\d.]{1,}) exceed entitlement plus pro-rata ([-\d.]{1,})"
        
        Dim ws As Worksheet
        Set ws = ThisWorkbook.Worksheets("Sheet1") 'Change name accordingly
        
        Dim regex As Object
        Set regex = CreateObject("VBScript.RegExp")
        With regex
            .Pattern = regexPattern
            .Global = False
        End With
        
        '==== Get last row of the input column and set to range
        Dim inputLastRow As Long
        inputLastRow = ws.Cells(ws.Rows.Count, inputCol).End(xlUp).Row
            
        Dim inputRng As Range
        Set inputRng = ws.Range(ws.Cells(inputStartRow, inputCol), ws.Cells(inputLastRow, inputCol))
        
        '==== Populate the array with the input range's value
        Dim inputArr As Variant
        inputArr = inputRng.Value
            
        Dim outputArr() As String
        ReDim outputArr(1 To UBound(inputArr, 1), 1 To 2) As String
        
        '==== Loop through the array and extract the 2 numbers
        Dim i As Long
        For i = 1 To UBound(inputArr, 1)
            If InStr(inputArr(i, 1), "Additional Leave hours ") > 0 And InStr(inputArr(i, 1), "exceed entitlement plus pro-rata") Then
                If regex.Test(inputArr(i, 1)) Then
                    Dim regexMatch As Object
                    Set regexMatch = regex.Execute(inputArr(i, 1))(0)
                                    
                    outputArr(i, 1) = regexMatch.SubMatches(0)
                    outputArr(i, 2) = regexMatch.SubMatches(1)
                End If
            End If
        Next i
        
        '==== Insert the extraction @ Input column - 1/ -2
        Dim outputRng As Range
        Set outputRng = inputRng.Offset(, -2).Resize(, 2)
        outputRng.Value = outputArr
        
        Set outputRng = Nothing
        
        '==== Add in SUM formula @ Input Column - 3
        Dim sumRng As Range
        Set sumRng = inputRng.Offset(, -3)
        sumRng.Formula = "=SUM(" & ws.Cells(inputStartRow, sumRng.Column + 1).Address(RowAbsolute:=False) & "-" & ws.Cells(inputStartRow, sumRng.Column + 2).Address(RowAbsolute:=False) & ")"
        
        Set sumRng = Nothing
    End Sub