How to split a string into an array in VBA using Split with a regular expression delimiter


I am writing an Excel add-in that reads a text file, extracts values, and writes them to an Excel file. I need to split each line on runs of one or more whitespace characters and store the pieces in an array, from which I then extract the values I want.

I am trying to implement something like this:

arrStr = Split(line, "/^\s*/")

But the editor is throwing an error while compiling.

How can I do what I want?


Solution

  • If you are looking for the Regular Expressions route, then you could do something like this:

    Dim line As String, arrStr, i As Long
    line = "This is a  test"
    
    With New RegExp
        .Pattern = "\S+"    ' one or more non-whitespace characters
        .Global = True      ' find every match, not just the first
        If .Test(line) Then
            With .Execute(line)
                ' One array slot per matched word
                ReDim arrStr(.Count - 1)
                For i = 0 To .Count - 1
                    arrStr(i) = .Item(i).Value
                Next
            End With
        End If
    End With
    

    IMPORTANT: The code above uses early binding, so you need to add a reference to
    Microsoft VBScript Regular Expressions 5.5 in Tools > References.
    If you would rather not add a reference, see Late Binding below.

    Your original pattern, \^S*\$, had some issues:

    • S* matched a literal uppercase S, not a character class, because the S was not escaped with a backslash.
      • Even if it had been escaped, the pattern would have matched every string you tested, because of the quantifier: * means zero or more of \S, so an empty match always succeeds. You were probably looking for the + quantifier (one or more).
      • Keeping the quantifier greedy (not using *?) was the right call, since you want to consume as much as possible.
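
    To see the difference between the two quantifiers, here is a minimal sketch. The procedure name DemoStarVersusPlus is made up for illustration, and it uses late binding so that no reference is required:

    ```vba
    Sub DemoStarVersusPlus()
        ' Hypothetical demo: compare \S* and \S+ on a string
        ' that contains only spaces.
        With CreateObject("VBScript.RegExp")
            .Pattern = "\S*"
            ' True: * accepts a zero-length match, so any string passes
            Debug.Print .Test("   ")

            .Pattern = "\S+"
            ' False: + needs at least one non-whitespace character
            Debug.Print .Test("   ")
        End With
    End Sub
    ```

    Run it from the Immediate window with DemoStarVersusPlus. The first line printed is True and the second is False.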

    The pattern I used, \S+, matches every run of one or more (+) characters that are NOT whitespace (\S). No capturing group is needed, because each match is already a whole word.

    I also set .Global = True so that matching continues after the first match.

    Once you have captured all your words, you can then loop through the match collection and place them into an array.


    Late Binding:

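    Dim line As String, arrStr, i As Long
    line = "This is a  test"
    
    ' CreateObject binds at run time, so no reference is required
    With CreateObject("VBScript.RegExp")
        .Pattern = "\S+"    ' one or more non-whitespace characters
        .Global = True      ' find every match, not just the first
        If .Test(line) Then
            With .Execute(line)
                ' One array slot per matched word
                ReDim arrStr(.Count - 1)
                For i = 0 To .Count - 1
                    arrStr(i) = .Item(i).Value
                Next
            End With
        End If
    End With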
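
    If you need this in several places, the same logic could be wrapped in a function. This is only a sketch: the names SplitOnWhitespace and DemoSplitOnWhitespace are made up for illustration, and returning an empty array when nothing matches is an assumption about what is convenient for the caller.

    ```vba
    Option Explicit

    ' Hypothetical helper: returns the whitespace-separated tokens of
    ' inputText as a zero-based array, or an empty array if there are none.
    Public Function SplitOnWhitespace(ByVal inputText As String) As Variant
        Dim matches As Object
        Dim result() As String
        Dim i As Long

        With CreateObject("VBScript.RegExp")
            .Pattern = "\S+"
            .Global = True
            Set matches = .Execute(inputText)
        End With

        ' Assumption: callers prefer an empty array over an error
        If matches.Count = 0 Then
            SplitOnWhitespace = Array()
            Exit Function
        End If

        ReDim result(0 To matches.Count - 1)
        For i = 0 To matches.Count - 1
            result(i) = matches(i).Value
        Next i
        SplitOnWhitespace = result
    End Function

    Sub DemoSplitOnWhitespace()
        Dim parts As Variant, i As Long
        parts = SplitOnWhitespace("This is a  test")
        ' Prints This, is, a, test on separate lines
        For i = LBound(parts) To UBound(parts)
            Debug.Print parts(i)
        Next i
    End Sub
    ```

    Because Array() has an upper bound of -1, the For loop in the caller simply does not run when the line is blank.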
    

    Miscellaneous Notes

    I would have suggested plain Split(), but you said that lines can contain more than one consecutive space. If that were not the case, you would not need regex at all; something like this would do:

    arrStr = Split(line)
    

    This splits on every occurrence of a single space (the default delimiter), so two consecutive spaces produce an empty element between them.
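
    If the separators are only ordinary space characters, one regex-free option might be to collapse repeated spaces first and then call Split(). The sketch below assumes the code runs inside Excel, so that Application.WorksheetFunction.Trim is available. Note that it does not handle tabs or other whitespace, and the procedure name DemoTrimThenSplit is made up for illustration.

    ```vba
    Sub DemoTrimThenSplit()
        Dim line As String
        Dim arrStr() As String

        line = "This is a  test"

        ' Excel's TRIM worksheet function removes leading and trailing
        ' spaces and reduces runs of spaces to a single space
        ' (unlike VBA's own Trim, which only strips the ends).
        arrStr = Split(Application.WorksheetFunction.Trim(line), " ")

        ' Prints 4: This, is, a, test
        Debug.Print UBound(arrStr) + 1
    End Sub
    ```

    For input that may contain tabs, the regex approach above is the safer choice.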