Search code examples
excelvbams-accessweb-scrapingformat

How to format strings with numbers and special characters in Excel or Access using VBA?


I have a mathematical problem: these five strings are IDs for the same object. Due to these differences, objects appear multiple times in my Access table/query. Although there are a lot of these mutations, but I take this as an example.

76 K 6-18
76 K 6-18(2)
0076 K 0006/ 2018
0076 K 0006/2018
76 K 6/18

How would the VBA-code have to look like to recognize that these numbers stand for the same thing , so a general formatting with "RegEx()" or "format()" or "replace()"...but they must not only refer to this example but to the kind.

The common factor of these and all other mutations is always the following:

1) includes "-", no zeros left of "-", just 18 an not 2018 (year) at the end. 2) is like the first but with (2) (which can be dropped). 3) includes "/", zeros left of "/", and 2018 as year at the end. 4) is like third, but without space after "/". 5) is like the first one, but with a "/" instead of "-".

Character is always one single "K"! I suppose the best way would be to convert all 5 strings to 76 K 6 18 or in ohter cases for example to 1 K 21 20 or 123 K 117 20 . Is this possible with one elegant code or formula? Thanks


Solution

  • Here is a fun alternative using a rather complex but intuitive regular expression:

    ^0*(\d+) (K) 0*(\d+)[-\/] ?\d{0,2}(\d\d)(?:\(\d+\))?$
    

    See an online demo

    • ^ - Start line anchor.
    • 0* - 0+ zeros to catch any possible leading zeros.
    • (\d+) - A 1st capture group of 1+ digits ranging 0-9.
    • - A space character.
    • (K) - 2nd Capture group capturing the literal "K".
    • - A space character.
    • (\d+) - A 3rd capture group of 1+ digits ranging 0-9.
    • [-\/] - Character class of either a hyphen or forward slash.
    • ? - An optional space character.
    • \d{0,2} - 0-2 digits ranging from 0-9.
    • (\d\d) - A 4th capture group holding exactly two digits.
    • (?:\(\d+\))? - An optional non-capture group holding 1+ digits inside literal paranthesis.
    • $ - End line anchor.

    Now just replace the whole string by the 4 capture groups with spaces in between.


    Let's test this in VBA:

    'A code-block to call the function.
    Sub Test()
    
        Dim arr As Variant: arr = Array("76 K 6-18", "76 K 6-18(2)", "0076 K 0006/ 2018", "0076 K 0006/2018", "76 K 6/18")
    
        For x = LBound(arr) To UBound(arr)
            Debug.Print Transform(CStr(arr(x)))
        Next
    
    End Sub
    
    'The function that transform the input.
    Function Transform(StrIn As String) As String
    
        With CreateObject("vbscript.regexp")
            .Global = True
            .Pattern = "^0*(\d+) (K) 0*(\d+)[-\/] ?\d{0,2}(\d\d)(?:\(\d+\))?$"
            Transform = .Replace(StrIn, "$1 $2 $3 $4")
        End With
    
    End Function
    

    All the elements from the initial array will Debug.Print "76 K 6 18".

    Hope it helps, happy coding!


    EDIT: If your goal is just to check if your string compiles against the pattern, the pattern itself can be shortened a little and you can return a boolean instead:

    'A code-block to call the function.
    Sub Test()
    
        Dim arr As Variant: arr = Array("76 K 6-18", "76 K 6-18(2)", "0076 K 0006/ 2018", "0076 K 0006/2018", "76 K 6/18")
    
        For x = LBound(arr) To UBound(arr)
            Debug.Print Transform(CStr(arr(x)))
        Next
    
    End Sub
    
    'The function that checks the input.
    Function Transform(StrIn As String) As Boolean
    
        With CreateObject("vbscript.regexp")
            .Global = True
            .Pattern = "^0*\d+ K 0*\d+[-\/] ?\d{2,4}(?:\(\d+\))?$"
            Transform = .Test(StrIn)
        End With
    
    End Function