I need your help! I’d like to use RegEx in a Excel/VBA environment. I do have an approach, but I’m kind of reaching my limits...
I need to match 5 characters within a great many lines of string (the string being in column B of my excel sheet, A comes later). The 5 characters can be 5 digits or a „K“ followed by 4 digits (ex. 12345
, 98765
, K2345
). This would be covered by (\d{5}|K\d{4})
.
Them five can be preceeded or followed by letters or special characters, but not by numbers. Meaning no leading zeros are allowed and also the digits shouldn’t just be matched within a longer number. That's one point where I'm stuck.
If there’s more than one possible match in a string, I need them all to be matched. If the same number has been matched within a line already, I’d like it not to be matched again. For these two requirements, I do have a sort of solution already, that works as part of the VBA code at the end of this posting: (\d{5}|K\d{4})(?!.*?\1.*$)
In addition, I do have a specific single digit (or a „K“) in column A. I need the five characters to start with this specific character, or otherwise not be matched.
Example of strings (numbered). The two columns A and B are separated by "|" for better readability
(1) | 1 | 2018/ID11298 00000012345 PersoNR: 889899 Bridgestone BNPN
(2) | 3 | Kompo 32280EP ###Baukasten### 3789936690 ID PFK Carbon0
(3) | 2 | 20613, 20614, Mietop Antragsnummer C300Coup IVS 33221 ABF
(4) | 2 | Q21009 China lokal produzierte Derivate f/Radverbund 991222 VV
(5) | 6 | ID:61953 F-Pace Enfantillages (Machine arriere) VvSKPMG Lyon09
(6) | 2 | 2017/22222 22222 21895 Einzelkostenprob. 28932 ZürichMP KOS
(7) | K | ID:K1245 Panamera Nitsche Radlager Derivativ Bayreumion PwC
(8) | 7 | LaunchSupport QBremsen BBG BFG BBD 70142,70119 KK 70142
The results that I'm looking for here are:
(1) | 11298 | ............................. [but don't match 12345, since no preceeding numbers allowed]
(2) | 32280 | ............................. [but don't match 37899 within 3789936690]
(3) | 20613 | 20614 | ................ [match both starting with a 2, don't match the one starting with 3]
(4) | 21009 | ............................. [preceeded by a letter, which is perfectly fine
(5) | 61953 | ..............................[random example]
(6) | 22222 | 21895 | 28932 | ... [match them all, but no duplicates]
(7) | K1245 | ............................. [special case with a "K"]
(8) | 70142 | 70119 | ................ [ignore second 70142]
The RegEx/VBA Code that I've put together so far is:
Sub RegEx()
Dim varOut() As Variant
Dim objRegEx As Object
Dim lngColumn As Long
Dim objRegA As Object
Dim varArr As Variant
Dim lngUArr As Long
Dim lngTMP As Long
On Error GoTo Fin
With Worksheets("Sheet1")
varArr = .Range("B2:B50")
Set objRegEx = CreateObject("VBScript.Regexp")
With objRegEx
.Pattern = "(\d{5}|K\d{4})(?!.*?\1.*$)" 'this is where the magic happens
.Global = True
For lngUArr = 1 To UBound(varArr)
Set objRegA = .Execute(varArr(lngUArr, 1))
If objRegA.Count >= lngColumn Then
lngColumn = objRegA.Count
End If
Set objRegA = Nothing
Next lngUArr
If lngColumn = 0 Then Exit Sub
ReDim varOut(1 To UBound(varArr), 1 To lngColumn)
For lngUArr = 1 To UBound(varArr)
Set objRegA = .Execute(varArr(lngUArr, 1))
For lngTMP = 1 To objRegA.Count
varOut(lngUArr, lngTMP) = objRegA(lngTMP - 1)
Next lngTMP
Set objRegA = Nothing
Next lngUArr
End With
.Cells(2, 3).Resize(UBound(varOut), UBound(varOut, 2)) = varOut
End With
Fin:
Set objRegA = Nothing
Set objRegEx = Nothing
If Err.Number <> 0 Then MsgBox "Error: " & Err.Number & " " & Err.Description
End Sub
This code is checking the string from column B and delivering its matches in columns C, D, E etc. It's not matching duplicates. It is however matching numbers within larger numbers, which is a problem. \b
for example doesn't work for me, because I still want to match 12345
in EP12345
.
Also, I have no idea how to implement the character from column A to be the very first character.
I've uploaded my excel file here: mollmell.de/RegEx.xlsm
Thank you so much for suggestions
Stephan
To sort out the numbers which are too long, you can use a negative lookbehind and lookahead that doesn't match preceding and successing digits:
(?x) (?<!\d) (\d{5} | K\d{4}) (?!\d)
https://regex101.com/r/RBnoMo/1
To match only numbers with the key in column 2 is rather hard. Maybe you match either the key or the numbers and do the logic afterwards:
(?x)
\|[ ](?<key>.)[ ]\| |
(?<!\d) (?<number>\d{5} | K\d{4}) (?!\d)