I had originally asked the question below and found a good formula excel formula find part number in file path text string
So after using the =MID(A1,SEARCH("????-??????-??",A1),14)
formula to find the part number names things seemed fine until I ran into data that looks like the sample set below.
c:\users\stuff\folder_name\folder_name_9876-vv-123-Ag\1234-000001-01_ baskets_1.pdf
c:\users\stuff\folder_name_9876-vv-123-Ag\random file_1234-000001-02_ baskets_2.pdf
c:\users\stuff\folder_name_XXXX-XXXXXX-XX\1234-000001-03_ baskets_3.pdf
c:\users\stuff\folder_name\1234-000030-01_ tree_30.pdf
c:\users\stuff\folder_name\random text_1234-000030-02_ tree_30.pdf
c:\users\stuff\folder_name\more random stuff_1234-000030-02_ tree_30.pdf
I am looking thru a few hundred thousand file names and I know I am missing some important files due to the false positive triggers. Like 9876-vv-123-Ag
it technically fits the ????-??????-??
format but is not what I want to extract from the string. Now to the question, is there a way to apply =MID(A1,SEARCH("????-??????-??",A1),14)
that can look from the right side of the string and work its way left to ensure that it grabs the last sub-string in the full string that meets the ????-??????-??
format?
or is there a way to set the wildcard search to numeric only instead of anything in that ????-??????-??
format?
If there are no other ambiguities, you can search just the file name portion of the file path, and return the matching string:
=MID(TRIM(RIGHT(SUBSTITUTE(A1,"\",REPT(" ",99)),99)),SEARCH("????-??????-??",TRIM(RIGHT(SUBSTITUTE(A1,"\",REPT(" ",99)),99))),14)
If there are other ambiguities, you will need to be more specific
This portion of the formula: TRIM(RIGHT(SUBSTITUTE(A1,"\",REPT(" ",99)),99))
returns the last substring where the string is split on the "\" character. If it might be the case that the file name might have more than 99 characters, then increase the 99 appropriately, or use something like LEN(A1)
instead.
If you want to use a VBA routine, the following routine processed 500,000 rows in less than 4 seconds. It does depend on the part number consisting of only digits, but the change would be trivial if other patterns need to be included.
Option Explicit
Sub partNum()
Dim RE As Object
Dim R As Range, WS As Worksheet
Dim vSrc As Variant, vRes As Variant
Dim I As Long
Set WS = Worksheets("sheet1")
With WS
vSrc = .Range(.Cells(1, 1), .Cells(.Rows.Count, 1).End(xlUp))
Set R = .Range(.Cells(1, 2), .Cells(UBound(vSrc, 1), 2))
End With
ReDim vRes(1 To UBound(vSrc, 1), 1 To 1)
Set RE = CreateObject("vbscript.regexp")
With RE
.Pattern = "\d{4}-\d{6}-\d{2}"
For I = 1 To UBound(vSrc)
If .test(vSrc(I, 1)) = True Then vRes(I, 1) = .Execute(vSrc(I, 1))(0)
Next I
End With
R.EntireColumn.Clear
R = vRes
End Sub
VBA Function (stand-alone)
Option Explicit
Function getPartNum(S As String) As String
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
With RE
.Pattern = "\d{4}-\d{6}-\d{2}(?!.*\\)"
If .test(S) = True Then getPartNum = .Execute(S)(0)
End With
End Function