The following code snippet loops through single-column tables containing keywords, in a keywords sheet and tries matching these concatenated keywords of each column, to a text string (Str) that has been concatenated from an array containing verbatims.
I need a regex pattern that can extract not only the keyword as a submatch, but the entire line/paragraph in which the keyword match was found.
'========================= Grab keyword tables for matching ======================================
Set dRegex = CreateObject("Scripting.Dictionary")
Set dDupl = CreateObject("Scripting.Dictionary")
' Keywords Sheet
With wkKeywords
' find last col on header row 1
x = .Cells(1, .Columns.Count).End(xlToLeft).Column
' scan across header
For c = 1 To x
colname = Trim(.Cells(1, c))
If Len(colname) > 0 Then
' last row
n = .Cells(.Rows.Count, c).End(xlUp).Row
ReDim ar(0 To n - 2)
'modify for regex
For i = 2 To n
s = Trim(.Cells(i, c))
If Len(s) = 0 Then
MsgBox "Nothing in keyword row " & i & " column " & c, vbCritical
Exit Sub
End If
ar(i - 2) = Replace(s, "'", "'?") ' optional
ar(i - 2) = Replace(ar(i - 2), """", """?")
Next i
Set dRegex(colname) = CreateObject("VBScript.RegExp")
With dRegex(colname)
.Global = True
.MultiLine = False
.ignorecase = True
' .Pattern = "(\b" & Join(ar, "\b)|(\b") & "\b)"
' .Pattern = "((.*?\b" & Join(ar, "\b|\b") & ".*?)(?=\b|$))"
' .Pattern = "(\b(" & Join(ar, "|") & ")\b)(.*?(\r\n|\r|\n|$))"
' .Pattern = "((.*?)(\b" & Join(ar, "\b|\b") & "\b)(.*?))" '(\r\n|\r|\n|$))"
.Pattern = "((?:.*(?:\r\n|\r|\n|$))?(\b" & Join(ar, "\b|\b") & "\b)(?:.*(?:\r\n|\r|\n|$))?)"
'Debug.Print colname, n, .Pattern
End With
End If
Next c
End With
'========================= Transpose data Array ======================================
tempArr = ArrOut
Erase ArrOut
ArrOut = TransposeArray(tempArr)
'========================= Join textual data as a concatenated string for matching ======================================
For i = LBound(ArrOut, 1) To UBound(ArrOut, 1)
str = str & ArrOut(i, 1) & ":" & ArrOut(i, 2) & vbCr
Next i
Stop
'========================= Find Matches in textual data ======================================
For Each colname In dRegex.keys
Set matches = dRegex(colname).Execute(str)
' DoEvents
Stop
For i = 0 To (matches.Count - 1)
For j = 0 To matches(i).submatches.Count - 1
If Not matches(i).submatches(j) = Empty Then
Debug.Print colname, Trim(matches(i).submatches(j))
End If
Next j
Stop
Next i
Stop
Next colname
Stop
I have tried a couple of regex patterns and some of them return just the keyword, or the keyword and a part of the paragraph after the keyword or entire paragraphs before and after the keyword.
I just need the entire line or paragraph returned in which the keyword was matched. So a keyword say "eager" could be found in multiple different lines/paragraphs. I want to list the keyword and its matching paragraphs.
The lines/paragraphs in the string could like :
tody:Never really considered what it could do. But now the question is across as far as lack of sleep could affect how your body generates. I am eager to know much more. anabelle:Understand it's something to do with the body clock. That's all I know really, you can feel differently in different times of day, stuff like that. tody:I'd like to know if this lotion is long-lasting, and how much it would be and how many times you would have to apply it? I know it is said that it would take up the six or eight weeks, but how many times would you need to apply it? simran:I think this lotion could be different, because it works with your rhythm, and when you go to sleep, so it could have a more lasting effect, and since you put it on your face at the end of the night instead of the morning when you wake up.
Some keywords in one of the keywords columns to be matched in the paragraphs, could be:
affect
eager
feel differently
long-lasting
different
lasting effect
Please note, i don't want to loop the textual data array ArrOut
by splitting it into individual lines, but want to do it in one go without much looping.
I am not a pro in regex expressions. Please let me know if you need anything else.
Following from my comment about extracting the matched paragraph if you know the match position within the text:
Sub GetMatchedParagraph()
Dim txt As String, pos As Long, posB As Long, posE As Long
txt = "This is the first paragraph." & vbLf & _
"This is the second paragraph." & vbLf & _
"This is the third paragraph." & vbLf & _
"This is the fourth paragraph." & vbLf & _
"This is the fifth paragraph."
pos = InStr(1, txt, "second", vbTextCompare) 'find "second"
If pos = 0 Then Exit Sub
'previous newline, or start of text if none
posB = InStrRev(txt, vbLf, pos)
If posB = 0 Then posB = 1
'next newline, or end of text if none
posE = InStr(pos, txt, vbLf)
If posE = 0 Then posB = Len(txt)
Debug.Print Mid(txt, posB, (posE - posB)) 'the matched paragraph
End Sub