Tags: regex, autohotkey, text, trimming

AutoHotkey: How to extract text between two words with multiple occurrences in a large text document


Using AutoHotkey, I would like to copy a large text file to the clipboard, extract the text between two repeated words, delete everything else, and paste the parsed text. The file has 80,000+ lines, and the start and stop words repeat hundreds of times.

Any help would be greatly appreciated!

Input Text Example

Delete this text Delete this text

StartWord

Apples Oranges
Pears Grapes

StopWord

Delete this text Delete this text

StartWord

Peas Carrots
Peas Carrots

StopWord

Delete this text Delete this text

Desired Output Text

Apples Oranges
Pears Grapes

Peas Carrots
Peas Carrots

I think I found a regular expression that extracts the text between two words, but I don't know how to make it work for multiple instances of the start and stop words. Honestly, I can't even get this to work.

!c::
Send, ^c
Fullstring = %clipboard%
RegExMatch(Fullstring, "StartWord *\K.*?(?= *StopWord)", TrimmedResult)
Clipboard := %TrimmedResult%
Send, ^v
return

Solution

  • You can start the match at StartWord, and then match all lines that do not start with either StartWord or StopWord

    ^StartWord\s*\K(?:\R(?!StartWord|StopWord).*)+
    
    • ^ Start of a line; this needs multiline mode (the m option), otherwise ^ only matches at the very start of the string
    • StartWord\s*\K Match StartWord and optional whitespace chars, then use \K to clear (forget) what has been matched so far
    • (?: Non-capturing group, repeated as a whole
      • \R Match a newline
      • (?!StartWord|StopWord).* Negative lookahead asserting that the line does not start with StartWord or StopWord, then .* matches the rest of the line
    • )+ Close the non-capturing group and repeat it 1 or more times, so at least one line is matched

    See a regex demo.
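The answer gives the pattern but not how to apply it to every block from a hotkey. RegExMatch returns only one match per call, so one way is to call it in a loop and resume the search after each match. Below is a minimal sketch in AutoHotkey v1 syntax (the syntax the question uses), not tested against the real 80,000-line file. Assumptions: the helper name ExtractBlocks is hypothetical, the blank line separating blocks in the output is a choice made here, and the pattern gets two additions that are not in the answer: the m) option (so ^ matches at every line start) and the PCRE verb (*ANYCRLF) (so ^ and . treat CRLF, LF and CR as line breaks). It also avoids three problems in the question's attempt: it waits for the copy with ClipWait before reading the clipboard; it assigns with a plain variable name, whereas Clipboard := %TrimmedResult% in an expression dereferences the variable twice; and it does not rely on .*? to reach StopWord, which cannot work there because . does not cross line breaks unless the s option is set.

```autohotkey
; Minimal sketch in AutoHotkey v1 syntax (not tested against the real file).
; Alt+C: copy the selection, keep only the StartWord...StopWord blocks, paste them back.
!c::
    Clipboard := ""          ; empty the clipboard so ClipWait can detect the new copy
    Send, ^c
    ClipWait, 2              ; wait up to 2 seconds for the copied text
    if ErrorLevel            ; ErrorLevel is 1 if nothing arrived in time
    {
        MsgBox, Nothing was copied.
        return
    }
    Clipboard := ExtractBlocks(Clipboard)   ; plain variable name, no %...% in expressions
    Send, ^v
return

; Hypothetical helper: returns the lines of every block, with a blank line between blocks.
ExtractBlocks(text)
{
    ; m)         multiline mode, so ^ matches at the start of every line
    ; (*ANYCRLF) treat CRLF, LF and CR as line breaks for ^ and .
    ; The rest is the pattern from the answer, unchanged.
    pattern := "m)(*ANYCRLF)^StartWord\s*\K(?:\R(?!StartWord|StopWord).*)+"
    result := ""
    pos := 1
    ; RegExMatch finds one match per call; it returns 0 when none is left, ending the loop.
    while (pos := RegExMatch(text, pattern, match, pos))
    {
        pos += StrLen(match)                ; resume the search right after this block
        block := Trim(match, " `t`r`n")     ; drop the blank lines around the block
        if (block != "")
            result .= (result = "" ? "" : "`r`n`r`n") . block
    }
    return result
}
```

To use it, select the text and press Alt+C; the selection is replaced by the extracted blocks. Because each RegExMatch call starts where the previous match ended, the loop moves forward through the text instead of rescanning it from the beginning, which should help with a file of this size.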