regex, spreadsheet, regex-group

regex to match first n lines after a specific word in Google Spreadsheet


In Google Spreadsheet Script, I want to capture the first 2 lines after i=Transactional>, but only when the first of those lines does not start with the word STYLE.

Here's the full text:

<http://link.ORDERACK...&i=Transactional>
STYLE: 0043
QTY: 11 

<http://link.ORDERACK...&i=Transactional>
Striped
Cotton Rugby Short RED
<http://link.ORDERACK...&i=Transactional>
STYLE: 0042
QTY: 10 

<http://link.ORDERACK...&i=Transactional>
Striped
Cotton Polo Short

The regex should ignore the blocks whose first line starts with STYLE.

I used this regex:

var regExp=new RegExp("((i=Transactional\>\r*\n)(?!STYLE)(.*\r*\n){2})","gm");
m = regExp.exec(text)

However, the output includes the i=Transactional> as well:

i=Transactional> Striped Cotton Rugby Short RED
i=Transactional> Striped Cotton Polo Short

I tried to use a lookbehind, (?<=i=Transactional\>), to get everything after i=Transactional>, but it doesn't work in Google Spreadsheet script (or in my code).

The output I'm expecting to see:

Striped Cotton Rugby Short RED
Striped Cotton Polo Short

Solution

  • Since JavaScript doesn't universally support lookbehinds, you will have to settle for capture groups:

    (i=Transactional>\r?\n)(?!STYLE)(.*\r?\n.*)
    

    https://regex101.com/r/ovWA4L/1

    Your desired data will be in capture group #2 (both lines, still separated by their line break).
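
A minimal sketch of how the pattern above might be used in a script, assuming the text uses \n or \r\n line endings. The function name extractDescriptions and the sample variable are hypothetical, introduced only for illustration. It loops with exec and the g flag, reads capture group #2 instead of the full match (the full match, m[0], is what still contains i=Transactional>), and joins the two captured lines with a space to get the expected output:

```javascript
// Hypothetical helper: returns the two lines that follow each
// "i=Transactional>" line, skipping blocks whose first line starts with STYLE.
function extractDescriptions(text) {
  const pattern = /(i=Transactional>\r?\n)(?!STYLE)(.*\r?\n.*)/g;
  const results = [];
  let match;
  // With the g flag, each exec call resumes after the previous match.
  while ((match = pattern.exec(text)) !== null) {
    // match[0] is the whole match (marker included); match[2] holds only the
    // two lines. Collapse the line break between them into a single space.
    results.push(match[2].replace(/\s*\r?\n\s*/, " ").trim());
  }
  return results;
}

// Sample input copied from the question.
const sample = [
  "<http://link.ORDERACK...&i=Transactional>",
  "STYLE: 0043",
  "QTY: 11 ",
  "",
  "<http://link.ORDERACK...&i=Transactional>",
  "Striped",
  "Cotton Rugby Short RED",
  "<http://link.ORDERACK...&i=Transactional>",
  "STYLE: 0042",
  "QTY: 10 ",
  "",
  "<http://link.ORDERACK...&i=Transactional>",
  "Striped",
  "Cotton Polo Short",
].join("\n");

const descriptions = extractDescriptions(sample);
console.log(descriptions.join("\n"));
// Striped Cotton Rugby Short RED
// Striped Cotton Polo Short
```

Using a regex literal instead of new RegExp("...") avoids having to think about how backslashes are interpreted inside a string.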