Search code examples
google-sheetsre2

Regex in Google Sheets to extract outline values


Want to extract outline values in a google sheet, using a formula

Tried Regex that will work with Arrayformula(IFERROR(REGEXEXTRACT(RANGE,"REGEX")))

almost there but I'm thinking someone has the solution or an easier way to do this?

1(a) Background
1(a)(i) Historical context
1(a)(i)(A) Early influences
1(a)(i)(A)(1) Roman civilization
IIIV.
AA. Background
1.1 Historical context
34 Historical context
a. Early influences
i. Roman civilization
- Background
- Historical context
We are
I. Roman civilization
II. Romans
IV. 
IV. I am sad
iii

Expected output

1(a)
1(a)(i)
1(a)(i)(A)
1(a)(i)(A)(1)
IIIV.
AA.
1.1
34
a.
i.
-
-

I.
II.
IV.
IV.
iii

This seems to work but everything gets split over multiple columns, Removing some of the parenthesis from capture groups to prevent split resulted in not extracting "-"

Arrayformula(IFERROR(REGEXEXTRACT(RANGE,"REGEX")))

Regex:

^(([A-Za-z]|\d+)((\([A-Za-z\d]+\))|(\.\d+))*\.?|\-)\s+([^\n]+)

Possible solutions?

  1. concatenating the output Arrayformula([working CONCATENATE formula for variable columns](IFERROR(REGEXEXTRACT(RANGE,"REGEX")))

  2. I think the capture groups are needed for the regex to work, if thats true and if the output with google sheets by default uses a split() any ()into a new column, then perhaps using a substitute to replace any "(" or ")" with ~ prior to the regexextract and then after extracting these values

Then I was thinking to just use something like Regexreplace(Range," .+","") but this would extract out the first word of a sentence.


Solution

  • Can you test:

    =arrayformula(ifna(regexextract(A:A&"","^.*?[^a-zA-Z](?: |$)"),ifna(regexextract(A:A&"","^[a-z]*(?: |$)"))))
    

    enter image description here