Recursive regex in R for curly braces


I have a text string of the following form:

x = "sdfwervd \\calculus{fff}{\\trt{sdfsdf} & \\trt{sdfsdf} & \\trt{sdfsdf} \\{} sdfsdf & sdfsdf & sefgse3 } aserdd wersdf sewtgdf"
  1. I want to use regex to capture the text "fff" in the string \calculus{fff} and replace it with something else.

  2. Further, I want to capture the string between the first { after \calculus{.+} and its corresponding closing curly brace }.

How can I do this with regex in R?

The following attempt matches everything up to the last closing curly brace, because .+ is greedy:

gsub("(\\calculus\\{)(.+)(\\})", "", x)

Solution

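  • For the second task, you can use a recursive pattern together with regmatches() and gregexpr() in base R. The (?R) construct recurses into the whole pattern, so each nested pair of braces is matched as a unit; it requires perl = TRUE: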

    x <- c("sdfwervd \\calculus{fff}{\\trt{sdfsdf} & \\trt{sdfsdf} & \\trt{sdfsdf} \\{} sdfsdf & sdfsdf & sefgse3 } aserdd wersdf sewtgdf")
    
    pattern <- "\\{(?:[^{}]*|(?R))*\\}"
    (result <- regmatches(x, gregexpr(pattern, x, perl = TRUE)))
    


    This yields a list of the found submatches:

    [[1]]
    [1] "{fff}"                                                                          
    [2] "{\\trt{sdfsdf} & \\trt{sdfsdf} & \\trt{sdfsdf} \\{} sdfsdf & sdfsdf & sefgse3 }"
    

    See a demo for the expression on regex101.com.
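
The answer above covers only the second task. As a minimal sketch, the first task and the extraction of the second group's contents without its outer braces might look like the following. It assumes that the first argument of \calculus contains no nested braces and that the two \calculus arguments are the first two top-level brace groups in x. The replacement text "new" is a placeholder, and the names replaced, groups and inner are illustrative only.

```r
x <- "sdfwervd \\calculus{fff}{\\trt{sdfsdf} & \\trt{sdfsdf} & \\trt{sdfsdf} \\{} sdfsdf & sdfsdf & sefgse3 } aserdd wersdf sewtgdf"

# Task 1: replace the first argument of \calculus.
# [^{}]* matches only brace-free text, so the match stops at the first "}".
# "new" is a placeholder replacement (an assumption, not from the question).
replaced <- sub("(\\\\calculus\\{)[^{}]*(\\})", "\\1new\\2", x, perl = TRUE)
print(replaced)

# Task 2: collect the balanced top-level brace groups with the recursive
# pattern, then take the second one (assumed to be the group that follows
# \calculus{...}) and strip its outer braces.
pattern <- "\\{(?:[^{}]*|(?R))*\\}"
groups <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
second <- groups[2]
inner <- substr(second, 2, nchar(second) - 1)
print(inner)
```

If the string can contain other brace groups before \calculus, picking groups[2] by position would no longer be reliable, and the pattern would need to be anchored to \calculus instead.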