Search code examples
rstringrecursion

Recursive functions in R to apply regex within nested parentheses


I'm working with nested expressions in R and need some help with writing a recursive function to process these expressions starting with the innermost parentheses first.

Given a string like:

expr <- "(((a OR b) OR c) AND d) OR (e AND f)"

I want to perform the following replacements:

  1. Replace all occurrences of "(i OR j)" with "min(1, i + j)"
  2. Replace all occurrences of "(i AND j)" with "(i * j)"

For example:

  • "(a OR b)" should become "min(1, a + b)"
  • "(e AND f)" should become "(e * f)"

The string "((a OR b) OR c) AND d) OR (e AND f)" should evaluate to:

"min(1, (min(1, min(1, a + b) + c) * d) + (e * f))"

The challenge is that the expressions can be nested, and I need to handle the innermost parentheses first before moving outward. This is because I am writing functions to take user input for objective functions, and I have R code to dynamically write Rcpp code...

I've considered using regular expressions to find and replace the patterns, but handling the nested structures recursively is where I'm stuck. I've tried a recursive function approach the following:


process_expr <- function(string) {
  
  # Match the innermost parentheses
  pattern <- "\\(([^()]*)\\)"
  brackets <- gregexpr(pattern, string, perl = TRUE)
  matches <- regmatches(string, brackets)
  
  # If there are no more matches, return the string
  if (length(matches[[1]]) == 0) {
    return(string)
  }
  
  # Process each match
  for (match in matches[[1]]) {
    # Apply the replacements
    modified <- gsub("\\(([^()]+) OR ([^()]+)\\)", "min(1, \\1 + \\2)", match)
    modified <- gsub("\\(([^()]+) AND ([^()]+)\\)", "(\\1 * \\2)", modified)
    
    # Replace the match in the original string
    string <- sub(pattern, modified, string, fixed = TRUE)
  }
  
  # Recurse to process the next level of nested parentheses
  return(process_expr(string))
}

# Example usage
expr <- "((a OR b) OR c) AND d) OR (e AND f)"
result <- process_expr(expr)
print(result)

The code above doesn't seem to correctly handle the nested parentheses, especially when there are multiple levels of nesting. I need the function to:

  1. Identify the innermost expressions first.
  2. Apply the replacements correctly.
  3. Handle multiple levels of nested expressions.

All the solutions I've looked at work for Python or Perl, but not for R.

Is there another approach that I can use to get what I need? i.e.,

  1. Replace all occurrences of "(i OR j)" with "min(1, i + j)"
  2. Replace all occurrences of "(i AND j)" with "(i * j)"

Example data: "(((a OR b) OR c) AND d) OR (e AND f)"

Needs to evaluate to:

min(1, (min(1, min(1, a + b) + c) * d) + (e * f))

EDIT:

Here are a couple of other I/O examples:

input_2 <- "(r1 && r2) || r3 + r4 + (r5 && r6)"

output_2: "min(1, (r1 * r2) + r3) + r4) + (r6 * r7)"

Here are a couple of other I/O examples:

input_3 <- "(r1 && r2) || (r3 && r4) + r5 || (r6 + r7)"

output_3: "min(1, (r1 * r2) + (r3 * r4)) + min(1, r5 + (r6 + r7))"


Solution

  • If we translate the input to valid R code, we can use the parse and to the substitution. For example this helper function does the trick

    transform <- function(expr) {
      tx <- function(e) {
        if (is.call(e)) {
          parts <- as.list(e)
          if (parts[[1]] == quote(`%OR%`)) {
            parts[[1]] <- quote(`min`)
            parts[[3]] <- bquote(.(tx(parts[[2]])) + .(tx(parts[[3]])))
            parts[[2]] <- 1
            as.call(parts)         
          } else if (parts[[1]]==quote(`%AND%`)) {
            parts[[1]] <- quote(`*`)
            parts[[2]] <- tx(parts[[2]])
            parts[[3]] <- tx(parts[[3]])
            as.call(parts)
          } else {
            parts[-1] <- lapply(parts[-1], tx)
            as.call(parts)
          }
        } else {
          e
        }
      }
      fix <- gsub("\\bOR\\b", "%OR%", gsub("\\bAND\\b", "%AND%", expr))
      fix <- gsub("\\|\\|", "%OR%", gsub("\\&\\&", "%AND%", fix))
      deparse1(tx(str2lang(fix)))
    }
    

    Which we can test with

    expr <- "(((a OR b) OR c) AND d) OR (e AND f)"
    transform(expr)
    # [1] "min(1, ((min(1, (min(1, a + b)) + c)) * d) + (e * f))"
    input_2 <- "(r1 && r2) || r3 + r4 + (r5 && r6)"
    transform(input_2)
    # [1] "min(1, (r1 * r2) + r3) + r4 + (r5 * r6)"
    input_3 <- "(r1 && r2) || (r3 && r4) + r5 || (r6 + r7)"
    transform(input_3)
    # [1] "min(1, (r1 * r2) + (r3 * r4)) + min(1, r5 + (r6 + r7))"
    

    We turn OR and AND into %OR% and %AND% which can be parsed by R and then walk the abstract syntax tree and do the transformation. The parse does the hard work of building the tree.