Search code examples
revallexical-scopenon-standard-evaluation

R - Checking if a string is a valid mathematical expression using non-standard evaluation


I would like to check if the strings below are valid mathematical expressions:

s1 = 'sin(x)'
s2 = 'sin(x*m)'
s3 = 'sin'
s4 = 'sin(xm)'

By 'valid', I mean the expression is a combination of

  1. operators (must be used in conjunction with variables or constants)
  2. variables x and/or m
  3. constants.

By this definition s1 and s2 are valid while s3 and s4 are not.

To identify if a string is valid, I wrote a function checkFxn that first attempts to convert the string into a call or one of its parts. If successful, it then recurses through the call-tree and checks for the above conditions. If the conditions are satisfied, then the call is returned as-is. If not, an error is thrown.

checkFxn <- function(x) {

  lang <- str2lang(x)

  checkFxn2 <- function(y) {

    if(is.name(y)) {

      stopifnot(deparse(y) %in% c('x', 'm'))

    } else if(is.call(y)) {

      stopifnot(is.function(eval(y[[1]])) | is.primitive(eval(y[[1]])))

      lapply(y[-1], checkFxn2)

    } else {

      stopifnot(is.logical(y) | is.numeric(y) | is.complex(y))

    }

    return(y)

  }

  checkFxn2(lang)

}


#Applying checkFxn to s1-4
lapply(list(s1,s2,s3,s4), function(x) {try(checkFxn(x), silent = T)})
[[1]]
sin(x)

[[2]]
sin(x * m)

[[3]]
[1] "Error in checkFxn2(lang) : deparse(y) %in% c(\"x\", \"m\") is not TRUE\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in checkFxn2(lang): deparse(y) %in% c("x", "m") is not TRUE>

[[4]]
[1] "Error in FUN(X[[i]], ...) : deparse(y) %in% c(\"x\", \"m\") is not TRUE\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in FUN(X[[i]], ...): deparse(y) %in% c("x", "m") is not TRUE>

It seems to work as expected but I'm wary of my use of eval and was wondering if someone could suggest an alternative to using it? I know that it follows the usual lexical scoping rules, so I'm worried about it evaluating variables in the gobal environment - is there a way to restrict its scope? I've read the chapter on non-standard evaluation but I can't figure it out.

Also, is there a way to identify if a base function or primitive is a mathematical operator? I would like to use something more specific than is.function and is.primitive.


Solution

  • Step 1: Decide what constitutes a "mathematical operator". One option is to retrieve relevant groups from the S4 generics. For example,

    mathOps <- unlist(lapply( c("Arith","Compare","Math"), getGroupMembers ))
    #  [1] "+"        "-"        "*"        "^"        "%%"       "%/%"     
    #  [7] "/"        "=="       ">"        "<"        "!="       "<="      
    # [13] ">="       "abs"      "sign"     "sqrt"     "ceiling"  "floor"   
    # [19] "trunc"    "cummax"   "cummin"   "cumprod"  "cumsum"   "exp"     
    # [25] "expm1"    "log"      "log10"    "log2"     "log1p"    "cos"     
    # [31] "cosh"     "sin"      "sinh"     "tan"      "tanh"     "acos"    
    # [37] "acosh"    "asin"     "asinh"    "atan"     "atanh"    "cospi"   
    # [43] "sinpi"    "tanpi"    "gamma"    "lgamma"   "digamma"  "trigamma"
    

    Step 2: Decompose your expressions into abstract syntax trees.

    getAST <- function( ee ) 
        lapply( as.list(ee), function(x) `if`(is.call(x), getAST(x), x) )
    
    # Example usage
    getAST( quote(sin(x+5)) )
    # [[1]]
    # sin
    # 
    # [[2]]
    # [[2]][[1]]
    # `+`
    # 
    # [[2]][[2]]
    # x
    # 
    # [[2]][[3]]
    # [1] 5
    

    Step 3: Traverse the ASTs based on your definition of "validity"

    checkFxn <- function( ast, validOps )
    {
      ## Terminal nodes of an AST will not be lists
      ## Wrap them into a list of length 1 to keep the recursion flow
      if( !is.list(ast) ) ast <- list(ast)
    
      ## Operators must be called with one or more arguments
      if( as.character(ast[[1]]) %in% validOps )
        return( `if`(length(ast) < 2, FALSE,
                     all(sapply(ast[-1], checkFxn, validOps))) )
    
      ## Variables x and m are OK
      if( identical(ast[[1]], quote(x)) || identical(ast[[1]], quote(m)) )
        return(TRUE)
    
      ## Constants are OK
      if( is.numeric(ast[[1]]) ) return(TRUE)
    
      ## Everything else is invalid
      FALSE
    }
    

    Putting it all together

    exprs <- lapply( list(s1,s2,s3,s4), str2lang )   # Convert strings to expressions
    asts <- lapply( exprs, getAST )                  # Build ASTs
    sapply( asts, checkFxn, mathOps )                # Evaluate validity
    # [1]  TRUE  TRUE FALSE FALSE 
    

    Alternative to ASTs

    As pointed out by @Moody_Mudskipper, one can also use all.names to retrieve the list of symbols occurring inside an arbitrary expression. While this doesn't preserve the relative structure of those symbols, the names can be compared directly against mathOps.