Search code examples
regexsmt-lib

Regex to interpret smtlib2 format


I am trying to figure out a regex that could match results of a program that outputs in the smtlib format. Basically, my data is in the form:

 (define-fun X_1 () Int
    281)
 (define-fun X_71 () Int
    104)
 (define-fun X_90 () Int
    21)
 (define-fun X_54 () Int
    250)
etc.

Is it possible to write an expression that matches:

X_1 (...) 281
X_71 (...) 104
X_90 (...) 21
etc.

My current approach is to find individual occurences using \(define-fun[\w\s]+\), then for each occurence, remove (define-fun, Int, () and ), and then read the data as all that's left is something like 1 281, 71 104

I'm just wondering if there is a more elegant way of doing this?


Solution

  • Capture these substrings:

    \(define-fun\s+(\S+).*\n\s*(\d+)\)
    

    See regex proof.

    EXPLANATION

    --------------------------------------------------------------------------------
      \(                       '('
    --------------------------------------------------------------------------------
      define-fun               'define-fun'
    --------------------------------------------------------------------------------
      \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                               more times (matching the most amount
                               possible))
    --------------------------------------------------------------------------------
      (                        group and capture to \1:
    --------------------------------------------------------------------------------
        \S+                      non-whitespace (all but \n, \r, \t, \f,
                                 and " ") (1 or more times (matching the
                                 most amount possible))
    --------------------------------------------------------------------------------
      )                        end of \1
    --------------------------------------------------------------------------------
      .*                       any character except \n (0 or more times
                               (matching the most amount possible))
    --------------------------------------------------------------------------------
      \n                       '\n' (newline)
    --------------------------------------------------------------------------------
      \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                               more times (matching the most amount
                               possible))
    --------------------------------------------------------------------------------
      (                        group and capture to \2:
    --------------------------------------------------------------------------------
        \d+                      digits (0-9) (1 or more times (matching
                                 the most amount possible))
    --------------------------------------------------------------------------------
      )                        end of \2
    --------------------------------------------------------------------------------
      \)                       ')'