regex matlab equation

Regular expression for equations with a variable number of nested parentheses


I'm trying to write a regex for the case where I have a series of equations, for example:

a = 2 / (1 + exp(-2*n)) - 1
a = 2 / (1 + e) - 1
a = 2 / (3*(1 + exp(-2*n))) - 1

In every case I need to capture the content of the outermost parentheses, i.e. 1 + exp(-2*n), 1 + e and 3*(1 + exp(-2*n)) respectively.

I can write an expression that will catch any one of them, like:

\(([\w\W]*?\))\) will perfectly catch 1 + exp(-2*n)

\(([\w\W]*?)\) will catch 1+e

\(([\w\W]*?\))\)\) will catch 3*(1 + exp(-2*n))

But it seems silly to need three separate patterns for something so simple. How can I combine them into one? Please note that I will be processing the text line by line (in a loop) anyway, so you don't need to worry about stopping a greedy operator from running into the next line.

Edit: Several separate, un-nested bracketed terms are also allowed: a = 2 / (1 + exp(-2*n)) - (2-5)


Solution

  • The commented code below does not use regular expressions; instead it parses char arrays in MATLAB and outputs the terms enclosed in top-level brackets.

    For the three examples in your question, each with a single set of nested brackets, it returns the content of the outermost brackets.

    In the example from your comment where there are two or more (possibly nested) terms within brackets at the "top level", it returns both terms.

    The logic is as follows (see the code comments for more details):

    • Find the left (opening) and right (closing) brackets
    • Generate the "nest level" according to how many un-closed brackets there are at each point in the equation char
    • Find the indices where the nesting level changes. We're interested in opening brackets where the nest level increases to 1 and closing brackets where it decreases from 1.
    • Extract the terms from these indices
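
To make the nest-level idea concrete, here is a minimal trace of the same steps on a short string. The input 'a*(b+(c))' is a hypothetical toy example, not one of the equations from the question, chosen only so that the intermediate vectors are easy to read.

```matlab
% Hypothetical toy input, chosen only to keep the vectors short
equation = 'a*(b+(c))';

bracketL  = ( equation == '(' );                   % logical mask of opening brackets
bracketR  = ( equation == ')' );                   % logical mask of closing brackets
nestLevel = cumsum(bracketL) - cumsum(bracketR);   % open brackets not yet closed
% nestLevel is [0 0 1 1 1 2 2 1 0], one entry per character

nestLevelChanged = diff(nestLevel);                % +1 at '(' and -1 at ')'
% first character after a '(' that raised the level to 1
level1L = find( nestLevel == 1 & [true, nestLevelChanged == 1] ) + 1;
% last character before a ')' that drops the level from 1 to 0
level1R = find( nestLevel == 1 & [nestLevelChanged == -1, true] );

term = equation(level1L:level1R);                  % 'b+(c)'
```

The inner pair around c only moves the level between 1 and 2, so it never triggers either find condition and stays inside the extracted term.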
    e = { 'a = 2 / (1 + exp(-2*n)) - 1'
          'a = 2 / (1 + e) - 1'
          'a = 2 / (3*(1 + exp(-2*n))) - 1'
          'a = 2 / (1 + exp(-2*n)) - (2-5)' };
      
    str = cell(size(e)); % preallocate output
    for ii = 1:numel(e)
        str{ii} = parseBrackets_(e{ii});
    end
    
    
    function str = parseBrackets_( equation )
        bracketL = ( equation == '(' ); % logical mask of opening brackets
        bracketR = ( equation == ')' ); % logical mask of closing brackets
        str = {}; % initialise empty output
        if nnz(bracketL) ~= nnz(bracketR)
            % Validate the input: both masks always have the same length,
            % so compare the number of true entries instead
            warning( 'Could not match bracket pairs, count mismatch!' )
            return
        end
        
        nL = cumsum( bracketL ); % cumulative open bracket count
        nR = cumsum( bracketR ); % cumulative close bracket count
        nestLevel = nL - nR;     % nest level is number of open brackets not closed
        nestLevelChanged = diff(nestLevel); % Get the change points in nest level
        % get the points where the nest level changed to/from 1
        level1L = find( nestLevel == 1 & [true,nestLevelChanged==1] ) + 1; 
        level1R = find( nestLevel == 1 & [nestLevelChanged==-1,true] ); 
        
        % Compile cell array of terms within nest level 1 brackets
        str = arrayfun( @(x) equation(level1L(x):level1R(x)), 1:numel(level1L), 'uni', 0 );
    end
    

    Outputs:

    str = 
        {'1 + exp(-2*n)'}
        {'1 + e'}
        {'3*(1 + exp(-2*n))'}
        {'1 + exp(-2*n)'}    {'2-5'}
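
One possible refinement, offered as a sketch rather than as part of the answer above: comparing bracket counts cannot detect brackets that appear in the wrong order, such as ')1 + e('. Checking the running nest level catches that case as well. The helper name isBalanced_ and the malformed test strings are hypothetical and introduced here only for illustration.

```matlab
% Try the hypothetical helper on one valid and two malformed inputs
checks = [ isBalanced_('a = 2 / (1 + e) - 1'), ...
           isBalanced_('a = )1 + e('), ...
           isBalanced_('a = ((1 + e)') ];
% checks is [true false false]

function tf = isBalanced_( equation )
    % Running count of open brackets that have not yet been closed
    nestLevel = cumsum( equation == '(' ) - cumsum( equation == ')' );
    % Balanced if the level never goes negative and ends at zero;
    % an empty input has no brackets and is treated as balanced
    tf = isempty(nestLevel) || ( all( nestLevel >= 0 ) && nestLevel(end) == 0 );
end
```

A check like this could be run before extracting terms, so that malformed lines are skipped with a warning instead of producing misleading output.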