Tags: php, regex, iptables

Parsing parameters from a string | regex | php


I am having problems parsing parameters from a string.

Parameters are defined as follows:

  • can be written in short or long notation, e.g. -a / --long

  • characters range from [a-z0-9] for short and [a-z0-9\-] for long notation, e.g. --long-with-dash

  • can have a value, but need not, e.g. -a test / --aaaa

  • can have multiple unquoted values, e.g. -a val1 val2 (these should be captured as one group: value = "val1 val2")

  • can have arbitrary text inside quotes, e.g. --custom "here can stand everything, --test test :( "

  • can be preceded by "!", e.g. ! --test test / ! -a

  • values can contain "-", e.g. -a value-with-dash

All of these parameters come in one long string, e.g.:

-a val1 ! -b val2 --other "string with crazy -a --test stuff inside" --param-with-dash val1 val2 -test value-with-dash ! -c -d ! --test

-- EDIT ----

Also: --param value-with-dash

-- END EDIT ---

This is as close as I can get:

https://regex101.com/r/3aPHzp/1

/(?:(?P<inverted>\!) )?(?P<names>\-{1,2}\S+)($| (?P<values>.+(?=(?: [\!|\-])|$)))/U

Unfortunately, it breaks on free-text values inside quotes, and when a parameter without a value is followed by another parameter.

(I am trying to parse the output of iptables-save, in case you are interested. Also, maybe I could split the string in some other clever way first to avoid a huge regex, but I don't see how.)

Thank you very much for your help!
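The split-first idea mentioned above can be sketched without one large regex: tokenize the string, then walk the tokens and group them. This is only a sketch under stated assumptions, not the approach used in the answer: parseParams() is a hypothetical helper, quoted text is assumed to use double quotes (backslash escapes allowed), and unquoted values are assumed never to start with "-".

```php
<?php
// Hypothetical helper: tokenize first, then group tokens into parameters.
// Assumptions: quoted text uses double quotes (backslash escapes allowed)
// and unquoted values never start with "-".
function parseParams(string $str): array
{
    // A token is either a whole "double-quoted string" or a run of non-space characters.
    preg_match_all('/"(?:[^"\\\\]|\\\\.)*"|\S+/', $str, $m);

    $params = [];
    $inverted = false;
    foreach ($m[0] as $token) {
        if ($token === '!') {
            // "!" negates the parameter that follows it.
            $inverted = true;
        } elseif (preg_match('/^--?[a-z0-9][a-z0-9-]*$/i', $token)) {
            // Start a new parameter and consume any pending "!".
            $params[] = ['inverted' => $inverted, 'name' => $token, 'value' => ''];
            $inverted = false;
        } elseif ($params) {
            // Anything else is (part of) the value of the most recent parameter.
            $i = count($params) - 1;
            $params[$i]['value'] .= ($params[$i]['value'] === '' ? '' : ' ') . $token;
        }
    }
    return $params;
}

$str = '-a val1 ! -b val2 --other "string with crazy -a --test stuff inside" --param-with-dash val1 val2 -test value-with-dash ! -c -d ! --test';
foreach (parseParams($str) as $p) {
    echo ($p['inverted'] ? '! ' : ''), $p['name'], ' => ', $p['value'], "\n";
}
```

Quoted tokens keep their quotes on purpose: a quoted value that starts with "-" is still treated as a value, because the name check only accepts tokens that begin with a dash.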

-- FINAL SOLUTION --

for PHP >= 5.6

(?<inverted>!)?\s*(?<name>--?\w[\w-]*)\s*(?<values>(?:\s*(?:\w\S*|["'](?:[^"'\\]*(?:\\.[^"'\\]*)*)['"]))*)\K

Demo: https://regex101.com/r/xSfgxP/1

for PHP < 5.6

(?<inverted>\!)?\s*(?<=(?:\s)|^)(?<name>\-{1,2}\w[\w\-]*)\s+(?<value>(?:\s*(?:\w\S*|["'](?:[^"'\\]*(?:\\.[^"'\\]*)*)['"]))*)
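Both patterns above share the same quoted-value branch, ["'](?:[^"'\\]*(?:\\.[^"'\\]*)*)['"]. The snippet below isolates it to show what it accepts; $quoted is a hypothetical variable name, and the ^ and $ anchors are added only for this check. It suggests two quirks worth knowing for iptables-save output: the closing quote need not match the opening one, and an unescaped ' inside "..." ends the value early.

```php
<?php
// Isolates the quoted-value branch from the patterns above.
// ^ and $ are added only for this check, so the whole sample must match.
$quoted = <<<'RE'
~^["'](?:[^"'\\]*(?:\\.[^"'\\]*)*)['"]$~
RE;

$samples = [
    '"plain text"',        // simple double-quoted value
    '"say \"hi\" twice"',  // backslash-escaped quotes are allowed
    "'single quoted'",     // single quotes work too
    '"mixed\'',            // opening and closing quote need not be the same
    '"it\'s"',             // an unescaped ' inside "..." ends the value early
];
foreach ($samples as $s) {
    echo str_pad($s, 22), preg_match($quoted, $s) ? 'match' : 'no match', "\n";
}
```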


Solution

  • RegEx:

    (?<inverted>!)?\s*(?<name>--?\w[\w-]*)\s*(?<values>(?:\s*(?:\w\S*|["'](?:[^"'\\]*(?:\\.[^"'\\]*)*)['"]))*)\K
    

    Live demo (updated)

    Breakdown

     (?<inverted> ! )?             # (1) Named capturing group for the "!" (inverted) flag
     \s*                           # Match any spaces
     (?<name> --? \w [\w-]* )      # (2) Named capturing group for the parameter name
     \s*                           # Match any spaces
     (?<values>                    # (3 start) Named capturing group for the values
          (?:                           # Beginning of non-capturing group (a)
               \s*                      # Match any spaces
               (?:                      # Beginning of non-capturing group (b)
                    \w\S*                   # Match a [a-zA-Z0-9_] character, then any non-whitespace characters
                 |                          # Or
                    ["']                    # Match an opening quotation mark
                    (?:                     # Beginning of non-capturing group (c)
                         [^"'\\]*               # Match anything except `"`, `'` or `\`
                         (?: \\ . [^"'\\]* )*   # Match an escaped character, then anything except `"`, `'` or `\`, as often as possible
                    )                       # End of non-capturing group (c)
                    ['"]                    # Match a closing quotation mark (either kind)
               )                        # End of non-capturing group (b)
          )*                            # End of non-capturing group (a), repeated greedily
     )                             # (3 end)
     \K                            # Reset the reported match start, so the full match ([0]) is an empty string
    

    PHP code:

    <?php 
        
    $str = '-a val1 ! -b val2 --custom "string :)(#with crazy -a --test stuff inside" --param-with-dash val1 val2 -c ! -d ! --test';
    $re = <<< 'RE'
    ~(?<inverted>!)?\s*(?<name>--?\w[\w-]*)\s*(?<values>(?:\s*(?:\w\S*|["'](?:[^"'\\]*(?:\\.[^"'\\]*)*)['"]))*)\K~
    RE;
    
    preg_match_all($re, $str, $matches, PREG_SET_ORDER);
    print_r(array_map('array_filter', $matches));
    

    Output:

    Array
    (
        [0] => Array
            (
                [name] => -a
                [2] => -a
                [values] => val1
                [3] => val1
            )
    
        [1] => Array
            (
                [inverted] => !
                [1] => !
                [name] => -b
                [2] => -b
                [values] => val2
                [3] => val2
            )
    
        [2] => Array
            (
                [name] => --custom
                [2] => --custom
                [values] => "string :)(#with crazy -a --test stuff inside"
                [3] => "string :)(#with crazy -a --test stuff inside"
            )
    
        [3] => Array
            (
                [name] => --param-with-dash
                [2] => --param-with-dash
                [values] => val1 val2
                [3] => val1 val2
            )
    
        [4] => Array
            (
                [name] => -c
                [2] => -c
            )
    
        [5] => Array
            (
                [inverted] => !
                [1] => !
                [name] => -d
                [2] => -d
            )
    
        [6] => Array
            (
                [inverted] => !
                [1] => !
                [name] => --test
                [2] => --test
            )
    
    )
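array_filter() without a callback removes every element that compares as false, so besides the empty groups it would also drop a value of "0". A minimal sketch that maps each match to a fixed record instead; toRecords() is a hypothetical helper, the pattern is the PHP >= 5.6 one from the question's FINAL SOLUTION (with \w\S*, so single-character values such as 0 are captured), and the input is a made-up iptables-save style line.

```php
<?php
// Hypothetical helper: map each preg_match_all() set to a fixed record
// instead of filtering it with array_filter(), which would also drop "0".
$re = <<<'RE'
~(?<inverted>!)?\s*(?<name>--?\w[\w-]*)\s*(?<values>(?:\s*(?:\w\S*|["'](?:[^"'\\]*(?:\\.[^"'\\]*)*)['"]))*)\K~
RE;

function toRecords(string $re, string $str): array
{
    preg_match_all($re, $str, $matches, PREG_SET_ORDER);
    $records = [];
    foreach ($matches as $m) {
        $records[] = [
            // An unmatched group comes back as '' (or is missing), never as '!'.
            'inverted' => ($m['inverted'] ?? '') === '!',
            'name'     => $m['name'],
            'values'   => $m['values'] ?? '',
        ];
    }
    return $records;
}

print_r(toRecords($re, '--icmp-type 0 ! -s 10.0.0.1 -m comment --comment "allow ping" -j ACCEPT'));
```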