regex pcre regex-lookarounds regex-group regex-greedy

How to parse variable length command line arguments with regex?


I have a large number of files that each contain a bash command with a variable number of parameters. I need to replace these with a corresponding API call.

Example bash command in a file (note: the number of '-p' arguments varies, and some commands have none):

./some_script.sh http://some.server.com -p a=value -p b=value -p c=value

Example corresponding API call:

http://some.server.com/api/some/endpoint?a=value&b=value&c=value

My issue is I can't seem to group each parameter, given that the number of parameters is variable.

Basic regex (this will match above example, but only group first parameter):

.\/some_script.sh\s([\w\/:\.]*)(\s-\w\s[\w=]*)

And I tried:

.\/some_script.sh\s([\w\/:\.]*)(\s-\w\s[\w=]*)*

However, this seems to only group the last parameter. (tested with regex101)

Ideally, I would like this regex to be able to group an indefinite number of arguments in these files so that I can easily rebuild the command as an API call.

If more detail is required, please let me know. Any suggestions are welcome.


Solution

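  • A repeated capturing group keeps only its last iteration, which is why the second attempt returns just the final parameter. Another approach is to collect the data step by step with a global search, one match per piece. We could start with an expression similar to: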

    .+\.sh.+?(https?:\/\/[^\s]*)|\s+-[a-z]+\s+([\w=]+)
    

    which captures the URL in group 1:

    (https?:\/\/[^\s]*)
    

    and each parameter in group 2:

    ([\w=]+)
    

    joined by alternation (|), so each match fills only one of the two groups.

    We can also tighten or loosen these boundaries if desired.
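
    As one way to tighten the boundaries, here is a minimal sketch of a two-step variant: an anchored expression first validates the whole command, then a second global expression pulls each parameter out of the matched parameter list. The helper name parseCommand is hypothetical, and the sketch assumes every parameter has the form -p key=value with word characters only, as in the question's example.

    ```javascript
    // Hypothetical helper, not part of the original answer.
    function parseCommand(line) {
        // Step 1: anchored match for the whole command.
        // Group 1 is the URL, group 2 is the raw run of "-p key=value" pairs (possibly empty).
        const whole = /^\.\/some_script\.sh\s+(https?:\/\/\S+)((?:\s+-p\s+\w+=\w+)*)\s*$/.exec(line);
        if (whole === null) {
            return null; // not a command of the expected shape
        }
        // Step 2: global match over the parameter run, one match per pair.
        const params = [...whole[2].matchAll(/-p\s+(\w+)=(\w+)/g)]
            .map(([, key, value]) => [key, value]);
        return { url: whole[1], params };
    }

    const parsed = parseCommand("./some_script.sh http://some.server.com -p a=value -p b=value -p c=value");
    console.log(parsed.url);           // http://some.server.com
    console.log(parsed.params.length); // 3
    ```

    Because the first expression is anchored at both ends, a line with anything unexpected in it returns null instead of being partially matched.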

    Test

    This snippet just shows how the capturing groups work:

    const regex = /.+\.sh.+?(https?:\/\/[^\s]*)|\s+-[a-z]+\s+([\w=]+)/gm;
    const str = `./some_script.sh http://some.server.com -p a=value -p b=value -p c=value
    `;
    let m;
    
    while ((m = regex.exec(str)) !== null) {
        // This is necessary to avoid infinite loops with zero-width matches
        if (m.index === regex.lastIndex) {
            regex.lastIndex++;
        }
        
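        // The result can be accessed through the `m`-variable.
        // Each match fills only one of the two groups; the other is undefined, so skip it.
        m.forEach((match, groupIndex) => {
            if (match !== undefined) {
                console.log(`Found match, group ${groupIndex}: ${match}`);
            }
        });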
    }
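
    To get from these groups to the API call the question asks for, a minimal sketch might look like the following. The function name toApiCall is hypothetical, and the endpoint path is taken from the question's example rather than from any real API. Since ([\w=]+) only admits word characters and =, the sketch assumes no URL encoding is needed.

    ```javascript
    // Hypothetical helper; the default endpoint path comes from the question's example.
    function toApiCall(command, endpoint = "/api/some/endpoint") {
        // Same expression as above: group 1 is the URL, group 2 is one key=value parameter.
        const regex = /.+\.sh.+?(https?:\/\/[^\s]*)|\s+-[a-z]+\s+([\w=]+)/g;
        let base = null;
        const params = [];
        for (const m of command.matchAll(regex)) {
            if (m[1] !== undefined) {
                base = m[1];       // URL branch matched
            } else if (m[2] !== undefined) {
                params.push(m[2]); // parameter branch matched
            }
        }
        if (base === null) {
            return null; // no script call with a URL was found
        }
        // Append a query string only when at least one parameter was collected.
        return params.length > 0
            ? `${base}${endpoint}?${params.join("&")}`
            : `${base}${endpoint}`;
    }

    console.log(toApiCall("./some_script.sh http://some.server.com -p a=value -p b=value -p c=value"));
    // http://some.server.com/api/some/endpoint?a=value&b=value&c=value
    ```

    A command with no -p arguments yields the bare endpoint URL with no query string.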
