javascript regex string parsing capturing-group

How to parse and distinguish different and varying arguments of a user command with a regular expression?


I'm trying to interpret user commands that take optional dash-prefixed flags.

{run -o -f -a file1 file2}

Something like this:

/{run (-o|-f) (\w+) (.+?)}/g;

This is very limiting, since it allows only one flag.

I'm looking for a regex that can properly parse the string with any number of dash flags, put the flags into groups, and tolerate any amount of whitespace in between.

string = "{run    -a  file1 file2}"
string = "{run    -a -o -f  file1 file2}"
string = "{run -f-a-o  file1 file2}"

string.match(regex) should output each flag and each file name.

Example output would be:

["f", "a", "o", "file1", "file2"]

Or, if that's not possible, something like this:

["-f-a-o", "file1", "file2"]

Solution

  • A regex has a fixed number of capturing groups, and a repeated (quantified) group retains only its last match.

    Thus, capturing each of "any amount of dash flags" in its own group, as the OP asks, cannot be achieved by a single regex match alone.
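
The limitation can be seen in a minimal sketch. The pattern name `regXRepeatedFlag` is hypothetical and only serves the illustration; it repeats one capturing group over all three flags, yet only the last flag survives in the result.

```javascript
// Hypothetical pattern for illustration: a single capturing group, repeated.
const regXRepeatedFlag = /^\{run(?:\s*-([a-z]))+/;

const match = regXRepeatedFlag.exec('{run -a -o -f file1 file2}');

// Each repetition overwrites the previous capture, so only "f" is kept.
console.log(match[1]); // "f"

// The result holds the full match plus exactly one group, never one per flag.
console.log(match.length); // 2
```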

    A good-enough approach is to capture two groups, the flags sequence and the files sequence, and then to split and concatenate them into a single list of separate flag and file-name items ...

    // see ... [https://regex101.com/r/VFjeK1/1]
    const regXFlagsAndFiles =
      (/^\{\s*run(?:\s+-(?<flags>[a-z]+(?:\s*-[a-z]+)*))*\s+(?<files>[\w.]+(?:\s+[\w.]+)*)\s*\}$/);
      
    function parseFlagAndFileList(value) {
      const {
        flags,
        files,
      } = regXFlagsAndFiles
        .exec(String(value))
        ?.groups || {};
    
      return (flags
        ?.split(/\s*-\s*/)
        ?? []
      ).concat(
        files
          ?.split(/\s+/)
          ?? []
      );
    }
    
    console.log([
    
      '{run file1 file2}',
      '{run    -a  file1 file2}',
      '{run    -a -o -f  file1 file2}',
      '{run -f-a-o  file1 file2}',
    
    ].map(parseFlagAndFileList));
    
    console.log([
    
      '{run}',
      '{run      }',
      '{run file1  }',
      '{run    -abc  file1.foo file2.bar  }',
      '{run    -ab -ogg -fgg  file1.baz file2  }',
      '{run -f-a-ob  file1 file2.biz  }',
      '{ run -a -b       }',
      '{ fun -a -b       }',
    
    ].map(parseFlagAndFileList));

    A regex that almost fulfills the OP's wish of doing it all with just one simple pattern would look like this one ... /[\w.]+/g.

    It of course ...

    // see ... [https://regex101.com/r/VFjeK1/3]
    const regXCommandTokens = (/[\w.]+/g);
    
    console.log([
    
      '{run file1 file2}',
      '{run    -a  file1 file2}',
      '{run    -a -o -f  file1 file2}',
      '{run -f-a-o  file1 file2}',
    
    ].map(command => command.match(regXCommandTokens).slice(1)));
    
    console.log([
    
      '{run}',
      '{run      }',
      '{run file1  }',
      '{run    -abc  file1.foo file2.bar  }',
      '{run    -ab -ogg -fgg  file1.baz file2  }',
      '{run -f-a-ob  file1 file2.biz  }',
      '{ run -a -b       }',
      '{ fun -a -b       }',
    
    ].map(command => command.match(regXCommandTokens).slice(1)));
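
If the flags and the file names also need to be told apart, one possible variation is a global pattern with two named alternatives, iterated with `matchAll`. This is only a sketch under assumptions: the names `regXTaggedTokens` and `parseTaggedTokens` are hypothetical, flags are taken to be lowercase letters after a dash (as in the first pattern above), and the command name is not validated.

```javascript
// Hypothetical pattern: each token is tagged as either a flag or a file.
const regXTaggedTokens = /-(?<flag>[a-z]+)|(?<file>[\w.]+)/g;

function parseTaggedTokens(command) {
  // Drop the opening brace plus command name, and the closing brace.
  const body = String(command).replace(/^\{\s*\w+|\}$/g, '');
  const result = { flags: [], files: [] };

  // With the `g` flag, matchAll yields one match object per token.
  for (const { groups } of body.matchAll(regXTaggedTokens)) {
    if (groups.flag !== undefined) {
      result.flags.push(groups.flag);
    } else {
      result.files.push(groups.file);
    }
  }
  return result;
}

console.log(parseTaggedTokens('{run -f-a-o  file1 file2}'));
// { flags: [ 'f', 'a', 'o' ], files: [ 'file1', 'file2' ] }
```

Like the `/[\w.]+/g` variant, this does not reject malformed input, and a file name containing a dash would be split into a file and a flag, so it suits only input that is already known to have the expected shape.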