javascript, regex, xgettext

Regular expression to extract xgettext function names and params from cli args


I'm working on a CLI app that lets the user pass an option listing function names together with their argument specs. It uses the same syntax as xgettext, for example:

--keywords=__,dgettext:2,dcgettext:2,ngettext:1,2,dpgettext2:2c,3

I need to figure out a regex that would break this down into an array like this:

['__', 'dgettext:2', 'dcgettext:2', 'ngettext:1,2', 'dpgettext2:2c,3'];

How can I do it (in JavaScript, for example)?

Here's what I have so far:

(((?!([0-9\s,])).|^)[a-zA-Z_]+[A-Za-z0-9_]*[:]*([0-9]*[a-z]*,*)*)

Obviously this has a problem: it's also capturing the comma each time. Any idea how I can leave it out?


Solution

  • Based on @Fede's answer, here's a complete snippet that does exactly what I needed:

    // The trailing "2" is invalid as a keyword - it begins with a number,
    // which is not allowed, so it should not be considered a separate keyword
    var keywords = "__,dgettext:2,dcgettext:2,ngettext:1,2,dpgettext2:2c,3,__,_n,_,2";
    // split() returns a new array; keep it in a variable so it can be used
    var result = keywords.split(/,(?=[a-z_]+\w*)/gi);
    

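    It looks for commas that are followed by the start of a valid keyword (a letter or an underscore) and splits the string at those commas only. The lookahead (?=...) does not consume any characters, so each keyword is kept intact, and commas inside an argument list such as 1,2 are left alone. The result is the array I needed.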
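As a rough sketch of how the split could be used on the original --keywords value, the snippet below wraps it in a function and then separates each entry into a name and its argument specs. The helper names splitKeywords and parseKeyword are made up for illustration and are not part of the original answer. The lookahead is shortened to [a-z_], which should match the same commas because only the first character after the comma decides the split.

```javascript
// Hypothetical helper (not from the original answer): split an
// xgettext-style --keywords value into individual keyword specs.
function splitKeywords(value) {
  // Split only at commas directly followed by a letter or underscore,
  // i.e. the first character of a new function name.
  return value.split(/,(?=[a-z_])/i);
}

// Hypothetical helper: separate one spec into the function name and the
// raw argument specs that follow the colon, if there is one.
function parseKeyword(spec) {
  const colon = spec.indexOf(":");
  if (colon === -1) {
    return { name: spec, args: [] };
  }
  return {
    name: spec.slice(0, colon),
    args: spec.slice(colon + 1).split(","),
  };
}

const specs = splitKeywords("__,dgettext:2,dcgettext:2,ngettext:1,2,dpgettext2:2c,3");
const parsed = specs.map(parseKeyword);
console.log(parsed);
```

With this input, specs should contain the five entries from the question, and the ngettext entry should come out with the argument specs "1" and "2". The argument specs are left as strings here because values such as 2c are not plain numbers.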