Tags: javascript, regex, capturing-group

Placing each matched value in its own capturing group


I've been at this for too long, trying to figure out how to match a comma-delimited string of values, while breaking apart the values into their own capturing groups. Here are my requirements:

  • No leading comma
  • Terms are alphanumeric, 1 to 7 characters long
  • Min: 1 term; Max: unlimited
  • Unlimited whitespace between terms and commas
  • No trailing comma

I'm so close, but I can't get every term in the string into its own capture group. Instead of each term landing in a separate group, group #1 ends up holding only the last term matched by the repeated first capturing group. So here's my example:

abc1234, def5678, ghi9012

I would expect abc1234 to be group #1, def5678 to be group #2, and ghi9012 to be group #3. Instead, using the expression below, I get def5678 in group #1 and ghi9012 in group #2.

/(?:([A-z0-9]{1,7})\s*,\s*)+([A-z0-9]{1,7})/g


I'm pretty sure I haven't set up my capturing/non-capturing groups correctly. Any help would be greatly appreciated.
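
For reference, a minimal sketch that reproduces the result described above. A pattern has a fixed number of capturing groups, one per pair of capturing parentheses, and a group inside a repeated block keeps only the text from its last iteration. The variable names here are illustrative only.

```javascript
// The pattern from the question, unchanged.
const pattern = /(?:([A-z0-9]{1,7})\s*,\s*)+([A-z0-9]{1,7})/g;
const input = "abc1234, def5678, ghi9012";

// exec() returns the full match followed by one slot per capturing group.
const match = pattern.exec(input);

// Group 1 sits inside the repeated (?: ... )+ block, so each pass
// overwrites it; only the final iteration's text survives.
const groups = match.slice(1);
console.log(groups); // [ 'def5678', 'ghi9012' ]
```

The pattern contains two capturing groups, so a single match can never report more than two values, however many terms the string contains.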


Solution

  • This should do it. Apply the extraction regex repeatedly (for example with the global flag); on each match the value is in group 1, already trimmed of surrounding whitespace.
    Let me know if you need one for quoted fields.

    Note that the 1-7 character requirement can't be enforced by the extraction regex,
    so the string has to be validated ahead of time.

    Validation regex:

     # /^(?:(?:(?:^|,)\s*)[a-zA-Z0-9]{1,7}(?:\s*(?:(?=,)|$)))+$/
    
     ^    
     (?:
          (?:                           # start of string or comma, then optional whitespace
               (?: ^ | , )
               \s* 
          )
          [a-zA-Z0-9]{1,7}              # alpha-num, 1-7 chars
          (?:                           # optional trailing whitespace, then a comma ahead or end of string
               \s* 
               (?:
                    (?= , )
                 |  $ 
               )
          )
     )+
     $ 
    

    Extraction regex:

     # /(?:(?:^|,)\s*)([^,]*?)(?:\s*(?:(?=,)|$))/
    
    
     (?:                           # start of string or comma, then optional whitespace
          (?: ^ | , )
          \s* 
     )
     ( [^,]*? )                    # (1), non-quoted field
     (?:                           # optional trailing whitespace, then a comma ahead or end of string
          \s* 
          (?:
               (?= , )
            |  $ 
          )
     )
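
A minimal sketch of how the two patterns might be combined in JavaScript: validate the whole string first, then collect group 1 from every extraction match. The helper name `parseTerms`, the `g` flag on the extraction pattern (needed for `matchAll`), and the `VALIDATE_STRICT` variant are assumptions added for illustration, not part of the patterns as posted.

```javascript
// Both patterns are copied from the answer; only the g flag on EXTRACT is added.
const VALIDATE = /^(?:(?:(?:^|,)\s*)[a-zA-Z0-9]{1,7}(?:\s*(?:(?=,)|$)))+$/;
const EXTRACT = /(?:(?:^|,)\s*)([^,]*?)(?:\s*(?:(?=,)|$))/g;

// Hypothetical stricter variant: the added (?!\s*,) guard rejects a leading comma.
const VALIDATE_STRICT = /^(?!\s*,)(?:(?:(?:^|,)\s*)[a-zA-Z0-9]{1,7}(?:\s*(?:(?=,)|$)))+$/;

// parseTerms is a hypothetical helper: it returns the trimmed terms,
// or null when the input fails validation.
function parseTerms(input, validator = VALIDATE) {
  if (!validator.test(input)) return null;
  // Each match carries exactly one term in group 1.
  return Array.from(input.matchAll(EXTRACT), (m) => m[1]);
}

console.log(parseTerms("abc1234, def5678, ghi9012")); // [ 'abc1234', 'def5678', 'ghi9012' ]
console.log(parseTerms("abc12345, def")); // null (first term has 8 characters)
console.log(parseTerms("abc, def,"));     // null (trailing comma)

console.log(VALIDATE.test(",abc"));        // true
console.log(VALIDATE_STRICT.test(",abc")); // false
```

One thing to watch: as the last two lines show, the validation pattern as written appears to accept a leading comma, because the `(?:^|,)` alternation can consume a comma at position 0. The lookahead guard in `VALIDATE_STRICT` is one possible way to close that gap; pass it as the second argument to `parseTerms` if leading commas must be rejected.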