Tags: regex, regex-greedy

RegEx required to capture the full word, space and • symbol in all instances within a variable selection of text


I need to be able to select variable elements from the following ingredient list example.

I wish to collect the 'full word, space & •' in all instances.

INGREDIENTS: ALCOHOL DENAT. • FRAGRANCE (PARFUM) • WATER\AQUA\EAU • HYDROXYCITRONELLAL • LIMONENE • BENZYL BENZOATE • CITRONELLOL • GERANIOL • COUMARIN • FARNESOL • CITRAL • BENZYL ALCOHOL • CINNAMYL ALCOHOL • LINALOOL • ALCOHOL • DIPROPYLENE GLYCOL • ETHYLHEXYL METHOXYCINNAMATE • BUTYL METHOXYDIBENZOYLMETHANE • ETHYLHEXYL SALICYLATE • TRIS(TETRAMETHYLHYDROXYPIPERIDINOL) CITRATE • DILAURYL THIODIPROPIONATE • TOCOPHEROL • BHT • BENZOIC ACID • RED 4 (CI 14700) • EXT. VIOLET 2 (CI 60730) • YELLOW 6 (CI 15985) <ILN46472>

I have \b\w+\s• but this only selects 'EAU •' within the copy, whereas I need all instances within the list:

DENAT. •
(PARFUM) •
EAU •
HYDROXYCITRONELLAL •
LIMONENE •
BENZOATE •
CITRONELLOL •
GERANIOL •
COUMARIN •
FARNESOL •
CITRAL •
ALCOHOL •
ALCOHOL •
LINALOOL •
ALCOHOL •
GLYCOL •
METHOXYCINNAMATE •
METHOXYDIBENZOYLMETHANE •
SALICYLATE •
CITRATE •
THIODIPROPIONATE •
TOCOPHEROL •
BHT •
ACID •
(CI 14700) •
(CI 60730) •

Solution

  • To get those matches, you might use:

    (?:\([^()]*\)|\w+\.?)\s•
    

    The pattern matches

    • (?: Non capture group
      • \([^()]*\) Match from ( to the next ), with no other parentheses in between
      • | Or
      • \w+\.? Match 1+ word chars followed by an optional .
    • ) Close the non capture group
    • \s• Match a whitespace char and a literal •

    See a regex demo
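As a rough check of the pattern above, here is a minimal Python sketch. It assumes Python's `re` module (the question does not name a regex flavor) and uses a shortened copy of the ingredient list. The variable names are illustrative only. It also runs the original `\b\w+\s•` to show which entries that pattern skips.

```python
import re

# Shortened sample taken from the ingredient list in the question.
text = (
    "INGREDIENTS: ALCOHOL DENAT. • FRAGRANCE (PARFUM) • WATER\\AQUA\\EAU • "
    "HYDROXYCITRONELLAL • LIMONENE • BENZYL BENZOATE • RED 4 (CI 14700) • "
    "YELLOW 6 (CI 15985) <ILN46472>"
)

# Pattern from the question: only plain words directly followed by " •".
asker_pattern = re.compile(r"\b\w+\s•")

# Pattern from the answer: a parenthesised group OR a word with an optional dot.
answer_pattern = re.compile(r"(?:\([^()]*\)|\w+\.?)\s•")

# findall returns whole matches here because neither pattern has a capture group.
asker_matches = asker_pattern.findall(text)
matches = answer_pattern.findall(text)

for m in matches:
    print(m)
```

With this sample, the original pattern misses `DENAT. •`, `(PARFUM) •` and `(CI 14700) •`, because `\w+` cannot consume the trailing `.` or `)`. Note that every match has to be collected (`findall` here, or a global flag in other tools); taking only the first match of the original pattern would give just `EAU •`.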

    If there has to be at least one word character between the parentheses:

    (?:\([^\w()]*\w[^()]*\)|\w+\.?)\s•
    

    See another regex demo.
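The difference between the two patterns only shows up when the parentheses are empty or hold no word characters. The following sketch, again assuming Python's `re` module and using a made-up test string that is not part of the original list, compares them side by side.

```python
import re

# Hypothetical input: empty parentheses, a normal group, and a group
# containing only non-word characters.
sample = "FOO () • BAR (X) • BAZ (--) •"

# First pattern: anything (including nothing) between the parentheses.
loose = re.compile(r"(?:\([^()]*\)|\w+\.?)\s•")

# Second pattern: at least one word character between the parentheses.
strict = re.compile(r"(?:\([^\w()]*\w[^()]*\)|\w+\.?)\s•")

loose_matches = loose.findall(sample)
strict_matches = strict.findall(sample)

print(loose_matches)
print(strict_matches)
```

The stricter variant keeps `(X) •` and rejects `() •` and `(--) •`, which the first pattern accepts.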

    Note that \w can also match _
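If underscores should not count as part of a word, one possible adjustment (an assumption on my part, not something the question asks for) is to replace `\w` with `[^\W_]`, which matches any word character except `_`. A small Python sketch with a made-up input:

```python
import re

# Hypothetical input containing an underscore inside a word.
sample = "A_B • C •"

# Pattern from the answer: \w also matches the underscore.
with_underscore = re.compile(r"(?:\([^()]*\)|\w+\.?)\s•")

# Variant: [^\W_] is "a word character that is not an underscore".
without_underscore = re.compile(r"(?:\([^()]*\)|[^\W_]+\.?)\s•")

with_matches = with_underscore.findall(sample)
without_matches = without_underscore.findall(sample)

print(with_matches)
print(without_matches)
```

With the variant, the match for the first entry starts after the underscore, so `A_B •` becomes `B •`.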