
Extracting all words separated by comma and the conjunction AND after a specific word


I am struggling to write a regex that matches each word in a list whose items are separated by commas and the word 'and'.

My input strings will be similar to, "We will launch new USD-M NFP, MOVR, PLEX and GAS Perpetual Contracts" and expected output would match ['NFP','MOVR','PLEX','GAS']

This is my current regex: USD-M\s+(\w+((?:\s*,\s*\w+)|(?:\s*and\s*\w+))*). It matches the characters 'USD-M', then any whitespace, and then tries to capture the words separated by a comma or 'and'.

JavaScript:

const input = "We will launch new USD-M NFP, MOVR, PLEX and GAS Margin Contracts";
const regex = /USD-M\s+(\w+((?:\s*,\s*\w+)|(?:\s*and\s*\w+))*)/i;
const output = input.match(regex);

However, it outputs:

0:'USD-M NFP, MOVR, PLEX and GAS'
1:'NFP, MOVR, PLEX and GAS'
2:' and GAS'

I could take output[1] and split it on commas and 'and', but surely a regex can do what I want on its own.
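For reference, the two-step fallback mentioned above could look roughly like the sketch below. The function name extractBySplit is hypothetical, and the pattern is a slightly tightened variant of the regex in the question (it requires whitespace around 'and'), so treat it as an illustration rather than a definitive implementation.

```javascript
// Hypothetical helper: capture the whole list after "USD-M", then split it.
function extractBySplit(text) {
  // Group 1 holds the list, e.g. "NFP, MOVR, PLEX and GAS".
  const m = text.match(/\bUSD-M\s+(\w+(?:(?:\s*,\s*|\s+and\s+)\w+)*)/);
  if (!m) return []; // no "USD-M <word>" in the input
  // Split on the same separators the pattern accepts.
  return m[1].split(/\s*,\s*|\s+and\s+/);
}

console.log(extractBySplit("We will launch new USD-M NFP, MOVR, PLEX and GAS Margin Contracts"));
// [ 'NFP', 'MOVR', 'PLEX', 'GAS' ]
console.log(extractBySplit("We will launch new USD-M NFP Spot Market"));
// [ 'NFP' ]
```

This needs no lookbehind, at the cost of a second pass over the captured text.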

I appreciate the help!

Edit:

The regex also needs to handle the case where there are no commas and no 'and', so that it returns a single match. For example:

Input: "We will launch new USD-M NFP Spot Market"
Output: ['NFP']

Solution

  • You can use a regex like

    const text = 'We will launch new USD-M NFP, MOVR, PLEX and GAS Perpetual Contracts';
    console.log(text.match(/\b(?<=\bUSD-M\s+(?:\w+(?:\s*,\s*|\s+and\s+))*)\w+/g));

    Details

    • \b - a word boundary
    • (?<=\bUSD-M\s+(?:\w+(?:\s*,\s*|\s+and\s+))*) - a positive lookbehind that requires, to the left of the current location:
      • \bUSD-M - a whole word USD-M (\b is a word boundary and after M, there must be a whitespace because...)
      • \s+ - one or more whitespaces
      • (?:\w+(?:\s*,\s*|\s+and\s+))* - zero or more occurrences of
        • \w+ - one or more word chars
        • (?:\s*,\s*|\s+and\s+) - either a comma surrounded by optional whitespace, or the word and surrounded by one or more whitespace characters
    • \w+ - one or more ASCII letters, digits or underscores. This is the only part that is consumed, so it is what each match returns.
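As a quick check, the pattern above can be wrapped in a small function and run against both inputs from the question. This is only a sketch: the names LIST_AFTER_USD_M and extractSymbols are made up for illustration. Lookbehind assertions were added to JavaScript in ES2018, so an older engine may reject the pattern with a SyntaxError.

```javascript
// Same pattern as in the answer; the constant and function names are hypothetical.
const LIST_AFTER_USD_M = /\b(?<=\bUSD-M\s+(?:\w+(?:\s*,\s*|\s+and\s+))*)\w+/g;

function extractSymbols(text) {
  // With the g flag, match() returns every match, or null when there is none.
  return text.match(LIST_AFTER_USD_M) || [];
}

console.log(extractSymbols("We will launch new USD-M NFP, MOVR, PLEX and GAS Perpetual Contracts"));
// [ 'NFP', 'MOVR', 'PLEX', 'GAS' ]
console.log(extractSymbols("We will launch new USD-M NFP Spot Market"));
// [ 'NFP' ]
console.log(extractSymbols("No futures listed here"));
// []
```

Note that this pattern is case-sensitive, unlike the regex in the question, which uses the i flag.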