I am struggling to write a regex that will match each word in a list whose items are separated by commas and the word 'and'.
My input strings will be similar to "We will launch new USD-M NFP, MOVR, PLEX and GAS Perpetual Contracts",
and the expected output would be ['NFP','MOVR','PLEX','GAS'].
This is my current regex: USD-M\s+(\w+((?:\s*,\s*\w+)|(?:\s*and\s*\w+))*)
It matches the characters 'USD-M', then any whitespace, and then tries to capture the words separated by a comma or 'and'.
JavaScript:
const input = "We will launch new USD-M NFP, MOVR, PLEX and GAS Margin Contracts";
const regex = /USD-M\s+(\w+((?:\s*,\s*\w+)|(?:\s*and\s*\w+))*)/i;
const output = input.match(regex);
However, it outputs
0:'USD-M NFP, MOVR, PLEX and GAS'
1:'NFP, MOVR, PLEX and GAS'
2:' and GAS'
I could use output[1] and split by commas and 'and', but surely regex can do what I want.
I appreciate the help!
Edit:
The regex also needs to handle input that contains no commas or 'and', in which case there should be exactly one match. For example,
Input: "We will launch new USD-M NFP Spot Market"
Output: ['NFP']
You can use a regex like
const text = 'We will launch new USD-M NFP, MOVR, PLEX and GAS Perpetual Contracts';
console.log(text.match(/\b(?<=\bUSD-M\s+(?:\w+(?:\s*,\s*|\s+and\s+))*)\w+/g));
See the regex demo.
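To check the single-word case from the edit as well, the same pattern can be wrapped in a small helper. This is only a sketch: the names symbolRegex and extractSymbols are made up here, and it assumes a JavaScript engine that supports lookbehind assertions (for example a current Node.js).

```javascript
// Hypothetical helper (the names are not from the question) wrapping the regex above.
// Assumes the JavaScript engine supports lookbehind assertions.
const symbolRegex = /\b(?<=\bUSD-M\s+(?:\w+(?:\s*,\s*|\s+and\s+))*)\w+/g;

function extractSymbols(text) {
  // String.prototype.match returns null when a global regex finds nothing,
  // so fall back to an empty array.
  return text.match(symbolRegex) ?? [];
}

console.log(extractSymbols('We will launch new USD-M NFP, MOVR, PLEX and GAS Perpetual Contracts'));
// [ 'NFP', 'MOVR', 'PLEX', 'GAS' ]
console.log(extractSymbols('We will launch new USD-M NFP Spot Market'));
// [ 'NFP' ]
console.log(extractSymbols('No announcement here'));
// []
```

Returning an empty array instead of null keeps the caller free of null checks when a message contains no USD-M list at all.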
Details
\b - a word boundary
(?<=\bUSD-M\s+(?:\w+(?:\s*,\s*|\s+and\s+))*) - a positive lookbehind that requires, to the left of the current location:
  \bUSD-M - a whole word USD-M (\b is a word boundary and after M, there must be a whitespace because...)
  \s+ - one or more whitespaces
  (?:\w+(?:\s*,\s*|\s+and\s+))* - zero or more occurrences of
    \w+ - one or more word chars
    (?:\s*,\s*|\s+and\s+) - either a comma enclosed with optional whitespaces, or the word and inside one or more whitespaces
\w+ - one or more ASCII letters, digits or underscores.
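As a side note, a repeated capturing group keeps only its last iteration, which is why output[2] in the question is ' and GAS'. If lookbehind is not available in the target engine, the two-step idea from the question (capture the whole list, then split it) still works. A minimal sketch, where extractSymbolsTwoStep is a hypothetical name and the pattern is a slightly tightened variant of the one in the question:

```javascript
// Hypothetical two-step alternative: capture the whole list, then split it.
// No lookbehind is needed.
function extractSymbolsTwoStep(text) {
  // Group 1 captures the first word plus any ", word" or " and word" repetitions.
  const listRegex = /\bUSD-M\s+(\w+(?:\s*,\s*\w+|\s+and\s+\w+)*)/;
  const m = text.match(listRegex);
  if (m === null) return [];
  // Split on a comma with optional surrounding whitespace,
  // or on the word "and" surrounded by whitespace.
  return m[1].split(/\s*,\s*|\s+and\s+/);
}

console.log(extractSymbolsTwoStep('We will launch new USD-M NFP, MOVR, PLEX and GAS Perpetual Contracts'));
// [ 'NFP', 'MOVR', 'PLEX', 'GAS' ]
console.log(extractSymbolsTwoStep('We will launch new USD-M NFP Spot Market'));
// [ 'NFP' ]
```

Requiring whitespace on both sides of "and" (\s+and\s+ rather than \s*and\s*) avoids splitting inside a word that merely contains the letters "and".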