I have a set of product descriptions from which I want to extract product attributes using regular expressions.
https://regex101.com/r/HTTfNR/1
BL460c G6 X5550 6G 1P Svr
BL460c G6 E5540 6G 1P Svr
BL460c G6 E5540 6G 1P Svr
BL460c G6 E5530 6G 1P Svr
BL460c G6 L5520 6G 1P Svr
BL460c G6 E5520 6G 1P Svr
BL460c G6 E5506 6G 1P Svr
BL460c G6 E5502 6G 1P Svr
BL280c G6 L5520 2G LP 1P Svr
BL280c G6 E5520 2G 1P Svr
BL280c G6 E5540 2G 1P Svr
BL280c G6 E5502 2G 1P Svr
S-Buy BL460c G6 E5540 8G 2P Svr
S-Buy BL460c G6 E5530 4G 1P Svr
S-Buy BL460c G6 E5530 4G 1P Svr
BL2x220c G6 E5540 24G 2P 250GB Svr
BL2x220c G6 E5530 24G 2P 250GB Svr
BL2x220c G6 L5530 24G 2P 250GB Svr
BL2x220c G6 L5520 24G 2P
BL2x220c G6 E5640 2x2P 24G Svr
BL2x220c G6 E5630 2x2P 24G Svr
BL2x220c G6 L5640 2x2P 24G Svr
BL2x220c G6 Mod0 Svr
BL280c G6 X5650 6G 1P Svr
BL280c G6 E5630 4G 1P Svr
BL280c G6 L5640 4G 1P Svr
BL280c G6 E5506 2G 1P Svr
BL620c G7 E7-2860 32G Svr
BL620c G7 E7-2850 32G Svr
BL620c G7 E7-2830 32G Svr
BL680c G7 E7-4860 64G Svr
BL680c G7 E7-4860 64G Svr
BL680c G7 E7-4850 64G Svr
BL680c G7 E7-4830 64G Svr
BL680c G7 E7 4830 64G Svr
I want to solve this using regular expressions.
I have tried the pattern below, but I am unable to get it working for all the cases in my first step:
\b(?!\d)([ELX0-9-])\w{1,}
As my first step, I want to extract values such as:
X5550
E5540
E7-2860
E7 4830
Can someone help me with code to extract these values from the text above?
If the match should start with either E, X or L, you can omit the negative lookahead (?!\d) and use only those letters in the character class, without the hyphen and the digits.
Then match an optional digit followed by either a space or a hyphen.
\b[EXL](?:\d[ -])?\d+(?!\S)
In parts:
\b[EXL] : Word boundary, then match either E, X or L
(?:\d[ -])? : Optionally match a digit followed by a space or hyphen
\d+ : Match 1+ digits
(?!\S) : Negative lookahead, assert that what is directly to the right is not a non-whitespace character
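The question does not say which language the code should be in, so the following is only a minimal sketch that assumes Python and its standard `re` module. The names `MODEL_PATTERN` and `extract_model` are hypothetical and chosen for illustration; the pattern itself is the one given above.

```python
import re

# The pattern from the answer: a word boundary, one of E/X/L,
# an optional "digit + space or hyphen", 1+ digits, and then
# either whitespace or the end of the string.
MODEL_PATTERN = re.compile(r"\b[EXL](?:\d[ -])?\d+(?!\S)")


def extract_model(description):
    """Return the first matching token in a description, or None."""
    match = MODEL_PATTERN.search(description)
    return match.group(0) if match else None


# A few of the sample descriptions from the question.
samples = [
    "BL460c G6 X5550 6G 1P Svr",
    "BL280c G6 L5520 2G LP 1P Svr",
    "S-Buy BL460c G6 E5540 8G 2P Svr",
    "BL620c G7 E7-2860 32G Svr",
    "BL680c G7 E7 4830 64G Svr",
    "BL2x220c G6 Mod0 Svr",
]

for text in samples:
    print(f"{text!r} -> {extract_model(text)!r}")
```

With these samples the function returns `X5550`, `L5520`, `E5540`, `E7-2860` and `E7 4830`, and `None` for the `Mod0` line, which has no token starting with E, X or L followed by digits. Tokens such as `LP` or the leading `BL460c` are skipped because the letter must sit at a word boundary and be followed by digits.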