I want to extract "numbers" (a package number, an invoice number, etc.) from lines. A "number" is just a run of non-whitespace characters (e.g. 123, ABC, Abc, ABC123, ABC-123, X-ABC/123/456), so it is simply the \S+ regexp.
I have lines that can contain "numbers". There are two possible cases:
- the "number" is at the start of the line
- the "number" follows the prefix Number:
Example lines:
ABC123 bla bla
Number: ABC123 bla bla
Some words 123 Number: ABC123 bla bla
From each of these example lines I want to extract the "number": ABC123.
I know how to write a regexp for the second case (example lines 2 and 3): (?:Number: )(\S+) (a non-capturing group with the prefix Number: and a capturing group with non-whitespace characters).
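As a quick check of that statement, here is a minimal sketch in Python. The question does not name a language, so Python and the helper name extract_with_prefix are assumptions made only for illustration. It shows that the prefix-only pattern handles example lines 2 and 3 but finds nothing in line 1:

```python
import re

# The three example lines from the question.
lines = [
    "ABC123 bla bla",
    "Number: ABC123 bla bla",
    "Some words 123 Number: ABC123 bla bla",
]

# Pattern for the second case only: the prefix followed by the "number".
prefix_only = re.compile(r"(?:Number: )(\S+)")


def extract_with_prefix(line):
    """Hypothetical helper: return the captured "number", or None if no prefix is found."""
    m = prefix_only.search(line)
    return m.group(1) if m else None


results = [extract_with_prefix(line) for line in lines]
print(results)  # [None, 'ABC123', 'ABC123']
```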
But what about the first case?
What I tried:
(?:Number: )?(\S+)
I get many matches, but that is not a problem because I can take the first match in each line in my code.
The problem is match 7: I get the word Some instead of the number ABC123.
(?:^(\S+))|(?:(?:Number: )(\S+))
But the problem is the same: I get the word Some. And this is worse, because in the second example line I now get the prefix Number: itself as the match.
Adding a negative lookahead for Number: at the start of the line, to eliminate the second problem from the previous step:
(?:^(?!Number:)(\S+))|(?:(?:Number: )(\S+))
But there is still the problem of getting a random word (Some) at the beginning of the line even when the prefix Number: exists with the "number" in the middle of the line.
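To make the three attempts easier to compare, here is a small Python sketch that takes the first match in each example line, as described above. Python, the labels in the attempts dictionary, and the helper first_capture are assumptions for illustration only:

```python
import re

# The three example lines from the question.
lines = [
    "ABC123 bla bla",
    "Number: ABC123 bla bla",
    "Some words 123 Number: ABC123 bla bla",
]

# The three patterns tried above, under illustrative labels.
attempts = {
    "optional prefix": r"(?:Number: )?(\S+)",
    "anchor or prefix": r"(?:^(\S+))|(?:(?:Number: )(\S+))",
    "lookahead anchor or prefix": r"(?:^(?!Number:)(\S+))|(?:(?:Number: )(\S+))",
}


def first_capture(pattern, line):
    """Return the capture group that took part in the first match, or None."""
    m = re.search(pattern, line)
    if m is None:
        return None
    return next((g for g in m.groups() if g is not None), None)


first_matches = {
    name: [first_capture(pattern, line) for line in lines]
    for name, pattern in attempts.items()
}

for name, found in first_matches.items():
    print(f"{name}: {found}")
# optional prefix: ['ABC123', 'ABC123', 'Some']
# anchor or prefix: ['ABC123', 'Number:', 'Some']
# lookahead anchor or prefix: ['ABC123', 'ABC123', 'Some']
```

In every attempt the third line yields Some, because the leftmost possible match starts at the beginning of the line, before the engine ever reaches the Number: prefix.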
Demo: https://regex101.com/r/G9UFak/1
Question a bit similar to: Regex multiple characters but without specific string
You can use
(?:.*Number:\s*|^)(\S+)
See the regex demo.
Details
- (?:.*Number:\s*|^) - either of the two alternatives:
  - .*Number:\s* - any zero or more chars other than line break chars, as many as possible, then Number: and zero or more whitespaces (if you need to stay on the line, replace \s with [^\S\r\n], or with \h / [\p{Zs}\t] if supported)
  - | - or
  - ^ - start of a line (with the m option in PCRE-like engines)
- (\S+) - Group 1: any one or more non-whitespace chars.
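As a minimal sketch of how this pattern could be used, here it is in Python. Python is an assumption, since the question does not name a language; re.MULTILINE plays the role of the m option there. The sketch applies the pattern both line by line and to the whole text at once:

```python
import re

# The three example lines from the question.
lines = [
    "ABC123 bla bla",
    "Number: ABC123 bla bla",
    "Some words 123 Number: ABC123 bla bla",
]

# The greedy .* runs to the end of the line and backtracks to the last
# "Number:" on it; if there is none, the ^ alternative takes the first token.
pattern = re.compile(r"(?:.*Number:\s*|^)(\S+)", re.MULTILINE)

# Per-line use: one search per line, Group 1 holds the "number".
per_line = [pattern.search(line).group(1) for line in lines]
print(per_line)  # ['ABC123', 'ABC123', 'ABC123']

# Whole-text use: with a single capture group, findall returns Group 1 values.
text = "\n".join(lines)
whole_text = pattern.findall(text)
print(whole_text)  # ['ABC123', 'ABC123', 'ABC123']
```

Because the prefix alternative is tried first at the start of each line, the plain ^ alternative is only used when the line contains no Number: prefix at all.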