Words in the invoice text are sometimes delimited by an underscore character (_) in addition to, or instead of, white space:
...
Some nr_11687767_ other 101308591
Invoice Nr.
M230714_some text
Kirjeldus
...
Sometimes the invoice number is terminated by a newline:
...
This nr_11687767_KMKR_EE101308591
Invoice Nr.
M230714
01.05.2023
Item
...
or by another white space delimiter:
...
Some nr_11687767_ Text
Invoice Nr M230714 Date 01.05.2023
Desc
...
I tried to extract the number using a regex:
Regex.Match(tekst, @"(?si).*_?Invoice[\s_]?NR[\s_:\.]?(?<arvenumber>.*?)[\s_]");
Success is true, but the arvenumber group is empty.
How can I get only the number M230714 in the arvenumber group?
I am using C# with ASP.NET 7.
I suggest a pattern like this:
(?i)Invoice\s+Nr\.?[\s_]+(?<arvenumber>[\p{L}0-9]+)
where
(?i) - Ignore case when matching
Invoice - "Invoice"
\s+ - One or more whitespace characters
Nr\.? - "Nr" with optional .
[\s_]+ - One or more whitespace characters or _
(?<arvenumber>[\p{L}0-9]+) - arvenumber, which consists of letters and/or digits
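For completeness, here is a minimal sketch of how the pattern could be used from C#. The class name InvoiceNumberExtractor and the method TryExtract are hypothetical names chosen for illustration; only Regex, Match and Groups come from System.Text.RegularExpressions. In the original attempt the lazy .*? is allowed to match zero characters, so when a delimiter directly follows "Nr." (as in the first two samples) the group comes back empty. Requiring at least one letter or digit avoids that.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class InvoiceNumberExtractor
{
    // The pattern suggested above: "Invoice", whitespace, "Nr" with an
    // optional dot, one or more whitespace/underscore delimiters, then the number.
    private static readonly Regex InvoiceNr = new Regex(
        @"(?i)Invoice\s+Nr\.?[\s_]+(?<arvenumber>[\p{L}0-9]+)");

    // Returns true and sets arvenumber when the text contains an invoice number;
    // otherwise returns false and sets arvenumber to an empty string.
    public static bool TryExtract(string tekst, out string arvenumber)
    {
        Match m = InvoiceNr.Match(tekst);
        arvenumber = m.Success ? m.Groups["arvenumber"].Value : string.Empty;
        return m.Success;
    }
}
```

Calling InvoiceNumberExtractor.TryExtract(tekst, out var nr) on each of the three sample texts from the question should leave M230714 in nr, because the capture stops at the first character that is neither a letter nor a digit (the underscore, the newline, or the space).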