I have a regex to find IBANs like the following:
^[a-zA-Z]{2}[0-9]{2}\s?[a-zA-Z0-9]{4}\s?[0-9]{4}\s?[0-9]{3}([a-zA-Z0-9]\s?[a-zA-Z0-9]{0,4}\s?[a-zA-Z0-9]{0,4}\s?[a-zA-Z0-9]{0,4}\s?[a-zA-Z0-9]{0,3})?$
This finds IBANs like the following:
DE89 3704 0044 0532 0130 00 ss
AT61 1904 3002 3457 3201
FR14 2004 1010 0505 0001 3
DE89370400440532013000
AT611904300234573201
but my IBAN numbers are in sentences:
This is a customer iban DE89 3704 0044 0532 0130 00 from somewhere
so it does not find the IBAN. How can I fix it?
The first issue is that you have start-of-input (^) and end-of-input ($) assertions: you'll want to replace these with word-break assertions (\b).
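As a rough illustration of the effect of the anchors, here is a small Python sketch (Python's re module is an assumption made for the sake of a runnable example; the pattern body is the one from the question, and the names BODY, anchored and word_bounded are made up here):

```python
import re

# Pattern body from the question, unchanged; only the surrounding assertions differ.
BODY = (
    r"[a-zA-Z]{2}[0-9]{2}\s?[a-zA-Z0-9]{4}\s?[0-9]{4}\s?[0-9]{3}"
    r"([a-zA-Z0-9]\s?[a-zA-Z0-9]{0,4}\s?[a-zA-Z0-9]{0,4}"
    r"\s?[a-zA-Z0-9]{0,4}\s?[a-zA-Z0-9]{0,3})?"
)
anchored = re.compile("^" + BODY + "$")          # must span the whole input
word_bounded = re.compile(r"\b" + BODY + r"\b")  # may sit inside a sentence

bare = "DE89 3704 0044 0532 0130 00"
sentence = "This is a customer iban " + bare + " from somewhere"

found_bare = anchored.search(bare) is not None
found_anchored = anchored.search(sentence) is not None
found_bounded = word_bounded.search(sentence) is not None

print(found_bare, found_anchored, found_bounded)
```

The anchored version only succeeds when the input is nothing but the IBAN, whereas the \b version also finds it in the middle of a sentence. What exactly it matches there is a separate question.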
Secondly, your regex tries to allow for a space between groups of four characters, but with [a-zA-Z0-9]{0,4} you also allow these groups to have fewer than four characters, even when they are not the last group. This makes it much more likely that the pattern will match a word that follows the IBAN. You can however design a pattern that will only allow the last group to be shorter than 4 characters (when there are spaces).
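To make the over-matching concrete, here is a hedged Python sketch. The loose pattern is the one from the question with \b instead of ^ and $; the strict pattern is only an illustrative assumption (it is not the final regex) that requires full groups of exactly four characters and allows just one shorter group at the end:

```python
import re

sentence = "This is a customer iban DE89 3704 0044 0532 0130 00 from somewhere"

# Question's pattern with \b boundaries: every trailing group may hold 0 to 4 characters.
loose = re.compile(
    r"\b[a-zA-Z]{2}[0-9]{2}\s?[a-zA-Z0-9]{4}\s?[0-9]{4}\s?[0-9]{3}"
    r"([a-zA-Z0-9]\s?[a-zA-Z0-9]{0,4}\s?[a-zA-Z0-9]{0,4}"
    r"\s?[a-zA-Z0-9]{0,4}\s?[a-zA-Z0-9]{0,3})?\b"
)

# Illustrative only: groups of exactly four, then at most one shorter last group.
strict = re.compile(
    r"\b[a-zA-Z]{2}\d\d(?: ?[a-zA-Z\d]{4}){3,7}(?: ?[a-zA-Z\d]{1,3})?\b"
)

loose_text = loose.search(sentence).group(0)
strict_text = strict.search(sentence).group(0)

print(repr(loose_text))   # extends past the IBAN into the following word
print(repr(strict_text))  # stops after the short last group
```

Both patterns accept lowercase letters here, so the difference comes purely from how the group lengths are constrained.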
The \s pattern will also match newline characters, which are not allowed in an IBAN. Just use a standard space character instead.
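A quick way to see the difference, sketched in Python on a two-group fragment (the fragment patterns are simplified stand-ins chosen for this example, not part of the IBAN regex):

```python
import re

with_newline = "3704\n0044"
with_space = "3704 0044"

# \s? accepts any single whitespace character, including a line break.
whitespace_ok = re.fullmatch(r"\d{4}\s?\d{4}", with_newline) is not None

# A literal space followed by ? accepts only a space (or nothing).
space_rejects_newline = re.fullmatch(r"\d{4} ?\d{4}", with_newline) is None
space_accepts_space = re.fullmatch(r"\d{4} ?\d{4}", with_space) is not None

print(whitespace_ok, space_rejects_newline, space_accepts_space)
```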
Not a problem, but you could make your regex a bit shorter by removing a-z from the character classes, so that they list only uppercase letters, and then either adding the RegexOptions.IgnoreCase option or prefixing your regex with the inline (?i) modifier. Also, [0-9] can be just \d. If you expect an IBAN to be in upper case (which is quite common practice), you could instead leave case-insensitivity off, which makes the regex less likely to match trailing characters that do not belong to the IBAN. Take for instance XX99 1234 1234 1234 1234 from: according to the rules this would pass validation with "from" included, but as "from" is clearly intended as a word of the surrounding sentence, it is better not to match it. By requiring capitals only you can counter such misinterpretations.
Here is the suggested regex:
\b[A-Z]{2}\d\d(?:(?: ?[A-Z\d]{4}){2,6} ?[A-Z\d]{3}|( ?[A-Z\d]{4}){3,7}(?: ?[A-Z\d]{1,2})?)\b
I'll leave it up to you whether you want to combine this with the IgnoreCase option.
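For what it's worth, here is a minimal sketch of how the suggested regex behaves, written in Python rather than .NET so that it is easy to run; re.IGNORECASE stands in for RegexOptions.IgnoreCase, and the helper name find_ibans is made up for this example:

```python
import re

# The suggested regex, split over two string literals for readability.
IBAN_PATTERN = (
    r"\b[A-Z]{2}\d\d(?:(?: ?[A-Z\d]{4}){2,6} ?[A-Z\d]{3}"
    r"|( ?[A-Z\d]{4}){3,7}(?: ?[A-Z\d]{1,2})?)\b"
)

def find_ibans(text, ignore_case=False):
    """Return every substring of text that the suggested regex matches."""
    flags = re.IGNORECASE if ignore_case else 0
    # Use finditer and group(0): the pattern contains a capturing group,
    # so findall would return that group rather than the whole match.
    return [m.group(0) for m in re.finditer(IBAN_PATTERN, text, flags)]

sentence = "This is a customer iban DE89 3704 0044 0532 0130 00 from somewhere"
print(find_ibans(sentence))

# Upper case only: the word after the IBAN is left alone.
print(find_ibans("XX99 1234 1234 1234 1234 from"))

# Case-insensitive: "from" looks like one more group of four and is swallowed.
print(find_ibans("XX99 1234 1234 1234 1234 from", ignore_case=True))
```

The last two calls show the trade-off mentioned above: ignoring case accepts lowercase IBANs, but a four-letter word directly after a group of four can then be taken as part of the match.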