c# regex iban

IBAN regex to find in text


I have a regex to find IBANs, as follows:

^[a-zA-Z]{2}[0-9]{2}\s?[a-zA-Z0-9]{4}\s?[0-9]{4}\s?[0-9]{3}([a-zA-Z0-9]\s?[a-zA-Z0-9]{0,4}\s?[a-zA-Z0-9]{0,4}\s?[a-zA-Z0-9]{0,4}\s?[a-zA-Z0-9]{0,3})?$

This matches IBANs like the following:

DE89 3704 0044 0532 0130 00 ss
AT61 1904 3002 3457 3201
FR14 2004 1010 0505 0001 3
DE89370400440532013000
AT611904300234573201

But my IBANs appear inside sentences:

This is a customer iban DE89 3704 0044 0532 0130 00 from somewhere

so the regex does not find the IBAN. How can I fix it?


Solution

  • The first issue is that your pattern is anchored with start-of-input (^) and end-of-input ($) assertions, so it can only match when the IBAN is the entire input. Replace them with word-boundary assertions (\b).

    Secondly, your regex tries to allow for a space between groups of four characters, but with [a-zA-Z0-9]{0,4} you also allow these groups to have fewer than four characters, even when they are not the last group. This makes it much more likely that the pattern will match a word that follows the IBAN. You can however design a pattern that will only allow the last group to be shorter than 4 characters (when there are spaces).

    The \s pattern also matches tabs and newline characters, which are not allowed in an IBAN. Use a literal space character instead.

    This is not a problem, but you could shorten the regex by removing a-z so that the character classes list only uppercase letters. If you still want to accept lowercase input, add the RegexOptions.IgnoreCase option or prefix the regex with the inline (?i) modifier. Also, [0-9] can be written as \d (in .NET, \d also matches non-ASCII Unicode digits unless RegexOptions.ECMAScript is set, so keep [0-9] if that matters).

    If you expect IBANs to be written in upper case (which is quite common practice), matching only capitals makes the regex less likely to swallow trailing characters that do not belong to the IBAN. Take for instance "XX99 1234 1234 1234 1234 from": by the pattern's rules a case-insensitive match would include "from", but "from" is clearly a word of the surrounding sentence and should not be matched. Requiring capitals counters such misinterpretations.

    Here is the suggested regex:

    \b[A-Z]{2}\d\d(?:(?: ?[A-Z\d]{4}){2,6} ?[A-Z\d]{3}|( ?[A-Z\d]{4}){3,7}(?: ?[A-Z\d]{1,2})?)\b
    

    I'll leave it up to you whether you want to combine this with the IgnoreCase option.
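As a rough illustration of how the suggested pattern could be applied in C#, here is a minimal sketch. The class name `IbanFinder`, the helper `FindIbans`, and the sample sentences are assumptions made up for this example; only the pattern itself is taken from the answer above. It also shows the trade-off with `RegexOptions.IgnoreCase` discussed earlier.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class IbanFinder
{
    // The pattern suggested in the answer, copied verbatim.
    private const string Pattern =
        @"\b[A-Z]{2}\d\d(?:(?: ?[A-Z\d]{4}){2,6} ?[A-Z\d]{3}|( ?[A-Z\d]{4}){3,7}(?: ?[A-Z\d]{1,2})?)\b";

    // Hypothetical helper: returns every IBAN-shaped substring found in the text.
    // It only checks the shape; it does not validate the country code or check digits.
    public static List<string> FindIbans(string text, RegexOptions options = RegexOptions.None)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches(text, Pattern, options))
        {
            found.Add(m.Value);
        }
        return found;
    }

    public static void Main()
    {
        // An IBAN embedded in a sentence is found without the surrounding words.
        string sentence = "This is a customer iban DE89 3704 0044 0532 0130 00 from somewhere";
        Console.WriteLine(string.Join("|", FindIbans(sentence)));
        // prints: DE89 3704 0044 0532 0130 00

        // When the last group has exactly four characters, a following
        // four-letter word is absorbed if case is ignored.
        string tricky = "Transfer XX99 1234 1234 1234 1234 from: ACME";
        Console.WriteLine(string.Join("|", FindIbans(tricky)));
        // prints: XX99 1234 1234 1234 1234
        Console.WriteLine(string.Join("|", FindIbans(tricky, RegexOptions.IgnoreCase)));
        // prints: XX99 1234 1234 1234 1234 from
    }
}
```

If lowercase IBANs must be accepted, one option is to keep `RegexOptions.IgnoreCase` and filter the candidates afterwards, for example by length per country or by the check digits; that step is outside what the pattern itself can do.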