Search code examples
t-sqlpatindex

PATINDEX does not recognize dot and comma


I have a column that should contain phone numbers but it contains whatever the user wanted. I need to create an update to remove all the characters after an invalid character.

To do this I am using a regex as PATINDEX('%[^0-9+-/()" "]%', [MobilNr]) and it seemed to work until I had some numbers as +1235, 36446 and to my surprise the result is 0 instead of 6. Also if the number contains . it returns 0.

Does PATINDEX ignores dot(".") and comma(",")? Are there other characters that PATINDEX will ignore?


Solution

  • It's not that PATINDEX ignores the comma and the dot, it's your pattern that created this problem.

    With PATINDEX, the hyphen char (-) has a special meaning - it's in fact an operator that denotes an inclusive range - like 0-9 denotes all digits between 0 and 9 - so when you do +-/ it means all the chars between + and / (inclusive, of course). The comma and dot chars are within this range, that's why you get this result.

    Fixing the pattern is easy: either use | as a logical or, or simply move the hyphen to the end of the pattern:

    SELECT PATINDEX('%[^0-9/()" "+-]%', '+1235, 36446') -- Result: 6