Regular Expressions in Query

I have a column defined as NVARCHAR(512) that I need to search to see if it contain the value ABC followed by 10 digits.

So I need to be able to find if this column has a value like "ABC1234567890". I know I can do this:

WHERE Column1 LIKE '%ABC[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]%'

and this will find the columns with ABC + 10 digits. I have to do this for about 12 different variations, so it doesn't seem this is the most efficient way. It seems like I should be able to do something like

WHERE Column1 LIKE '%ABC\[Next Ten Digits are Numeric\]

instead of copy / pasting [0-9] ten times.

Solution

In a few years from now we should be able to use REGEXP_LIKE and do something like

WHERE REGEXP_LIKE(Column1, 'ABC[0-9]{10}')

If you just wanted to match 0-9 or

WHERE REGEXP_LIKE(Column1, 'ABC\d{10}');

To match any Unicode digit

But this is currently only preview functionality and only in Azure.

For now an approach that avoids copy and paste and makes it easier to see the number of repetitions is

WHERE Column1  COLLATE Latin1_General_BIN 
        LIKE CONCAT('%ABC',REPLICATE('[0-9]',10),'%')

The expression CONCAT('%ABC',REPLICATE('[0-9]',10),'%') still ends up producing the same underlying pattern of %ABC[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]% but in a more DRY way.

The reason for the explicit COLLATE clause is to avoid matching potential unwanted extra characters that sort between 0 and 9 in some other collations.

It is also occasionally useful to invert this type of logic. You can use TRANSLATE to convert all non zero digits to 0 and then potentially use somewhat simpler expressions against the canonical form.

SELECT Column1 
FROM Table1 t1
CROSS APPLY (VALUES(TRANSLATE (Column1,'123456789','000000000'))) v1(Column1Canonical)
WHERE Column1Canonical LIKE '%ABC' + REPLICATE('0',10) + '%'