Search code examples
sqlazuredatabricksspark-notebookazure-notebooks

Databricks SQL query for finding non alphanumeric values in a column


I tried finding many question in this forum but couldn't find specific question to Databricks SQL, so posting here, any help or link to existing question would be appreciated.

How to find all the records where a column has a characters other than 0-9 and A-Z/a-z? Basically, if a column has even one character which is not alphanumeric should come in the output.

I tried this but it's showing no records, however there are many records where the column value has special characters like ".", "?" etc.:

%sql
select * from part_details where part_number like '%[^a-zA-Z0-9]%';

-- tried this as well
select * from part_details where part_number NOT like '%[a-zA-Z0-9]%';

Please note, this ask is specific to databricks SQL only. I am running my query in Azure Databricks notebook.


Solution

  • I agree with @David Browne You can try the below query:

    In Databricks SQL, you can use the REGEXP function to find all the records where a column has characters other than alphanumeric.

    1st syntax:

    %sql
    SELECT *
    FROM part_details
    WHERE NOT part_number rlike '^[a-zA-Z0-9]+$';
    

    2nd syntax:

    %sql
    SELECT *
    FROM part_details
    WHERE part_number REGEXP '[^a-zA-Z0-9]'
    

    This query uses the REGEXP function with a regular expression pattern [^a-zA-Z0-9] to match any character that is not alphanumeric.

    Note: that the regular expression pattern [^a-zA-Z0-9] matches any character that is not an uppercase letter, lowercase letter, or a digit.

    Results:

    id  part_number
    4   JKL!012