Search code examples
sqlsql-serverdata-quality

How to split columns containing characters and numbers in SQL Server?


In a table I have a column Building Height which contains the heights of buildings. Annoyingly the values are entered like this e.g. '18m', '8m' or '8.1m'.

Ideally, it would have the 'm' in the column name and numeric values in the rows but that's not the case. I do not have admin rights to the table in SQL Server.

I need to perform a data quality rule where I need to perform greater than or less than operators on the Building Height column.

How do I overcome this obstacle?

Example of the table:

Building Height No Storeys
8.1m 3
13m 5
7m 6
9.0m 9

Data quality rule to perform:

  • If building height is less than 11m, then storeys should be less than 5

I have to find rows that do not fit criteria. Obviously the obstacle is the Building Height column containing m next to the number.

I've looked into substring and string_split, but as I do not have admin rights to alter the table I can't really change much, can only query.


Solution

  • 2 steps:

    1. Remove the m
    2. Cast result to DECIMAL
    SELECT 
        REPLACE([Building Height],'m','') step1,
       CAST(REPLACE([Building Height],'m','') as DECIMAL(8,2)) step2,
       [No Storeys]
    FROM mytable;
    

    see: DBFIDDLE

    Using STRING_SPLIT(), you can do:

    SELECT 
      m.*,
      CAST(value as DECIMAL(8,3)) as [Height in meters]
    FROM mytable m
    CROSS APPLY STRING_SPLIT([Building Height],'m')
    WHERE value<>'';
    

    see: DBFIDDLE