Search code examples
sqlt-sqlsqldatatypes

data type of each characters in a varchar T-sql


I'm curious on the data I get from someone. Most of the time I need to get 3 integers then a space then eight integers.

And The integration created a column varchar(20) ... Don't doubt it works, but that gives me some matching errors.

Because of this, I'd like to know what is the data type of the characters on each row.

For exemple : 0 is for integer, s for space, a for char, * for specific

AWB             | data type
---------------------------------
012 12345678    | 000s00000000
9/5 ab0534      | 0*0saa0000

I'd like to know if there is a function or a formula to get this kind of results. Right after I'll be able to group by this column and finally be able to check how good is the data quality.

I don't know if there is a specific word for what I tried to explain, so excuse me if this is a duplicate of a post, I didn't find it.

Thank you for your feedback.


Solution

  • There's nothing built-in, but you might use an approach like this:

    DECLARE @tbl TABLE(ID INT IDENTITY,AWB VARCHAR(100));
    INSERT INTO @tbl VALUES
     ('012 12345678')
    ,('9/5 ab0534');
    
    WITH cte AS
    (
        SELECT t.ID
              ,t.AWB
              ,A.Nmbr 
              ,C.YourMask
        FROM @tbl t
        CROSS APPLY (SELECT TOP (DATALENGTH(t.AWB)) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values) A(Nmbr)
        CROSS APPLY (SELECT SUBSTRING(t.AWB,A.Nmbr,1)) B(SingleCharacter)
        CROSS APPLY (SELECT CASE WHEN B.SingleCharacter LIKE '[0-9]' THEN '0'
                                 WHEN B.SingleCharacter LIKE '[a-z]' THEN 'a'
                                 WHEN B.SingleCharacter = ' ' THEN 's'
                                 ELSE '*' END) C(YourMask)
    )
    SELECT ID
          ,AWB
          ,(
            SELECT YourMask 
            FROM cte cte2
            WHERE cte2.ID=cte.ID
            ORDER BY cte2.Nmbr
            FOR XML PATH(''),TYPE  
           ).value('.','nvarchar(max)') YourMaskConcatenated
    FROM cte
    GROUP BY ID,AWB;
    

    The idea in short:

    The cte will create a derived set of your table.
    The first CROSS APPLY will create a list of numbers as long as the current AWB value. The second CROSS APPLY will read each character separately.
    The third CROSS APPLY will finally use some rather simple logic to translate your values to the mask you expect.

    The final SELECT will then use GROUP BY and a correlated sub-query with FOR XML to get the mask characters re-concatenated (With version v2017+ this would be easier calling STRING_AGG()).