Search code examples
sql-server-2012

Sql 2012 - Get data within varchar column


I have a column in a table that contains data that looks like this:

Fusion 2-4 L Jt w Autol Sub, Post Appr P Col, Open [0SG1071],Release Lumbar Nerve, Open Approach [01NB0ZZ],Excision of Right Pelvic Bone, Open Approach [0QB20ZZ],Supplement Spinal Meninges with Nonaut Sub, Open Approach [00UT0KZ]

I am tasked with getting the data within the brackets of that string and presenting them as either individual records or as one record containing all the values (in which they appear in the string) as a CSV.

In this case, the CSV extract would be

0SG1071,01NB0ZZ,0QB20ZZ,00UT0KZ

Any ideas?

the only thing I have been able to do is determine that there are 4 occurrence of a '[' character using this:

  select * from (SELECT (LEN([FullPX]) - LEN(REPLACE([FullPX], '[', '')))/LEN('[') as NumOfOccurrence, [FullPX] FROM [Import].[dbo].[_TMC_TEMP]) i1 where i1.NumOfOccurrence > 0

Solution

  • One way to generate the comma-delimited values is to replace the delimiters with XML opening/closing tags and use XML methods to extract the values. Furthermore FOR XML can be used for aggregate string concatenation in old unsupported SQL Server versions where STRING_AGG is not available.

    Below is an example that also replaces an invalid XML character (ampersand) with entity reference to facilitate XML parsing. This can be extended for other characters in your data that need to be escaped.

    DECLARE @example TABLE(ID int, UnstructuredData varchar(MAX));
    INSERT INTO @example VALUES(1, 'Fusion 2-4 L Jt w Autol Sub, Post Appr P Col, Open [0SG1071],Release Lumbar Nerve, Open Approach [01NB0ZZ],Excision of Right Pelvic Bone, Open Approach [0QB20ZZ],Supplement Spinal Meninges with Nonaut Sub, Open Approach [00UT0KZ]');
    INSERT INTO @example VALUES(2, 'Data with invalid XML characters such as an ampersand (&) must be escaped with entity references [ABCDEFGH]');
    
    SELECT 
        ID
        , STUFF(CAST(
            (SELECT ',' + xml_node.value('.', 'varchar(10)')
             FROM (
                SELECT CAST('<data>' + REPLACE(REPLACE(REPLACE(UnstructuredData, '&', '&amp;'), '[', '<value>'), ']', '</value>') + '</data>' AS xml) AS UnstructuredDataXml
            ) AS column_data
            CROSS APPLY UnstructuredDataXml.nodes('//value') AS xml_nodes(xml_node)
            FOR XML PATH('')
            ) AS varchar(MAX)),1,1,'') AS csv_data
    FROM @example;
    

    Results:

    ID csv_data
    1 0SG1071,01NB0ZZ,0QB20ZZ,00UT0KZ
    2 ABCDEFGH