Search code examples
sql-servert-sql

Extract particular value from middle of a column which is separated by '/' and '//'


One of my table column has data stored like this. Some of the values start with single '/' and some of them with double '//'

/etc/data/env/source/sourcename1/filename1/file-arrivaltimestamp/file-processedtimestamp/file-archivedtimestamp
//etc/data/env/source/sourcename2/filename2/file-arrivaltimestamp/file-processedtimestamp/file-archivedtimestamp
//etc/data/env/source/sourcename3/filename3/file-arrivaltimestamp/file-processedtimestamp/file-archivedtimestamp
/etc/data/env/source/sourcename4/filename4/file-arrivaltimestamp/file-processedtimestamp/file-archivedtimestamp
/etc/data/env/source/sourcename5/filename5/file-arrivaltimestamp/file-processedtimestamp/file-archivedtimestamp

I need to extract the value of filenames from the column. Tried charindex(), left() functions but failed to get the desired value.

Desired output

filename1
filename2
filename3
filename4
filename5

Solution

  • Please try the following solution based on tokenization via SQL Server's XML and XQuery functionality.

    Notable points:

    • CROSS APPLY is tokenizing tokens column as XML.
    • XQuery .value() method gives us a desired output by retrieving 7th token.

    SQL

    -- DDL and sample data population, start
    DECLARE @tbl TABLE (id INT IDENTITY PRIMARY KEY, tokens VARCHAR(1024));
    INSERT INTO @tbl (tokens) VALUES
    ('/etc/data/env/source/sourcename1/filename1/file-arrivaltimestamp/file-processedtimestamp/file-archivedtimestamp'),
    ('//etc/data/env/source/sourcename2/filename2/file-arrivaltimestamp/file-processedtimestamp/file-archivedtimestamp'),
    ('//etc/data/env/source/sourcename3/filename3/file-arrivaltimestamp/file-processedtimestamp/file-archivedtimestamp'),
    ('/etc/data/env/source/sourcename4/filename4/file-arrivaltimestamp/file-processedtimestamp/file-archivedtimestamp'),
    ('/etc/data/env/source/sourcename5/filename5/file-arrivaltimestamp/file-processedtimestamp/file-archivedtimestamp');
    -- DDL and sample data population, end
    
    DECLARE @separator CHAR(1) = '/';
    
    SELECT *
        , c.value('(/root/r[7]/text())[1]', 'VARCHAR(128)') AS result
    FROM @tbl AS t
    CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' + 
        REPLACE(REPLACE(tokens,'//',@separator), @separator, ']]></r><r><![CDATA[') + 
        ']]></r></root>' AS XML)) AS t1(c);
    

    Output

    id result
    1 filename1
    2 filename2
    3 filename3
    4 filename4
    5 filename5