Search code examples
teradataregexp-substr

Tag key & value using Teradata Regular Expression


I have a TERADATA dataset that resembles the below :

'Project: Hercules IssueType: Improvement Components: core AffectsVersions: 2.4.1 Priority: Minor Time: 15:25:23 04/06/2020'

I want to extract tag value from the above based on the key.

Ex:

with comm as 
(
select  'Project: Hercules IssueType: Improvement Components: core AffectsVersions: 2.4.1 Priority: Minor' as text
)
select regexp_substr(comm.text,'[^: ]+',1,4)
 from comm where regexp_substr(comm.text,'[^: ]+',1,3) = 'IssueType';

Is there a way to query without having to change the position arguments for every tag. Also I am finding the last field a little tricky with date & time fields.

Any help is appreciated.

Thank you.


Solution

  • There's the NVP function to access Name/Value-pair data, but to split into multiple rows you need either strtok_split_to_table or regexp_split_to_table. The tricky part in your case are the delimiters, would be easier if they were unique instead of ' 'and ':':

    WITH comm AS 
     (
       SELECT 1 as keycol, -- should be a key column in your table, either numeric or varchar
          'Project: Hercules IssueType: Improvement Components: core AffectsVersions: 2.4.1 Priority: Minor Time: 15:25:23 04/06/2020' AS text
     )
    SELECT id, tokennum, token, 
       -- get the key
       StrTok(token,':', 1) AS "Key",
       -- get the value (can't use StrTok because of ':' delimiter)
       Substring(token From Position(': ' IN token)+2) AS "Value"
    FROM TABLE
     ( RegExp_Split_To_Table(comm.keycol
                             ,comm.text
                             ,'( )(?=[^ ]+: )' -- assuming names don't contain spaces: split at the last space before ': '
                             , 'c') 
    RETURNS (id INT , tokennum INTEGER, token VARCHAR(1000) CHARACTER SET Latin)) AS dt