Search code examples
regexteradatasubstrteradata-sql-assistant

Teradata REGEX or SUBSTR to remove the text between two *'s and the asterisk?


I'm working in teradata with a dataset that has several occurrences of data in the following format:

*6A*H.ORTHO I
*4A*IMP
*16A*T.IMPLANTS
*2A*HIMPLANTS
*9A*IMP
*5A*F.IMPLANT
*6A*DIMP
*4A*TISSUE
*5A*KIMP
*7A*IMP
*10A*D.IMP
*3A*W.LSH
*10A*IMP
*16A*IMP
*22A*T.IMPLANTS

In the dataset above I'm attempting to extract everything after the second occurrence of an asterick. I.E. D.IMP, IMP, T.IMPLANTS, F.IMPLANT, etc..

I've attempted to use SUBSTR and came close using:

SUBSTR(TRIM(FSS.Surgical_Inventory_Code),1,
                    INDEX(TRIM(FSS.Surgical_Inventory_Code),'*')-1)

But, that only returns the data after the first *.

I believe the best solution to solve problem would be using a REGEX expression or SUBSTR. There is a function in teradata called REGEXP_SUBSTR. I'm not exactly sure how to create a REGEX statement to solve my problem.


Solution

  • If you only ever have 2 asterisks in your string, you can use STRTOK:

    strtok(<source string>,'*',2)