I'm attempting to SELECT
a substring from a larger string that should end before the words "STORE" or "PROTECT".
The problem is that there are a few scenarios in which the words "STORE" or "PROTECT" can appear before or after each other in a string, and we need to extract the substring that appears before either of those two words.
Example Strings:
"UNLABELLEDa ABCD EFGH IJKLM FOO BAR, STORE AT 15-30°C." - Here there is no "PROTECT" but "STORE" word exists, so we should get a substring before "STORE".
"STORE UP TO 30°C (86°F). PROTECT FROM MOISTURE" - In this string there is no string before first occurrence of a word "STORE" so substring will be BLANK
"MEPO BKHGT FOO BARFOO BAGK VIAL. PROTECT FROM LIGHT, STORE BETWEEN X°C TO Y°C." - Query should return the substring before the word "PROTECT".
"CARTON CONTAINING 1 X VIAL SB245063 100MG LYOPHILISED POWDER FOR RECONSTITUION. STORE AT 2°C - 8°C. PROTECT FROM LIGHT." - In this string "STORE" word is exist before "PROTECT", so query should return a substring before the word "STORE".
NULL
- when the string is null query should return a empty value instead of error.
SQL Query:
DECLARE @TestVariable AS VARCHAR(MAX)='UNLAB ABCFG DRETFG FOO BARFOO 3990MG TABLETS, STORE AT 15-30°C.'
SELECT case when CHARINDEX('STORE', upper(@TestVariable)) > CHARINDEX('PROTECT', upper(@TestVariable))
then SUBSTRING(@TestVariable, 1, CHARINDEX('PROTECT', upper(@TestVariable))-1)
else SUBSTRING(@TestVariable, 1, CHARINDEX('STORE', upper(@TestVariable))-1)
end pack_description
If there are no words in a string from "PROTECT" or "STORE" and the string value is NULL
, the query returns an error.
What changes should be made to this query based on the examples above to meet our requirements?
SQL Server is a poor platform to do text parsing because of it's very limited pattern matching capabilities. This would be better done in an application that supports regular expressions (regexes), because regexes can better check for context like word boundaries with expressions like /\bSTORE\b/
.
However, if you need to do this in SQL server, you can add LEN()
as a third position selector and LEAST()
to chose the leftmost trim position. The NULLIF()
function can also be used to map not-found 0 positions to null, so that they will be ignored. This also uses LEFT()
instead of SUBSTRING()
.
SELECT
LEFT(D.String, LEAST(
NULLIF(CHARINDEX('STORE', D.String), 0) - 1,
NULLIF(CHARINDEX('PROTECT', D.String), 0) - 1,
LEN(D.String)
)) AS Result
FROM Data D
Note that LEAST()
is a new function only available in SQL Server 2022 and later. (Keep reading for alternatives that work in earlier versions.)
Purely for aesthetic reasons, a CROSS APPLY
can be used to separate the position calculation from the final select.
SELECT
LEFT(D.String, A.TrimPos) AS Result
FROM DATA D.
CROSS APPLY (
SELECT LEAST(
NULLIF(CHARINDEX('STORE', D.String), 0) - 1,
NULLIF(CHARINDEX('PROTECT', D.String), 0) - 1,
LEN(D.String)
) AS TrimPos
) A
For earlier SQL versions that don't have the LEAST()
function, use:
...
CROSS APPLY (
SELECT MIN(Pos) AS TrimPos
FROM (
VALUES
(NULLIF(CHARINDEX('STORE', D.String), 0) - 1),
(NULLIF(CHARINDEX('PROTECT', D.String), 0) - 1),
(LEN(D.String))
) V(Pos)
) A
Keywords can also be placed in a table and use a subselect to search for any keyword. The ISNULL()
function is used to inject LEN()
as the default trim position.
SELECT
LEFT(D.String, A.TrimPos) AS Result
FROM Data D
CROSS APPLY (
SELECT ISNULL(MIN(P.Pos), LEN(D.String)) AS TrimPos
FROM (
SELECT NULLIF(CHARINDEX(K.Keyword, D.String), 0) - 1 AS Pos
FROM Keywords K
) P
) A
Finally, the current match logic can yield false hits if the text contains larger words like "protection" or "restorer" that partially match the keywords. This can be resolved by using PATINDEX()
and prefixing the keywords with the not-a-letter pattern [^A-Z]
.
SELECT
LEFT(D.String, A.TrimPos) AS Result
FROM Data D
CROSS APPLY (
SELECT ISNULL(MIN(P2.AdjustedPos), LEN(D.String)) AS TrimPos
FROM (
SELECT NULLIF(PATINDEX('%[^A-Z]' + K.Keyword + '[^A-Z]%', ' ' + D.String + ' '), 0) - 2 AS Pos
FROM Keywords K
) P
CROSS APPLY (
SELECT CASE WHEN P.Pos < 0 THEN 0 ELSE P.Pos END AS AdjustedPos
) P2
) A
Sample results
Result | Tail |
---|---|
UNLABELLEDa ABCD EFGH IJKLM FOO BAR, | STORE AT 15-30°C. |
STORE UP TO 30°C (86°F). PROTECT FROM MOISTURE | |
MEPO BKHGT FOO BARFOO BAGK VIAL. | PROTECT FROM LIGHT, STORE BETWEEN X°C TO Y°C. |
CARTON CONTAINING 1 X VIAL SB245063 100MG LYOPHILISED POWDER FOR RECONSTITUION. | STORE AT 2°C - 8°C. PROTECT FROM LIGHT. |
Just a description here | |
Finish restorer | |
Paint protection | |
Bananas. | Discard after 90 days. |
null | |
null | null |
See this db<>fiddle for a demo of each of the above.