I have the following 5 strings that I am trying to parse:
1. Person is (Christopher Bowles known as Chris)
2. This is a test entry
3. Identified as Jonathan Sykes known as John)
4. Registered as (Patrick Joseph known as Pat) (Or also Patty)
5. Guy called as Richard Smith known as Rich) (Or also Richie)
And my query is as follows:
SELECT DisplayName = LTRIM(RTRIM(SUBSTRING(n.name, CHARINDEX('(', n.name) + 1, (CHARINDEX('known as', n.name) - CHARINDEX('(', n.name)) - 1)))
FROM #nameTest n
WHERE 1 = 1
AND n.name LIKE '%known as%'
This fails on the 5th string with the following error:
Invalid length parameter passed to the LEFT or SUBSTRING function.
This is because 'known as' comes BEFORE the '('. How do I handle such a situation? Any ideas would be appreciated.
To re-create the test data in Microsoft SQL Server 2022:
CREATE TABLE #nameTest
(
name NVARCHAR(1024) NOT NULL,
)
INSERT INTO #nameTest(name)
VALUES ('Person is (Christopher Bowles known as Chris)'),
('This is a test entry'),
('Identified as Jonathan Sykes known as John)'),
('Registered as (Patrick Joseph known as Pat) (Or also Patty)'),
('Guy called as Richard Smith known as Rich) (Or also Richie)')
The output should be:
1. Christopher Bowles
2. Identified as Jonathan Sykes
3. Patrick Joseph
Let's deconstruct the problem a bit. For issues like "invalid length", the best way to debug is to output the parts that go into SUBSTRING:
SELECT CHARINDEX('(', n.name) + 1 AS substringStart
, CHARINDEX('known as', n.name) AS knownAsPos
, CHARINDEX('known as', n.name) - CHARINDEX('(', n.name) AS posDifference
, *
FROM #nameTest n
WHERE 1 = 1
AND n.name LIKE '%known as%'
This outputs:
substringStart | knownAsPos | posDifference | name |
---|---|---|---|
12 | 31 | 20 | Person is (Christopher Bowles known as Chris) |
1 | 30 | 30 | Identified as Jonathan Sykes known as John) |
16 | 31 | 16 | Registered as (Patrick Joseph known as Pat) (Or also Patty) |
45 | 29 | -15 | Guy called as Richard Smith known as Rich) (Or also Richie) |
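The third column is negative in the last row, so the length passed to SUBSTRING (that value minus 1) is negative too, and SUBSTRING raises an error for a negative length. There are many ways to mitigate the issue. You can filter the offending rows out in the WHERE clause, or keep them and guard the expression with CASE: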
SELECT ...
WHERE ...
AND CHARINDEX('known as', n.name) > CHARINDEX('(', n.name)
SELECT DisplayName = CASE WHEN CHARINDEX('known as', n.name) > CHARINDEX('(', n.name)
    THEN LTRIM(RTRIM(SUBSTRING(n.name, CHARINDEX('(', n.name) + 1, (CHARINDEX('known as', n.name) - CHARINDEX('(', n.name)) - 1)))
    END
...
The second variant keeps the row but returns NULL when the two positions aren't in the expected order.
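Combining the CASE variant with the question's FROM and WHERE clauses gives a complete statement. This is only a sketch: it assumes the `#nameTest` table from the question already exists in the current session, and the expected rows in the comments are derived by hand from the positions shown in the table above.

```sql
-- Assumes #nameTest has been created and populated as in the question.
SELECT DisplayName =
    CASE
        -- Only call SUBSTRING when 'known as' comes after the '(' position.
        -- A missing '(' gives position 0, which also passes this check.
        WHEN CHARINDEX('known as', n.name) > CHARINDEX('(', n.name)
        THEN LTRIM(RTRIM(SUBSTRING(n.name,
                                   CHARINDEX('(', n.name) + 1,
                                   CHARINDEX('known as', n.name) - CHARINDEX('(', n.name) - 1)))
    END
FROM #nameTest n
WHERE n.name LIKE '%known as%';
-- Expected rows (order is not guaranteed without ORDER BY):
--   Christopher Bowles
--   Identified as Jonathan Sykes
--   Patrick Joseph
--   NULL   (the 'Guy called as ...' row)
```

If the NULL row is not wanted, the WHERE variant is the one that matches the three-row output asked for in the question.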
To simplify this, I usually create "variables" with CROSS APPLY, which removes some of the copy-paste:
SELECT DisplayName = LTRIM(RTRIM(SUBSTRING(n.name, startParen + 1, startKnownAs - startParen - 1)))
FROM #nameTest n
CROSS APPLY (
SELECT CHARINDEX('(', n.name) AS startParen
, CHARINDEX('known as', n.name) AS startKnownAs
) x
WHERE 1 = 1
AND n.name LIKE '%known as%'
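The columns produced by the CROSS APPLY behave like ordinary columns, so the same trick also shortens the debugging query from earlier. A small sketch, assuming the question's `#nameTest` table; the alias `substringLength` is a name made up here for illustration:

```sql
-- Inspect the exact values that would be passed to SUBSTRING.
SELECT x.startParen,
       x.startKnownAs,
       x.startKnownAs - x.startParen - 1 AS substringLength,  -- hypothetical alias
       n.name
FROM #nameTest n
CROSS APPLY (
    SELECT CHARINDEX('(', n.name)        AS startParen,
           CHARINDEX('known as', n.name) AS startKnownAs
) x
WHERE n.name LIKE '%known as%';
-- substringLength should come out as 19, 29, 15 and -16 for the four matching rows;
-- the -16 belongs to the 'Guy called as ...' row that breaks the original query.
```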
Then it's easy to apply either of the earlier fixes, whichever you prefer:
...
WHERE ...
AND startParen < startKnownAs
SELECT DisplayName = case when startParen < startKnownAs THEN LTRIM(RTRIM(SUBSTRING(n.name, startParen + 1, startKnownAs - startParen - 1))) END
...
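For completeness, here is one way the pieces might fit together as a single script, using the WHERE variant because it produces the three rows the question asks for. Treat it as a sketch rather than the definitive form: the `DROP TABLE IF EXISTS` line is an addition for re-runnability (it needs SQL Server 2016 or later), and the expected rows in the comments are worked out by hand.

```sql
-- Recreate the question's test data so the script can be re-run.
DROP TABLE IF EXISTS #nameTest;

CREATE TABLE #nameTest
(
    name NVARCHAR(1024) NOT NULL
);

INSERT INTO #nameTest (name)
VALUES ('Person is (Christopher Bowles known as Chris)'),
       ('This is a test entry'),
       ('Identified as Jonathan Sykes known as John)'),
       ('Registered as (Patrick Joseph known as Pat) (Or also Patty)'),
       ('Guy called as Richard Smith known as Rich) (Or also Richie)');

SELECT DisplayName = LTRIM(RTRIM(SUBSTRING(n.name,
                                           x.startParen + 1,
                                           x.startKnownAs - x.startParen - 1)))
FROM #nameTest n
CROSS APPLY (
    -- Compute each position once and reuse it below.
    SELECT CHARINDEX('(', n.name)        AS startParen,
           CHARINDEX('known as', n.name) AS startKnownAs
) x
WHERE n.name LIKE '%known as%'
  -- Skip rows where '(' appears only after 'known as'.
  AND x.startParen < x.startKnownAs;
-- Expected rows (order is not guaranteed without ORDER BY):
--   Christopher Bowles
--   Identified as Jonathan Sykes
--   Patrick Joseph
```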