SQL error with SUBSTRING: Invalid length parameter passed to SUBSTRING


I have the following five strings that I am trying to parse:

1. Person is (Christopher Bowles known as Chris)
2. This is a test entry
3. Identified as Jonathan Sykes known as John)
4. Registered as (Patrick Joseph known as Pat) (Or also Patty)
5. Guy called as Richard Smith known as Rich) (Or also Richie)

And my query is as follows:

SELECT DisplayName = LTRIM(RTRIM(SUBSTRING(n.name, CHARINDEX('(', n.name) + 1, (CHARINDEX('known as', n.name) - CHARINDEX('(', n.name)) - 1)))
FROM #nameTest n
WHERE 1 = 1
AND n.name LIKE '%known as%'

This fails on the 5th string with the following error:

Invalid length parameter passed to the LEFT or SUBSTRING function.

This is because 'known as' comes BEFORE the '(' in that string. How do I handle this situation? Any ideas would be appreciated.

To re-create the test data in Microsoft SQL Server 2022:

CREATE TABLE #nameTest
(   
    name    NVARCHAR(1024) NOT NULL,
)

INSERT INTO #nameTest(name)
VALUES ('Person is (Christopher Bowles known as Chris)'),
('This is a test entry'),
('Identified as Jonathan Sykes known as John)'),
('Registered as (Patrick Joseph known as Pat) (Or also Patty)'),
('Guy called as Richard Smith known as Rich) (Or also Richie)')

The output should be:

1. Christopher Bowles
2. Identified as Jonathan Sykes
3. Patrick Joseph

Solution

  • Let's deconstruct the problem a bit. For issues like "invalid length", the best way to debug is to output the parts that go into SUBSTRING:

    SELECT  CHARINDEX('(', n.name) + 1, CHARINDEX('known as', n.name)
    ,   CHARINDEX('known as', n.name) - CHARINDEX('(', n.name)
    ,   *
    FROM #nameTest n
    WHERE 1 = 1
    AND n.name LIKE '%known as%'
    

    This outputs:

    (column 1)  (column 2)  (column 3)  name
    12          31          20          Person is (Christopher Bowles known as Chris)
    1           30          30          Identified as Jonathan Sykes known as John)
    16          31          16          Registered as (Patrick Joseph known as Pat) (Or also Patty)
    45          29          -15         Guy called as Richard Smith known as Rich) (Or also Richie)
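
    To confirm that the last row is the culprit, its values can be plugged into SUBSTRING in isolation. The following is a minimal sketch, not part of the original query: the variable names are made up for illustration, and the values are held in variables so the expression is evaluated at run time. It should report error 537, the "Invalid length parameter" message from the question.

    ```sql
    -- Hypothetical variables holding the 5th test string and its two positions
    DECLARE @name NVARCHAR(1024) = N'Guy called as Richard Smith known as Rich) (Or also Richie)';
    DECLARE @startParen INT = CHARINDEX('(', @name);            -- position of '('
    DECLARE @startKnownAs INT = CHARINDEX('known as', @name);   -- position of 'known as'

    -- The length argument the original query would pass to SUBSTRING
    SELECT @startKnownAs - @startParen - 1 AS LengthArgument;

    BEGIN TRY
        -- Same expression as in the question, for this one string
        SELECT SUBSTRING(@name, @startParen + 1, @startKnownAs - @startParen - 1);
    END TRY
    BEGIN CATCH
        -- Show the error instead of aborting the batch
        SELECT ERROR_NUMBER() AS ErrorNumber, ERROR_MESSAGE() AS ErrorMessage;
    END CATCH;
    ```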

    The last row yields a negative value (-15, so the length actually passed to SUBSTRING is -16), and SUBSTRING rejects a negative length. There are several ways to mitigate the issue:

    1. Where:
    select ...
    WHERE ...
    and CHARINDEX('known as', n.name) > CHARINDEX('(', n.name)
    
    2. If you don't want to filter the row out with WHERE, you can use a CASE expression:
    SELECT DisplayName = case when CHARINDEX('known as', n.name) > CHARINDEX('(', n.name) then LTRIM(RTRIM(SUBSTRING(n.name, CHARINDEX('(', n.name) + 1, (CHARINDEX('known as', n.name) - CHARINDEX('(', n.name)) - 1))) end
    ...
    

    This keeps the row but returns NULL for DisplayName when the two positions are not in the expected order.
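
    For reference, here is the CASE variant written out in full. It is a sketch that assumes the #nameTest table from the question already exists in the current session. The extra name column is added only so each result can be matched to its input.

    ```sql
    SELECT  n.name,
            DisplayName = CASE
                -- Only call SUBSTRING when 'known as' comes after the '(' position
                -- (CHARINDEX returns 0 when there is no '(' at all)
                WHEN CHARINDEX('known as', n.name) > CHARINDEX('(', n.name)
                THEN LTRIM(RTRIM(SUBSTRING(
                         n.name,
                         CHARINDEX('(', n.name) + 1,
                         CHARINDEX('known as', n.name) - CHARINDEX('(', n.name) - 1)))
                -- No ELSE branch: rows that fail the check get NULL
            END
    FROM #nameTest n
    WHERE n.name LIKE '%known as%';
    ```

    With the test data, this should return four rows: the three names listed in the question, plus a NULL DisplayName for the "Guy called as ..." entry.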

    To simplify this, I usually create "variables" with CROSS APPLY, which removes some of the copy-paste:

    SELECT DisplayName = LTRIM(RTRIM(SUBSTRING(n.name, startParen + 1, startKnownAs - startParen - 1)))
    FROM #nameTest n
    CROSS APPLY (
            SELECT  CHARINDEX('(', n.name) as startParen
            ,   CHARINDEX('known as', n.name) startKnownAs
        ) x
    WHERE 1 = 1
    AND n.name LIKE '%known as%'
    

    Then it's easy to apply either of the fixes above, whichever you prefer:

    1. Where:
    ...
    WHERE ...
    AND startParen < startKnownAs
    
    2. CASE:
    SELECT DisplayName = case when startParen < startKnownAs THEN LTRIM(RTRIM(SUBSTRING(n.name, startParen + 1, startKnownAs - startParen - 1))) END
    ...
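
    Putting the pieces together, a complete query could look like the sketch below. It assumes the #nameTest table from the question exists, and it uses both guards. As far as I know, SQL Server does not promise to apply the WHERE filter before it evaluates the SELECT expressions, so keeping the CASE around SUBSTRING is the more defensive choice even when the WHERE condition is present.

    ```sql
    SELECT  DisplayName = CASE
                -- Guard the SUBSTRING call itself, independent of the WHERE clause
                WHEN x.startParen < x.startKnownAs
                THEN LTRIM(RTRIM(SUBSTRING(n.name,
                                           x.startParen + 1,
                                           x.startKnownAs - x.startParen - 1)))
            END
    FROM #nameTest n
    CROSS APPLY (
            -- Compute each position once and reuse it by name
            SELECT  CHARINDEX('(', n.name)        AS startParen
            ,       CHARINDEX('known as', n.name) AS startKnownAs
        ) x
    WHERE n.name LIKE '%known as%'
    AND x.startParen < x.startKnownAs;   -- drops rows where 'known as' precedes '('
    ```

    With the test data, this should return the three rows from the expected output: Christopher Bowles, Identified as Jonathan Sykes, and Patrick Joseph.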