Search code examples
sqlsql-serversubstringcommon-table-expression

CTE and Substring/CHARINDEX


I thought CTE was kinda like a temporary table where I could reference to from the following query.
I'm confused on the error I am getting ONLY when using the WHERE of the last query.

Certifiedby field has name:date:id:.
I am checking to see if certifiedby is NULL, if so make CERT_DATE=1/1/1970, otherwise extract the date from Certifiedby into CERT_DATE.

If I run the first part of the query up to SELECT * FROM Certified I get the expected results and the CERT_DATE are all correct.

But when I add/execute the WHERE CERT_DATE BETWEEN '2/1/2023' AND '2/22/2023' I get the error

Invalid length parameter passed to the LEFT or SUBSTRING function.

I thought once Certified was created I could reference it's data? From the error it seems it's still trying to extract the date?

Clues? thank you

SQLFiddle Sample

WITH Certified AS (
    select
        certifiedBy,
        CASE WHEN certifiedBy IS NULL  THEN '1/1/1970' ELSE SUBSTRING(certifiedBy, CHARINDEX(':', certifiedBy, 1) + 1, CHARINDEX(':', certifiedBy, 
        CHARINDEX(':', certifiedBy, 1) + 1) - CHARINDEX(':', certifiedBy, 1) - 1)  
        END AS CERT_DATE
    From dbo.PO_Orders 
    WHERE poreference = 'shev'  
    )
SELECT * FROM Certified 
WHERE CERT_DATE BETWEEN '2/1/2023' AND '2/22/2023' 
ORDER BY CERT_DATE

I'm expecting to get back all records with CERT_DATE within the date range.


Solution

  • The problem is the CERT_DATE column in the CTE IS NOT A DATE. It's a STRING that happens to look kinda like a date to people with certain cultural backgrounds... but the computer and database only see the string. Therefore the BETWEEN operator is doing an ordinal comparison between strings, rather than date comparisons.

    There's an important concept to understand about date and datetime values when working with computers:

    Cultural/internationalization issues mean converting to and from strings and date (or numeric) values is far more expensive and error-prone than we generally expect. It's something to avoid as much as possible.

    This has a number of implications:

    1. Date and Datetime values in the server are not stored as strings at all internally; it is not a human-readable format

    2. Using strings to do date operations is generally the slowest and least-reliable option

    3. There are certain preferred formats for date literals in your code that often do not match your own cultural background. On SQL Server you have these three options, and you should not deviate from them:

      yyyy-MM-ddTHH:mm:ss[.fff]
      yyyyMMdd HH:mm:ss[.fff]
      yyyyMMdd

    With that in mind, we can re-write the original code like this to follow better practice and get expected results:

    WITH Certified AS (
        SELECT certifiedBy,
            coalesce(datefromparts(  
               right(SUBSTRING(CertifiedBy, 1, charindex(':', certifiedby, charindex('/', certifiedby))-1),4), -- year  
               REPLACE(SUBSTRING(certifiedBy, CHARINDEX(':', certifiedBy, 1) + 1, 2),'/',''), -- month
               REPLACE(SUBSTRING(certifiedBy, CHARINDEX('/', certifiedBy, 1) + 1, 2),'/','') -- day
            ),'19700101') as CERT_DATE
        FROM dbo.PO_Orders 
        WHERE  poreference = 'shev'  
    )
    SELECT  * 
    FROM Certified 
    WHERE CERT_DATE >= '20230201' AND CERT_DATE < '20230223' 
    ORDER BY CERT_DATE
    

    See it here:

    https://dbfiddle.uk/TUBKuAr6

    Of course, even better if you can update the application to capture this as a real datetime column at INSERT/UPDATE time. That will perform better by multiple orders of magnitude.