
Why is a Variable declared as NVARCHAR(MAX) dropping chunks of the string?


For whatever reason, a query is being built as a string and passed off to be executed by another stored procedure.

The query is massive.

Over a thousand lines, and we've run into an issue that requires me to debug it.

The query is being built into a declared NVARCHAR(MAX) variable, but something odd is happening when I print it off using the following:

WHILE @Printed < @ToPrint BEGIN 
    PRINT(SUBSTRING(
        @sql, @Printed, 4000))
    SET @Printed = @Printed + 4000
    PRINT('Printed: ' + CONVERT(VARCHAR, @Printed))
END

At a certain place in the printed message, it's just... dropping a chunk, and I don't understand why. NVARCHAR(MAX) should be able to hold War and Peace over 100 times, and this query is NOT War and Peace.

I know PRINT(...) has a limitation of only being able to print off 4000 characters at a time (hence the loop), but that doesn't explain why the @sql variable is just losing a chunk in places.

If it helps, specifically, the place where the chunk is dropping is about 1,600 characters after the first 4,000 characters are printed.

Why is it doing this? Am I missing a system setting at the start of the query (like NOCOUNT or ARITHABORT)? I don't even know what those do, or if they're even involved.


EDIT : MCVE : Here. To reproduce, copy-paste into Microsoft SQL Server Management Studio and hit 'F5'. The message printed will not include @sql in its entirety.


Solution

  • This is working fine for me:

    DECLARE @sql nvarchar(max) = 
        REPLICATE(CONVERT(nvarchar(max), N'a'), 4000)
      + REPLICATE(CONVERT(nvarchar(max), N'b'), 4000)
      + REPLICATE(CONVERT(nvarchar(max), N'c'), 4000)
      + REPLICATE(CONVERT(nvarchar(max), N'd'), 4000)
      + REPLICATE(CONVERT(nvarchar(max), N'e'), 4000)
      + REPLICATE(CONVERT(nvarchar(max), N'f'), 4000)
      + REPLICATE(CONVERT(nvarchar(max), N'g'), 4000)
      + REPLICATE(CONVERT(nvarchar(max), N'h'), 4000)
      + REPLICATE(CONVERT(nvarchar(max), N'i'), 4000);
    
    
    PRINT LEN(@sql);  -- characters
    PRINT DATALENGTH(@sql); -- bytes
    PRINT '';
    
    DECLARE @Printed int = 1, @ToPrint int = LEN(@sql);
    
    WHILE @Printed < @ToPrint BEGIN 
        PRINT(SUBSTRING(
            @sql, @Printed, 4000))
        SET @Printed = @Printed + 4000
        PRINT('Printed: ' + CONVERT(varchar(11), @Printed)) -- *
    END
    

    * Always specify length.

    Output is:

    36000
    72000
    
    aaaaaaaaaa... 4000 As ...aaa
    Printed: 4001
    bbbbbbbbbb... 4000 Bs ...bbb
    Printed: 8001
    cccccccccc... 4000 Cs ...ccc
    Printed: 12001
    dddddddddd... 4000 Ds ...ddd
    Printed: 16001
    eeeeeeeeee... 4000 Es ...eee
    Printed: 20001
    ffffffffff... 4000 Fs ...fff
    Printed: 24001
    gggggggggg... 4000 Gs ...ggg
    Printed: 28001
    hhhhhhhhhh... 4000 Hs ...hhh
    Printed: 32001
    iiiiiiiiii... 4000 Is ...iii
    Printed: 36001
    

    So, I think the problem is elsewhere. In any case, this is a really sloppy way to validate the contents of dynamic SQL. Instead I would do:

    SELECT CONVERT(xml, @sql);
    

    Then you can click on the output cell and it opens in an XML text editor for review. You can then copy and paste that output into a query window if you want IntelliSense or any chance of executing it, but you'll have to replace encoded characters (for example, &gt; becomes >). I talk about this approach (and another one) here:
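
    One variation that may save the manual decoding step, offered as a sketch and not necessarily the other approach referred to above: returning the string as an XML processing instruction generally leaves characters such as < and > unencoded. The sample query text and the instruction name x are placeholders.

    ```sql
    -- Hypothetical sample string containing characters that XML would normally encode.
    DECLARE @sql nvarchar(max) =
        N'SELECT * FROM dbo.table WHERE col1 > 5 AND col2 < 10;';

    -- The column alias turns the value into a processing instruction,
    -- so the text is wrapped in <?x ... ?> instead of being entity-encoded.
    SELECT @sql AS [processing-instruction(x)]
    FOR XML PATH('');
    ```

    The clickable cell then shows the query text wrapped in a <?x ... ?> marker, which needs to be trimmed off before the text is run.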

    If you insist on doing it this bricklaying way, perhaps there is some kind of non-printing or string-termination character that's at that point. If you say it is around character 5,600 then you could do:

    DECLARE @i int = 5550, @c nchar(1);
    WHILE @i <= 5650
    BEGIN
      PRINT '';
      SET @c = SUBSTRING(@sql, @i, 1);
      PRINT '------   ' + RTRIM(@i) + '------:';
      PRINT 'Raw:     ' + @c;
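      PRINT 'ASCII:   ' + RTRIM(ASCII(@c));   -- RTRIM converts the int to a string
      PRINT 'UNICODE: ' + RTRIM(UNICODE(@c)); -- without it, the + fails with a conversion error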
      SET @i += 1;
    END
    

    You should be able to scan down and match the last sequence of characters you see in the broken print output. Then look for anything where the Raw: line is empty and the ASCII: line is anything other than typical (9, 10, 13, 32).
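
    If scanning the output by eye is tedious, the same check can be automated. This is a minimal sketch, assuming the suspect character is a control character; the sample string and the variable names @len, @code, and @hits are made up for illustration, and the loop walks the whole string instead of a fixed window.

    ```sql
    -- Hypothetical sample: a NUL character (code point 0) placed at position 10.
    DECLARE @sql nvarchar(max) = N'SELECT 1;' + NCHAR(0) + N' SELECT 2;';

    DECLARE @i    int = 1,
            @len  int = DATALENGTH(@sql) / 2,  -- 2 bytes per character; unlike LEN, counts trailing spaces
            @code int,
            @hits int = 0;

    WHILE @i <= @len
    BEGIN
        SET @code = UNICODE(SUBSTRING(@sql, @i, 1));

        -- Flag control characters other than tab (9), line feed (10), carriage return (13).
        IF @code < 32 AND @code NOT IN (9, 10, 13)
        BEGIN
            PRINT 'Position ' + CONVERT(varchar(11), @i)
                + ': code point ' + CONVERT(varchar(11), @code);
            SET @hits += 1;
        END

        SET @i += 1;
    END

    PRINT 'Suspicious characters found: ' + CONVERT(varchar(11), @hits);
    ```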

    But I don't think this is the problem. I'll go back to an earlier comment where I suggested that the string itself is the problem. In the question, you mention @sql, but don't show how it's populated. I would bet that some string you're adding to that is getting truncated. Some things to look out for:

    • Intermediate variables/parameters declared as varchar/nvarchar but with no length (which leads to silent truncation at 1 character in a declaration, or at 30 characters in CAST/CONVERT):

        DECLARE @sql nvarchar(max) = N'SELECT * FROM dbo.table ';
        DECLARE @where nvarchar = N'WHERE some condition...';
        SET @sql += @where;
        PRINT @sql;
      

      Output:

        SELECT * FROM dbo.table W
      
    • Intermediate variables/parameters declared as varchar/nvarchar but too short (which leads to silent truncation at whatever the declaration is):

        DECLARE @sql nvarchar(max) = N'SELECT * FROM dbo.table ';
        DECLARE @where nvarchar(10) = N'WHERE some condition...';
        SET @sql += @where;
        PRINT @sql;
      

      Output:

        SELECT * FROM dbo.table WHERE some
      
    • Explicit CONCAT with NULL (which leads to silently dropping any NULL input):

        DECLARE @sql nvarchar(max) = N'SELECT * FROM dbo.table ';
        DECLARE @where nvarchar(32);
        DECLARE @orderby nvarchar(32) = N' ORDER BY col1';
        SET @sql = CONCAT(@sql, @where, @orderby);
        PRINT @sql;
      

      Output:

        SELECT * FROM dbo.table  ORDER BY col1
      
    • Not using the N prefix when concatenating Unicode string literals > 4000 characters (example here):

        DECLARE @sql nvarchar(max) = '';
      
        SET @sql = @sql + '... literally 4001 characters ...';
      

      The output here (as shown in the example) will be truncated at 4,000 characters. However if you define your strings properly, this won't happen:

        DECLARE @sql nvarchar(max) = N'';
      
        SET @sql = @sql + N'... literally 4001 characters ...';
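
    A related case worth ruling out, sketched here as an illustration and not taken from the linked example: when every operand in a concatenation is a non-max type, the result is capped at 4,000 characters for nvarchar before it is ever assigned to the nvarchar(max) variable. That is why the repro at the top converts each REPLICATE input to nvarchar(max). The variable names below are hypothetical.

    ```sql
    -- Both operands are nvarchar(4000), so the concatenation is truncated
    -- to 4,000 characters before the assignment happens.
    DECLARE @truncated nvarchar(max) =
        REPLICATE(N'a', 4000) + REPLICATE(N'b', 4000);

    -- Converting one operand to nvarchar(max) promotes the whole expression.
    DECLARE @complete nvarchar(max) =
        CONVERT(nvarchar(max), REPLICATE(N'a', 4000)) + REPLICATE(N'b', 4000);

    SELECT LEN(@truncated) AS truncated_len,  -- 4000
           LEN(@complete)  AS complete_len;   -- 8000
    ```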
      

    These things can be hard to spot in overly complex dynamic SQL generation, so it's never a bad idea to simplify and to divide & conquer the major components of the eventual string any way you can. Based on the repro you attempted, I would guess it is almost certainly the "variable declared too short" symptom. The safest approach is to declare every input to a dynamic SQL string as nvarchar(max); there is no real good reason to use anything else, except for entity names, which are constrained by metadata anyway.
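
    One way to divide & conquer, as a rough sketch: compare the length of what a fragment was meant to hold with what the variable actually stored. The fragment reuses the "too short" example from earlier; @where_max is a hypothetical name.

    ```sql
    -- Fragment declared too short: the assignment silently keeps 10 characters.
    DECLARE @where nvarchar(10) = N'WHERE some condition...';

    SELECT LEN(N'WHERE some condition...') AS intended_len,  -- 23
           LEN(@where)                     AS stored_len;    -- 10

    -- Declaring the fragment as nvarchar(max) removes the cap.
    DECLARE @where_max nvarchar(max) = N'WHERE some condition...';
    DECLARE @sql nvarchar(max) = N'SELECT * FROM dbo.table ' + @where_max;

    PRINT @sql;
    ```

    Running the same comparison on each intermediate variable in the real procedure should narrow down which fragment is being cut.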