Search code examples
sql-servert-sqlvarcharmax

Efficiency of varchar(max) in T-SQL code


I have a question about the "efficiency" (speed, memory usage) of (n)varchar(max) versus (n)varchar(<explicit-size>) in T-SQL code (stored procedures, functions). I am not asking about its usage in column definitions.

In existing code, I am noticing lots of occasions along the lines of, say:

CREATE TABLE [table] (
    [column] nvarchar(1000)
)

DECLARE @var nvarchar(2000)
SELECT @var = [column] FROM  TABLE WHERE ...
SET @var = @var + @somethingelse + @anotherthing ...
SET @var = REPLACE(@var, N'search', N'lotstoreplacewith')
...

The gist being (the above is only an example) I am ultimately building longer strings off limited-size string columns. It is difficult/inconvenient to know and maintain just how big these strings could become.

So, it would be simpler for me if I changed to declaring these T-SQL variables as nvarchar(max). Then I do not have to worry about what maximum size they could reach. They would be similar to a programming language's "unlimited" string size.

Does anybody know how "efficiently" SQL Server handles code which manipulates variables declared (n)varchar(max)? For example, does it reserve a huge amount of space for such a variable (I'm hoping it handles it purely dynamically), or for CAST/CONVERT(), or do string functions using it suffer some performance penalty? [If it is relevant, I have to support SQL Server 2008 R2 onward.]

[EDIT: It has been suggested that my question is a duplicate of Are there any disadvantages to always using nvarchar(MAX)?. However, all but a single post there refer to varchar(max) in column definitions, and my question explicitly states I am asking about T-SQL code (variables etc.) performance. I am posting an answer (because it is large) to my question below, which draws on that post and extends it with some new information.]


Solution

  • In Are there any disadvantages to always using nvarchar(MAX)? there is one answer https://stackoverflow.com/a/26120578/489865 which relates to T-SQL variables performance and not column definitions.

    The gist of that post is to run SELECT @var='ABC' queries returning 1,000,000 rows, assigning to variables defined as nvarchar(<size>) versus nvarchar(max).

    Under SQL Server 2008 R2, I concur with the poster's findings that nvarchar(max) is 4 times slower than nvarchar(<size>) in the example. Interestingly, if it is changed to make the assignment do slightly more work as in:

    SET NOCOUNT ON;
    
    --===== Test Variable Assignment 1,000,000 times using NVARCHAR(300)
    DECLARE @SomeString NVARCHAR(300),
            @StartTime DATETIME
    ;
     SELECT @startTime = GETDATE()
    ;
     SELECT TOP 1000000
            @SomeString = 'ABC' + ac1.[name] + ac2.[name]
       FROM master.sys.all_columns ac1,
            master.sys.all_columns ac2
    ;
     SELECT Duration    = DATEDIFF(ms,@StartTime,GETDATE())
    ;
    GO
    --===== Test Variable Assignment 1,000,000 times using NVARCHAR(4000)
    DECLARE @SomeString NVARCHAR(4000),
            @StartTime DATETIME
    ;
     SELECT @startTime = GETDATE()
    ;
     SELECT TOP 1000000
            @SomeString = 'ABC' + ac1.[name] + ac2.[name]
       FROM master.sys.all_columns ac1,
            master.sys.all_columns ac2
    ;
     SELECT Duration    = DATEDIFF(ms,@StartTime,GETDATE())
    ;
    GO
    --===== Test Variable Assignment 1,000,000 times using VARCHAR(MAX)
    DECLARE @SomeString NVARCHAR(MAX),
            @StartTime DATETIME
    ;
     SELECT @startTime = GETDATE()
    ;
     SELECT TOP 1000000
            @SomeString = 'ABC' + ac1.[name] + ac2.[name]
       FROM master.sys.all_columns ac1,
            master.sys.all_columns ac2
    ;
     SELECT Duration    = DATEDIFF(ms,@StartTime,GETDATE())
    ;
    GO
    

    (note the + ac1.[name] + ac2.[name]) then the nvarchar(max) takes only twice as long. So in practice performance hit for nvarchar(max) may be better than at first seems.