Search code examples
sql-servertable-variabletable-statisticscardinality-estimation

SQL Server: Cardinality estimation for table variable


SQL Server does not maintain statistics for table variable and it always estimates 1 output row. Microsoft changed this behavior post release of SQL Server 2019. Though I am not able to find much information about those changes.

Below is my sample code. Note, that dbo.Integers just a local user table that contains 100k integers in incremental order.

declare @myTestTable table  (
    ID int not null,
    Symbol varchar(100) not null primary key clustered,
    Price decimal(9,2) null,
    EntityID int not null
);

insert @myTestTable (ID, Symbol, Price, EntityID)
select ID = n.Num,
    Symbol = cast(hashbytes('SHA', cast(n.Num as varchar(30))) as varchar(100)),
    Price = rand(n.Num) * 1000,
    EntityID = case when n.Num < 4000 then 1 else n.Num end
from dbo.Integers n
where n.Num < 5000;

select * from @myTestTable
where EntityID = 1  

Here is the execution plan details.

enter image description here

It is clear from the above stats, that estimated number of rows 70.70 is square root of total number of rows 4999. I changed number of rows but same calculation applied.

As I was not able to find proper documentation on it. Could you please help me understand what changes Microsoft has brought in recent upgrades related to table variable statistics and estimation and if estimated rows are always square root of total number of rows in case of table variable?


Solution

  • Table variables have always had cardinality information and rows in tempdb.sys.partitions showing the correct row count.

    The reason why often historically execution plans "estimate 1 output row" is because SQL Server compiles all statements in a batch before executing. This means that the statements selecting from the table variable are compiled before the statements inserting any rows to them are executed so table cardinality is still zero.

    The new behaviour is just to defer compilation of these statements so they can be compiled just before execution (at which point an accurate row count can be known).

    On previous versions it has always been possible to add an OPTION (RECOMPILE) hint to get a plan compiled that takes this row count into account too.

    Table variables have never had column statistics (and still don't). All SQL Server knows is that the table contains 4999 rows - it has no histogram to refer to that tells it that 3999 of those rows have EntityID = 1 so it needs to resort to guesses as to how many of those will match the predicate.

    With the new cardinality estimator it guesses 70.7036 (POWER(4999e0,.5)) and with the legacy cardinality estimator 594.514 (as heuristic used previously for this case was POWER(4999e0,.75)).

    The query_optimizer_estimate_cardinality extended event shows that the stats are treated as a black box and it just has to guess.

    Field Value
    name query_optimizer_estimate_cardinality
    input_relation <Operator Name="LogOp_Select" ClassNo="32"><StatsCollection Name="CStCollBlackBox" Id="1" Card="4999.00"/><Operator Name="ScaOp_Comp " ClassNo="100"><CompInfo CompareOp="EQ"/><Operator Name="ScaOp_Identifier " ClassNo="99"><IdentifierInfo TableName="" ColumnName="EntityID"/></Operator><Operator Name="ScaOp_Const " ClassNo="98"><ConstInfo Type="int" Value="(1)"/></Operator></Operator></Operator>
    calculator <CalculatorList><FilterCalculator CalculatorName="CSelCalcPointPredsFreqBased"><SubCalculator Role="DistinctCountPlan"><DistinctCountCalculator CalculatorName="CDVCPlanLeaf" Guesses="1"/></SubCalculator></FilterCalculator></CalculatorList>
    stats_collection <StatsCollection Name="CStCollFilter" Id="2" Card="70.70"/>