Tags: sql, sql-server, query-optimization, sqlgeography

Slow performance of Latitude/Longitude conversion in query


I have a performance problem with a SQL query running on SQL Server 2019 that returns the lat/long from a geography column.
My query is as follows and it takes around 5 seconds to return 553 rows:

SELECT ActivityLocations.ID, ActivityLocations.ActivityID, ActivityLocations.Number, ActivityLocations.Location.Lat AS 'Latitude', ActivityLocations.Location.Long AS 'Longitude'
FROM Plans
INNER JOIN Activities ON Plans.ID = Activities.PlanID
INNER JOIN ActivityLocations ON Activities.ID = ActivityLocations.ActivityID
WHERE CustomerID = 35041

The query plan it generates is: Slow Query Plan

But if I change the query a little to return a bit less data, it takes 0 seconds to return 207 rows:

SELECT ActivityLocations.ID, ActivityLocations.ActivityID, ActivityLocations.Number, ActivityLocations.Location.Lat AS 'Latitude', ActivityLocations.Location.Long AS 'Longitude'
FROM Plans
INNER JOIN Activities ON Plans.ID = Activities.PlanID
INNER JOIN ActivityLocations ON Activities.ID = ActivityLocations.ActivityID
WHERE PlanID > 22486

And the query plan is: Fast query Plan

I guess my question is: why is the Compute Scalar operation happening before the join in the slow query and after the join in the fast query? I don't understand why it would do the Lat/Long conversion on every row of the ActivityLocations table when we only want a small subset of the rows.

Any help would be greatly appreciated.

Edited to contain table information

CREATE TABLE [dbo].[Activities](
[ID] [int] NOT NULL,
[PlanID] [int] NOT NULL,
[Name] [nvarchar](255) NOT NULL,
CONSTRAINT [PK_Activity] PRIMARY KEY CLUSTERED 
(
    [ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [dbo].[ActivityLocations](
    [ID] [int] NOT NULL,
    [ActivityID] [int] NOT NULL,
    [Number] [int] NOT NULL,
    [Location] [geography] NOT NULL,
 CONSTRAINT [PK_ActivityLocations] PRIMARY KEY CLUSTERED 
(
    [ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

CREATE TABLE [dbo].[ActivityPlans](
    [ID] [int] NOT NULL,
    [CustomerID] [int] NOT NULL,
    [PurchaseOrder] [nvarchar](255) NULL,
    [Deleted] [bit] NOT NULL,
    [Name] [nvarchar](500) NULL,
 CONSTRAINT [PK_ActivityPlan] PRIMARY KEY CLUSTERED 
(
    [ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

CREATE NONCLUSTERED INDEX [PlanID_IX] ON [dbo].[Activities]
(
    [PlanID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]

CREATE NONCLUSTERED INDEX [ActivityID_IX] ON [dbo].[ActivityLocations]
(
    [ActivityID] ASC
)
INCLUDE([Number],[Location]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]

CREATE NONCLUSTERED INDEX [CustomerID_NCIX] ON [dbo].[ActivityPlans]
(
    [CustomerID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]

Solution

  • NOTE: The OP changed the name of the table Plans to ActivityPlans between the originally posted queries/graphical execution plans and the subsequently provided execution plans (via Paste The Plan) and DDL scripts. I am going to speak in terms of the table Plans (as it was originally posted).

    I am going to try to explain this in its entirety, including pointing out the things that are not an issue between these two queries, because the obvious differences you see when comparing the two queries are not, per se, what causes the problem. Read on for my explanation.

    Similarities

    First, let's talk about where the two execution plans are similar. It is important to note the similarities because (assuming the second query plan is acceptable to the OP) the issue does not lie in the parts the two query plans have in common.

    • Both execution plans start by accessing the tables Plans and Activities.
      • The table accessed first changes based on your WHERE clause, which is good: the optimizer makes a good decision and is able to use an index seek in both cases. The filter WHERE CustomerID = 35041 is resolved to an index seek on the index CustomerID_NCIX of the Plans table, and the filter WHERE PlanID > 22486 is resolved to an index seek on the index PlanID_IX of the Activities table. Then the join to the subsequent table (Activities in the first query and Plans in the second query) is done. It is supported by an index in both cases, the estimates for the seek operations are not horrific, and both joins are done using Nested Loops that output a row count relatively close to the final result set. So even though the only visual difference between these two queries is the WHERE clause, each WHERE clause is being handled fairly similarly and does not appear to be the issue.
    • All three tables are accessed in the same order in both execution plans.
    • Both queries access the table ActivityLocations using the index ActivityID_IX.
    • Both queries have a Compute Scalar operator that computes the values for the expressions ActivityLocations.Location.Lat and ActivityLocations.Location.Long in your SELECT statement from the ActivityLocations.Location column.

    Differences

    Now let's talk about the (important) differences, which is where the issues lie. (A quick way to capture these actual/estimated row counts yourself is sketched just after this list.)

    • The first query accesses the table ActivityLocations using an Index Scan operator, while the second query uses an Index Seek operator.
    • The Index Scan operator that accesses the table ActivityLocations of the first query has an Actual/Estimated Row Count of 329,475/331,523 and the Index Seek operator that accesses the table ActivityLocations of the second query has an Actual/Estimated Row Count of 207/9.
    • The first query uses a Merge Join to join the combined results of the first two tables (Plans and Activities) to ActivityLocations, while the second query uses a Nested Loops Join for that join.
    • The Compute Scalar operator to retrieve the needed values from ActivityLocations.Location of the first query has an Actual/Estimated Row Count of 329,475/331,523 and in the second query it has an Actual/Estimated Row Count of 207/9.
    • The actual/estimated number of rows in the output of the final join of the first query increases from its input Sort operator (471/3341->553/3402) while the actual/estimated number of rows in the output of the final join of the second query remains consistent from its input Nested Loop operator (207->207).
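
    If you want to see these actual vs. estimated row counts yourself, one simple option is to request the actual execution plan for the statement. A minimal sketch, using the first query (standard T-SQL; nothing beyond the OP's own query is assumed):

    SET STATISTICS XML ON;  -- return the actual plan, including actual and estimated rows per operator

    SELECT ActivityLocations.ID, ActivityLocations.ActivityID, ActivityLocations.Number,
           ActivityLocations.Location.Lat AS 'Latitude', ActivityLocations.Location.Long AS 'Longitude'
    FROM Plans
    INNER JOIN Activities ON Plans.ID = Activities.PlanID
    INNER JOIN ActivityLocations ON Activities.ID = ActivityLocations.ActivityID
    WHERE CustomerID = 35041;

    SET STATISTICS XML OFF;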

    What is the actual problem?

    Put simply, the first query reads much more data, as we can see in the execution plan. The roughly 300k rows read from the table ActivityLocations in the first query are far more than the 207 rows read in the second query. Additionally, the Compute Scalar operator of the first query has to compute values for (the same) roughly 300k rows, as opposed to the 207 rows of the second query. This obviously leads to a longer-running query.

    It is also worth noting that the larger row count coming from the table ActivityLocations is the reason for the Merge Join (seen in the first query plan) in place of the Nested Loops Join operator (seen in the second query plan). According to the optimizer, given your environment, a Merge Join is more suitable for joining 300k rows to 3.3k rows than a Nested Loops Join would be. And the use of a Merge Join requires both sides of the join to be sorted by the join columns, hence the additional Sort operator in the query plan of the first query.

    Why is it happening?

    Estimates. Estimates drive the optimizer's decision making. In the first query we see that the estimated number of rows to read from the table ActivityLocations (from the Index Scan) is 331,523, and in the second query (from the Index Seek) we see an estimate of 9. It might seem odd to say, but these estimates are more reasonable than you might think. An Index Scan (on up-to-date statistics) will usually have a row estimate equal to the number of rows in the table (except for filtered indexes). An Index Seek would ideally estimate a lower number of rows than what is contained in the table. Ideally that number would match the actual number of rows the Index Seek has to touch, but the fact that your Index Seek estimate is lower than the entire table is a step in the right direction.
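
    If you want to sanity-check where those estimates come from, you can inspect the statistics behind the index ActivityID_IX directly. A minimal sketch (assuming the statistics object carries the same name as the index, which is the default for index statistics):

    -- Histogram and density information the optimizer uses for ActivityID_IX
    DBCC SHOW_STATISTICS ('dbo.ActivityLocations', ActivityID_IX);

    -- When the statistics were last updated and how many rows were sampled
    SELECT s.name, sp.last_updated, sp.rows, sp.rows_sampled
    FROM sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE s.object_id = OBJECT_ID('dbo.ActivityLocations');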

    So if the issue is not with the estimates in the Index Scan or Index Seek then where is it? The issue is in the choice to access the table ActivityLocations using an Index Scan in the first query over the choice to use an Index Seek. So why does the first query choose an Index Scan? It is quite clear by looking at the execution plan that an Index Seek would have been a better option. I believe the answer in this case is the cardinality estimation, specifically in this case, the cardinality estimation used for the join to the table ActivityLocations.

    We see that the estimated number of rows in the output of the final join of the first query increases from its input Sort operator (3341->3402), while the estimated number of rows in the output of the final join of the second query remains consistent with its input Nested Loop operator (207->207). And not only did the optimizer estimate this, it was right: the actual row counts returned from these same operators follow the same pattern.

    Why does that matter? It means that, based on the optimizer's estimate, the join to the table ActivityLocations is going to increase the row count of the input result set; in other words, the join is one (input row) to many (output rows). Keep in mind that the optimizer needs to return your requested values ActivityLocations.Location.Lat and ActivityLocations.Location.Long from the table ActivityLocations. So when it considers a join that it believes will multiply the rows output from ActivityLocations, and it knows it must perform a Compute Scalar on columns coming from that table, it makes sense to run the Compute Scalar before the join: that way the Compute Scalar is guaranteed to run only once per row of ActivityLocations, which cannot be guaranteed if it runs after the join. In this case, the join is actually what ends up limiting the rows from ActivityLocations, and the number of rows returned from that table (for the purposes of this query) is much lower than the row count of the table. In the second query, the estimate says the output row count will be the same, so running the Compute Scalar after the join makes no difference to the number of rows where the compute is required, and performing the Index Seek in place of the Index Scan makes sense.

    So to summarize: the rows returned from the first two tables of your first and second query (given your WHERE clauses) are different, and it is likely that the rows returned in the first query led to a join cardinality estimate different from that of the second query. Hence the differences in how the query plans were built and subsequently run.

    Cardinality estimation (with joins specifically) is made up of several factors. If you really want to get into the internals I would recommend these two articles from the legend Paul White and from SQL Shack. The things discussed there should guide you in reviewing the estimation within your system.

    How to fix it?

    The first goal would be to improve the estimate. If the estimated cardinality for the join were incorrect (which is actually not the case here), then updating the statistics might help. Out-of-date statistics can lead to a bad cardinality estimate, and thus a bad query plan.
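
    For reference, a minimal sketch of that kind of maintenance (the OP has already tried this, without success):

    -- Refresh the statistics behind the indexes involved in these joins
    UPDATE STATISTICS dbo.ActivityLocations ActivityID_IX WITH FULLSCAN;
    UPDATE STATISTICS dbo.Activities PlanID_IX WITH FULLSCAN;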

    In some cases you might be able to restructure your query into a logical equivalent that results in a better execution plan, either by writing it to produce better estimates or possibly by accessing the rows in a different order. In this case, the first query looks fine to me, so rewriting it as a logical equivalent would likely not help.
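
    For completeness, one common restructuring is to defer the .Lat/.Long property calls to an outer query so the conversion is only written against rows that survive the joins. This is purely illustrative (the optimizer is free to push the computation back down, and, as noted above, a rewrite would likely not help here):

    SELECT x.ID, x.ActivityID, x.Number,
           x.Location.Lat AS 'Latitude', x.Location.Long AS 'Longitude'
    FROM (
        SELECT ActivityLocations.ID, ActivityLocations.ActivityID,
               ActivityLocations.Number, ActivityLocations.Location
        FROM Plans
        INNER JOIN Activities ON Plans.ID = Activities.PlanID
        INNER JOIN ActivityLocations ON Activities.ID = ActivityLocations.ActivityID
        WHERE CustomerID = 35041
    ) AS x;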

    In this case the cardinality estimate is correct (and you mentioned that you updated statistics and it did not help), the query appears to be well written, but the execution plan chosen is still sub-optimal. So I would recommend a hint. This problem would easily be solved with a table hint that forces a seek on the index ActivityID_IX of ActivityLocations. The join in your first query would look like so:

    INNER JOIN ActivityLocations WITH (FORCESEEK, INDEX(ActivityID_IX)) ON Activities.ID = ActivityLocations.ActivityID
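
    For clarity, the full first query with the hint applied would look something like this (the hint only changes how ActivityLocations is accessed; everything else is unchanged):

    SELECT ActivityLocations.ID, ActivityLocations.ActivityID, ActivityLocations.Number,
           ActivityLocations.Location.Lat AS 'Latitude', ActivityLocations.Location.Long AS 'Longitude'
    FROM Plans
    INNER JOIN Activities ON Plans.ID = Activities.PlanID
    INNER JOIN ActivityLocations WITH (FORCESEEK, INDEX(ActivityID_IX))
        ON Activities.ID = ActivityLocations.ActivityID
    WHERE CustomerID = 35041;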
    

    There is plenty of information out there on why forcing hints might be a bad idea, but given the information I have here I would say that is the best option. I am always open to other opinions. Cheers!