I'm attempting to improve query performance for an application and I'm logically stuck.
So the application is proprietary and thus we're unable to alter application-side code. We have, however, received permission to work with the underlying database (surprisingly enough). The application calls a SQL Server database, so the current idea we're running with is to create a view with the same name as the table and rename the underlying table. When the application hits the view, the view calls one of two SQL CLR functions, which both do nothing more than call a web service we've put together. The web service performs all the logic, and contains an API call to an external, proprietary API that performs some additional logic and then returns the result.
This all works, however, we're having serious performance issues when scaling up to large data sets (100,000+ rows). The pretty clear source of this is the fact we're having to work on one row at a time with the web service, which includes the API call, which makes for a lot of latency overhead.
The obvious solution to this is to figure out a way to limit the number of times that the web service has to be hit per query, but this is where I'm stuck. I've read about a few different ways out there for potentially handling scenarios like this, but as a total database novice I'm having difficulty getting a grasp on what would be appropriate in this situation.
If any there are any ideas/recommendations out there, I'd be very appreciative.
There are probably a few things to look at here:
Is your SQLCLR TVF streaming the results out (i.e. are you adding to a collection and then returning that collection at the end, or are you releasing each row as it is completed -- either with yield return
or building out a full Enumerator)? If not streaming, then you should do this as it allows for the rows to be consumed immediately instead of waiting for the entire process to finish.
Since you are replacing a Table with a View that is sourced by a TVF, you are naturally going to have performance degradation since TVFs:
WHERE
and JOIN
conditions.
Because of these two things, the Query Optimizer is not going to have an easy time generating an appropriate plan if the actual number of rows is 100k.
How many SELECT
s, etc are hitting this View concurrently? Since the View is hitting the same URI each time, you are bound by the concurrent connection limit imposed by ServicePointManager ( ServicePointManager.DefaultConnectionLimit ). And the default limit is a whopping 2
! Meaning, all additional requests to that URI, while there are already 2 active/open HttpWebRequest
s, will wait inline, patiently. You can increase this by setting the .ServicePoint.ConnectionLimit
property of the HttpWebRequest
object.
How often does the underlying data change? Since you switched to a View, that doesn't take any parameters, so you are always returning everything. This opens the door for doing some caching, and there are two options (at least):
For more info on working with SQLCLR in general, please visit: SQLCLR Info