sql sql-server metadata sql-execution-plan

Is there a way in SQL to determine all columns accessed by an arbitrary query?

I support a database that contains a schema that has a couple hundred tables containing our most important data.

Our application also offers APIs implemented as queries stored in NVARCHAR(MAX) fields in a Query table which are written against the views as well as the tables in this critical schema.

Over time, columns have been added to the tables, but the APIs haven't always kept up.

I've been asked if I can find a way via SQL to identify, as nearly as possible (some false positives/negatives OK), columns in the tables that are not referenced by either the views or the SQL queries that provide the API output.

Initially this seemed do-able. I've found some similar questions on the topic, such as here and here that sort of give guidance on how to start...although I note that even with these, there's the kind of ugly fallback method that looks like:

OBJECT_DEFINITION(OBJECT_ID([Schema].[View])) LIKE '%' + [Column] + '%'

Which is likely to generate false positives as well as be super slow when I'm trying to do it for a couple of thousand column names.

Isn't there anything better/more reliable? Maybe something that could compile a query down to a plan and be able to determine from the plan every column that must be accessed in order to deliver the results?

Solution

Our application also offers APIs implemented as queries stored in NVARCHAR(MAX) fields

So you've reimplemented views? :)

If you make them actual views you can look at INFORMATION_SCHEMA - cross reference table/columns to view/columns.

Assuming you don't want to do that, and you're prepared to write a job to run occasionally (rather than real-time) you could do some super-cheesy dynamic SQL.

Loop through your definitions that are stored in NVARCHAR(MAX) with a cursor
Create a temp view or SP from the SQL in your NVARCHAR(MAX)
Examine INFORMATION_SCHEMA from your temp view/SP and put that into a temp holding table.
Do this for all your queries then you've got a list of referenced columns

Pretty ugly but should be workable for a tactical scan of your API vs database.