How to get a list of tables used for each query in the query history in snowflake

I am doing analysis on table usage. I need to use the query history table and take out all tables that were used in a query. The query itself could use 10 separate tables, or even use the same table multiple times. I need to be able to parse through the query (using a query itself, not python or any other language, just SQL) and take out a list of all tables that the query hit.

Is there any way to do this?

An example of where I am getting the histories would be this query:

select query_text
from table(information_schema.query_history())

Solution

An alternative approach using rlike and information_schema.tables.

You could extend this further by looking at the # rows per table (high = fact, low = dimension) and the number of times accessed.

select query_text, array_agg(DISTINCT TABLE_NAME::string) 
from 
 (select top 100 query_text 
  from 
     table(information_schema.query_history())
  where 
     EXECUTION_STATUS = 'SUCCESS' ) a
 left outer join 
 (select TABLE_NAME from INFORMATION_SCHEMA.TABLES group by TABLE_NAME) b
 on 
    upper(a.query_text) rlike '.*('||upper(b.table_name)||').*'
group by 
   query_text

Extended Version:

I noticed there's some issues with the above answer. Firstly that it does not allow you to run the explain plan any more than one query at a time. Secondly if the query_id uses a cache it fails to return any objects.

So be extending my initial answer as follows.

Create a couple views that read all the databases and provide central authority on all tables/views/objects/query_histories.
Run the generated SQL which creates a couple views. It again uses rlike but substitutes database and schema names from the query_history when not present.

I've added credits, time elapsed to the two views for more extensions.

You can validate for yourself by checking the explain plan as above and if you don't see identical tables check the SQL and you'll most likely see the cache has been used.

Would be great to hear if anyone finds this useful.

Step 1 Create a couple views :

show databases;
select RTRIM( 'create or replace view bob as ( '||listagg('select  CONCAT_WS(\'.\',TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME) table_name_3,CONCAT_WS(\'.\',TABLE_SCHEMA, TABLE_NAME) table_name_2,TABLE_NAME, ROW_COUNT, BYTES from ' ||"name" ||'.INFORMATION_SCHEMA.TABLES union all ') ,' union all')||')' tabs,
RTRIM( 'create or replace view bobby as ( '||listagg('select QUERY_ID, query_text ,DATABASE_NAME, SCHEMA_NAME, CREDITS_USED_CLOUD_SERVICES , TOTAL_ELAPSED_TIME from table( '||"name" ||'.information_schema.query_history())  where  EXECUTION_STATUS = \'SUCCESS\' union all ') ,' union all')||')' tabs2 
from table(result_scan( LAST_QUERY_ID()));

Step 2 Run this SQL:

select
QUERY_TEXT,
query_id,
CREDITS_USED,
TOTAL_ELAPSED,
array_agg(TABLE_NAME_3) tables_used
from
(
    select
        QUERY_TEXT
        ,query_id
        ,TABLE_NAME
        , rlike(  (a.query_text) ,  '.*(\\s.|\\.){1}('||(bob.TABLE_NAME)||'(\\s.*|$))','is')   aa
        , rlike(  (a.query_text) ,  '.*(\\s.|\\.){1}('||(bob.TABLE_NAME_2)||'(\\s.*|$))','is')  bb
        , rlike(  (a.query_text) ,  '.*(\\s.){1}('||(bob.TABLE_NAME_3)||'(\\s.*|$))','is')     cc,
        bob.TABLE_NAME_3,
        count(1) cnt,
        max(CREDITS_USED_CLOUD_SERVICES) CREDITS_USED,
        max(TOTAL_ELAPSED_TIME) TOTAL_ELAPSED
    from
       BOBBY a
   left outer join 
       BOB 
     on 
        rlike(  (a.query_text) ,  '.*(\\s.|\\.){1}('||(bob.TABLE_NAME)||'(\\s.*|$))','is')    
     or rlike(  (a.query_text) ,  '.*(\\s.|\\.){1}('||(bob.TABLE_NAME_2)||'(\\s.*|$))','is')    
     or rlike(  (a.query_text) ,  '.*(\\s.|\\.){1}('||(bob.TABLE_NAME_3)||'(\\s.*|$))','is')    
    where 
        TABLE_NAME is not null 
        and (  cc
             or    iff(bb, upper( DATABASE_NAME||'.'||TABLE_NAME) = bob.TABLE_NAME_3, false)
             or    iff(aa, upper (DATABASE_NAME||'.'||SCHEMA_NAME||'.'||TABLE_NAME) = bob.TABLE_NAME_3, false)   
            )
    group by 
       1,2,3,4,5,6,7) 
 group by 
    1,2,3,4;