Search code examples
google-cloud-platformgoogle-bigqueryinformation-schema

Query for listing Datasets and Number of tables in Bigquery


So I'd like make a query that shows all the datasets from a project, and the number of tables in each one. My problem is with the number of tables.

Here is what I'm stuck with :

SELECT
  smt.catalog_name        as `Project`,
  smt.schema_name         as `DataSet`, 
  ( SELECT
      COUNT(*) 
    FROM ***DataSet***.INFORMATION_SCHEMA.TABLES
  )                       as `nbTable`,
  smt.creation_time,
  smt.location
FROM
  INFORMATION_SCHEMA.SCHEMATA smt
ORDER BY DataSet

The view INFORMATION_SCHEMA.SCHEMATA lists all the datasets from the project the query is executed, and the view INFORMATION_SCHEMA.TABLES lists all the tables from a given dataset.

The thing is that the view INFORMATION_SCHEMA.TABLES needs to have the dataset specified like this give the tables informations : dataset.INFORMATION_SCHEMA.TABLES

So what I need is to replace the *** DataSet*** by the one I got from the query itself (smt.schema_name). I am not sure if I can do it with a sub query, but I don't really know how to manage to do it.

I hope I'm clear enough, thanks in advance if you can help.


Solution

  • You can do this using some procedural language as follows:

    CREATE TEMP TABLE table_counts (dataset_id STRING, table_count INT64);
    
    FOR record IN 
      (
        SELECT 
          catalog_name as project_id,
          schema_name as dataset_id
        FROM `elzagales.INFORMATION_SCHEMA.SCHEMATA`
      )
    
    DO 
      EXECUTE IMMEDIATE 
        CONCAT("INSERT table_counts (dataset_id, table_count)  SELECT table_schema as dataset_id, count(table_name) from ", record.dataset_id,".INFORMATION_SCHEMA.TABLES GROUP BY dataset_id");
    
    END FOR;
    
    SELECT * FROM table_counts;
    
    

    This will return something like: enter image description here