Search code examples
sqlloopsdategoogle-bigquerycreate-table

create date sharded table in Big Query UI with suffix of the form _YYYYMMDD using a loop


I would like to create tables which are shared in Big Query UI with suffix of the form _YYYYMMDD through a loop.

My final tables would then look as

  • tablename_20200801
  • tablename_20200901
  • tablename_20201001
  • ...

Whilst I have found many ways to read in all the sharded tables. I cannot seem to find a way to create these sharded tables using a loop.

DECLARE -- change dates to match accordingly
  v_snapshot_date date;
DECLARE
  end_of_month date;
DECLARE
  final_snapshot_date DEFAULT DATE '2020-10-01';
DECLARE
  initial_snapshot_date date DEFAULT DATE '2020-08-01';

LOOP
SET
  v_snapshot_date = DATE_ADD(v_snapshot_date,INTERVAL 1 month);
SET
  end_of_month = LAST_DAY(v_snapshot_date);
IF
  v_snapshot_date > final_snapshot_date THEN
LEAVE; -- kill
END IF;
-- create some variables, through a temporary table called tablename.
-- this is the table we want to shard with v_snapshot_date and save
EXECUTE IMMEDIATE
  "CREATE OR REPLACE TEMP TABLE tablename (title STRING, publish_date INT64)";

EXECUTE IMMEDIATE -- Add a row with title=Hamlet and with date=1599
  "INSERT INTO tablename (title, publish_date) VALUES('Hamlet', 1599)";

END LOOP;

Solution

  • Try something like this:

    DECLARE -- change dates to match accordingly
      v_snapshot_date date;
    DECLARE
      end_of_month date;
    DECLARE
      final_snapshot_date DEFAULT DATE '2020-10-01';
    DECLARE
      initial_snapshot_date date DEFAULT DATE '2020-08-01';
    DECLARE
      formatted_date STRING;
    SET v_snapshot_date = initial_snapshot_date;
    
    LOOP
    SET
      v_snapshot_date = DATE_ADD(v_snapshot_date,INTERVAL 1 month);
    SET
      formatted_date = FORMAT_DATE("%Y%m%d", v_snapshot_date);
    SET
      end_of_month = LAST_DAY(v_snapshot_date);
    IF
      v_snapshot_date > final_snapshot_date THEN
    LEAVE; -- kill
    END IF;
    -- create some variables, through a temporary table called tablename.
    -- this is the table we want to shard with v_snapshot_date and save
    EXECUTE IMMEDIATE
      FORMAT("CREATE OR REPLACE TABLE `elzagales.so_test.tablename_%s` (title STRING, publish_date INT64)", formatted_date);
    
    EXECUTE IMMEDIATE -- Add a row with title=Hamlet and with date=1599
      FORMAT("INSERT INTO `elzagales.so_test.tablename_%s` (title, publish_date) VALUES('Hamlet', 1599)", formatted_date);
    
    END LOOP;
    

    I've modified your original code with the following:

    • defining of a string to use as the table suffix
    • setting of the v_snapshot_date to the initial_snapshot_date
    • formatting of the DDL and DML statements

    As a note, you may want to set your initial_snapshot_date one date further back because of the incrementing of the value by one month at the start of the loop.

    Alternatively you could use a FOR LOOP which may be simpler:

    DECLARE end_of_month date;
    DECLARE final_snapshot_date DEFAULT DATE '2020-10-01';
    DECLARE initial_snapshot_date date DEFAULT DATE '2020-08-01';
    DECLARE formatted_date STRING;
    
    FOR date_suffix in (
      select * from UNNEST(GENERATE_DATE_ARRAY(initial_snapshot_date, final_snapshot_date, INTERVAL 1 MONTH)) date
    )
    DO
    
      SET formatted_date = FORMAT_DATE("%Y%m%d", date_suffix.date);
      SET end_of_month = LAST_DAY(date_suffix.date);
      IF 
        date_suffix.date > final_snapshot_date THEN
        LEAVE; -- kill
      END IF;
      -- create some variables, through a temporary table called tablename.
      -- this is the table we want to shard with v_snapshot_date and save
      EXECUTE IMMEDIATE
        FORMAT("CREATE OR REPLACE TABLE `project_id.dataset_id.tablename_%s` (title STRING, publish_date INT64)", formatted_date);
    
      EXECUTE IMMEDIATE -- Add a row with title=Hamlet and with date=1599
        FORMAT("INSERT INTO `project_id.dataset_id.tablename_%s` (title, publish_date) VALUES('Hamlet', 1599)", formatted_date);
    
    END FOR;