Search code examples
google-bigqueryanalytics

Trouble querying tables in multiple datasets in bigquery


I am trying to query two bigquery tables from two different datasets to get 2 separate columns. I have both tried union and joins but they are not giving me want I need. Below is the query I tried

with abagrowth as (
SELECT
  session abas,
  term abat,
  COUNT(distinct studentid) AS acount,
  ROUND(100 * (COUNT(distinct studentid) - LAG(COUNT(distinct studentid), 1) OVER (ORDER BY session)) / LAG(COUNT(distinct studentid), 1) OVER (ORDER BY session),0) || '%' AS agrowth
FROM
  aba.abaresult
GROUP BY
  1,
  2
ORDER BY
  1,
  2),

bidagrowth as (
SELECT
  session bidas,
  term bidat,
  COUNT(distinct studentid) AS bcount,
  ROUND(100 * (COUNT(distinct studentid) - LAG(COUNT(distinct studentid), 1) OVER (ORDER BY session)) / LAG(COUNT(distinct studentid), 1) OVER (ORDER BY session),0) || '%' AS bgrowth
FROM
  bida.bidaresult
GROUP BY
  1,
  2
ORDER BY
  1,
  2)

select abas, agrowth from abagrowth
union all
select bidas, bgrowth from bidagrowth

The dataset is similar to this

name  subject  session      totalscore
-------------------------------------------
jack  maths    2013/2014         70
jane  maths    2013/2014         65
jill  maths    2013/2014         80
jack  maths    2014/2015         72
jack  eng      2014/2015         87
jane  science  2014/2015         67
jill  maths    2014/2015         70
jerry eng      2014/2015         70
jaasp science  2014/2015         85

The table I am trying to get is meant to be in this format or something similar

session    agrowth  bgrowth
2013/2014   null     null
2014/2015   10%       11%
2015/2016   5%        2%

The figures above are assumed for example sake.

Questions

  1. Is this possible with bigquery?

  2. If yes, How can one achieve this?

Thanks


Solution

  • Regarding the dataset. Yes, you can query two datasets. Check out this answer. Basically, you just need to indicate the project (optional), dataset and table you're using.

    For the data you want to get. You can achieve it using a JOIN rather than UNION. JOINING the tables by session will allow you to have one row per session. Then you can choose what columns to include in your SELECT.

    WITH abagrowth AS (
    SELECT
      session,
      term abat,
      COUNT(distinct studentid) AS acount,
      ROUND(100 * (COUNT(distinct studentid) - LAG(COUNT(distinct studentid), 1) OVER (ORDER BY session)) / LAG(COUNT(distinct studentid), 1) OVER (ORDER BY session),0) || '%' AS agrowth
    FROM
      aba.abaresult
    GROUP BY
      1,
      2
    ORDER BY
      1,
      2),
    
    bidagrowth AS (
    SELECT
      session,
      term bidat,
      COUNT(distinct studentid) AS bcount,
      ROUND(100 * (COUNT(distinct studentid) - LAG(COUNT(distinct studentid), 1) OVER (ORDER BY session)) / LAG(COUNT(distinct studentid), 1) OVER (ORDER BY session),0) || '%' AS bgrowth
    FROM
      bida.bidaresult
    GROUP BY
      1,
      2
    ORDER BY
      1,
      2)
    
    SELECT aba.session, aba.agrowth, bida.bgrowth
       FROM abagrowth aba
       JOIN bidagrowth bida
            ON aba.session = bida.session
    
    

    UNION will stack the results from the two queries.