Search code examples
sqlgroup-bygoogle-bigquerysubqueryweek-number

BigQuery: How to find Scaled Average Weekly Sales?


Scenario to Solve: Our ultimate aim is to get Scaled AverageWeeklySales(AWS) with below formulae:

Scaled AWS = (Number of Weeks article has had a record within the store / maximum Number of Weeks in data) * (AWS /  Number of Weeks)

Input 'table_a' Structure:

Store   Article     Date    Week    Year    Sales
xx      xx          xx      xx      xx      xx

Tried Out: Have framed a query till finding AWS[at store level] and got stuck there to proceed furthur. Since I'm new to this, I'm not sure how this Scaled AWS can be achieved through BigQuery SQL. Need Help!

WITH total_weekly_sales AS (
SELECT
  Store,
  Week,
  SUM(Sales) AS TotalWeeklySales
FROM
  table_a
GROUP BY
  Store,
  Week)
SELECT
  Store,
  AVG(TotalWeeklySales) AS AverageWeeklySales
FROM
  total_weekly_sales tws
GROUP BY
  Store

Sample Data:

Store   Article     Date            Week        Year    Sales
11      aa          2019-07-01      202001      2020    4.9
11      bb          2019-07-07      202001      2020    22.5
11      cc          2019-07-08      202002      2020    10.4
12      aa          2019-07-01      202001      2020    5.3
12      bb          2019-07-07      202001      2020    20.2

Comment Queries:

  • AWS - means Average Weekly Sales[to be derived from the given data at store level, as I have mentioned in my 'Tried Out' session in Question, as to find totalweeklysale at store level and getting average of it gives AWS]
  • Number of Weeks - at store level[for each store, how many weeks data present. Eg. For store 11, this variable value will be 2 whereas for Store 12, this variable value will be 1]
  • Maximum Number of Weeks in data - For ex: from our sample data, the value for this would be 2 [Explaination: for each store, have to calculate total number of weeks data present and take the maximum number of weeks across all stores. Here, Store 11 have 2 weeks in total whereas Store 12 has 1 week in total. So, the maximum here would be 2 which is assigned as value for this variable]

Desired Result: Scaled AWS Formula Part_1 Result [Number of Weeks article has had a record within the store / maximum Number of Weeks in data]

Store   part_1_value    
11      1.5 (3/2)
12      1 (2/2)

Scaled AWS Formula Part_2 Result [AWS / Number of Weeks]

Store   part_2_value    
11      18.9 (37.8/2)
12      25.5 (25.5/1)

Final Output to be: [Part_1 * Part_2]

Store   ScaledAverageWeeklySales    
11      28.35 (1.5*18.9)
12      25.5 (1*25.5)

On the whole, our end result should be at Store Level. Thanks in Advance!


Solution

  • After your clarification, I was able to create a query which calculates what you are seeking for.

    I have used the data your provided together with the WITH statement, together with MAX, COUNT aggregation builtin methods in BigQuery.Below is the query,

    WITH unique_articles AS (
    SELECT Store, COUNT(DISTINCT Article) as uniq_art FROM `test-proj-261014.bq_load_codelab.sales_week`
    GROUP BY Store
    ),
    max_weeks_st AS (
    #not possible to use DATE_DIFF cos the format of the date, the number of weeks would be 1 for the sample data
    #so WEEK column will be used instead
    SELECT Store, COUNT(Distinct Week) as max_weeks_st  FROM `test-proj-261014.bq_load_codelab.sales_week` 
    group by Store
    ), 
    #below returns a struct. So in order to access the int64 value name_of_struct.max_weeks_data
    max_weeks_data AS(
    SELECT MAX(a.max_weeks_st) AS max_weeks_data FROM max_weeks_st a
    ),
    sum_sales AS (
    SELECT Store, SUM(sales) as sum_sales  FROM `test-proj-261014.bq_load_codelab.sales_week` 
    GROUP BY Store
    ), 
    final_data AS(
    select a.Store, a.uniq_art, b.max_weeks_st,c.sum_sales, max_weeks_data  FROM  unique_articles a LEFT JOIN sum_sales c USING(Store)
    LEFT JOIN max_weeks_st b USING(Store) CROSS JOIN  max_weeks_data 
    )
    SELECT Store,((uniq_art/max_weeks_data.max_weeks_data)*(sum_sales/ max_weeks_st)) as AWS_result  FROM final_data a
    

    and the output,

    Row Store   AWS_result
    1   11      28.35
    2   12      25.5
    

    Notice that max_weeks_data is a struct. Thus, to access its value it is necessary to follow the syntax: name_of_struct.max_weeks_data. Moreover, notice that, each variable you described is calculate in a temp table, always with the Store ID, which is used to join the tables.