Search code examples
sqlgoogle-bigquerylooker-studio

Filter by MIN(date) over partition | Data Studio


I'm attempting to connect date parameters from BigQuery to Data Studio so I'm adding in some date variables to my query. However, I'm running into some issues filtering on this date.

Here's my query:

  SELECT first_item,
  COUNT(*) AS first_purchases,
  SUM(purchases_within_90_days) AS purchased_within_90_days,
  SUM(purchases_within_180_days) AS purchased_within_180_days,
  SUM(purchases_within_270_days) AS purchased_within_270_days,
  SUM(revenue90days) as total_revenue_90,
  SUM(revenue180days) as total_revenue_180,
  SUM(revenue270days) as total_revenue_270
  FROM (

  SELECT email, first_item, processed_at, 
    SUM(purch_90_days) OVER(PARTITION BY email) AS purchases_within_90_days, SUM(rev_90) OVER(PARTITION BY email) AS revenue90days,
    SUM(purch_180days) OVER(PARTITION BY email) AS purchases_within_180_days, SUM(rev_180) OVER(PARTITION BY email) AS revenue180days,
    SUM(purch_270days) OVER(PARTITION BY email) AS purchases_within_270_days, SUM(rev_270) OVER(PARTITION BY email) AS revenue270days
  FROM (

SELECT email, first_item, processed_at, SUM(purchases_within_90_days) as purch_90_days, SUM(purchases_within_180_days) as purch_180days, SUM(purchases_within_270_days) as purch_270days, SUM(revenue_within_90_days) as rev_90, SUM(revenue_within_180_days) as rev_180, SUM(revenue_within_270_days) as rev_270
FROM (

SELECT   email, processed_at, first_item, MAX(CASE WHEN hours_since_first_purchase < 90 * 24 AND hours_since_first_purchase > 0 THEN 1 ELSE 0 END) AS purchases_within_90_days,
  MAX(CASE WHEN hours_since_first_purchase < 180 * 24 AND hours_since_first_purchase > 0 THEN 1 ELSE 0 END) AS purchases_within_180_days,
  MAX(CASE WHEN hours_since_first_purchase < 270 * 24 AND hours_since_first_purchase > 0 THEN 1 ELSE 0 END) AS purchases_within_270_days,
  SUM(CASE WHEN hours_since_first_purchase < 90 * 24 AND hours_since_first_purchase > 0 THEN price ELSE 0 END) AS revenue_within_90_days,
  SUM(CASE WHEN hours_since_first_purchase < 180 * 24 AND hours_since_first_purchase > 0 THEN price ELSE 0 END) AS revenue_within_180_days,
  SUM(CASE WHEN hours_since_first_purchase < 270 * 24 AND hours_since_first_purchase > 0 THEN price ELSE 0 END) AS revenue_within_270_days,
FROM (
  
 SELECT order_number, email, processed_at, sku, price, hours_since_first_purchase, first_date,
 CASE
   WHEN hours_since_first_purchase = 0 OR hours_since_first_purchase is null then sku
   else null
   end as first_item,
 FROM (

SELECT order_number, customer.id, email, MIN(processed_at) over(partition by email) as first_date, processed_at, title, price,sku,
    CASE
     WHEN ROW_NUMBER() OVER(PARTITION BY customer.id ORDER BY processed_at) = 1 THEN null
      ELSE TIMESTAMP_DIFF(processed_at, FIRST_VALUE(processed_at) OVER(PARTITION BY customer.id ORDER BY processed_at), HOUR)
      END AS hours_since_first_purchase      
FROM (

SELECT * EXCEPT(instance, line_items) FROM (
  SELECT *, ROW_NUMBER() OVER(PARTITION BY id) AS instance
  FROM `table.orders`
), UNNEST(line_items) as item
   -- identify duplicate rows
WHERE instance = 1 
)

order by email desc
  )

where first_date >  PARSE_DATE('%Y%m%d', @DS_START_DATE) and first_date < PARSE_DATE('%Y%m%d', @DS_END_DATE);
--where first_date <= '2019-09-28'--and first_date > '2020-06-07'
)
  
group by first_item, email, processed_at
)
    
where email <> ""
group by email, first_item,processed_at
order by processed_at asc
)
    
order by processed_at asc
  )
  where first_item is not null and first_item <> "" and first_item <> "unknown" and first_item not like '%variant%' and first_item not like '%product%' 
  group by first_item

When I attempt to filter on the first_date variable, Data Studio is giving me an error with my query. Is there anything I can do to filter on this new variable I've added?

I'm getting the error" "The query returned an error"

The line of code causing this error is the following:

where first_date >  PARSE_DATE('%Y%m%d', @DS_START_DATE) and first_date < PARSE_DATE('%Y%m%d', @DS_END_DATE)

My query performs perfectly when I switch that line with the following:

where first_date <= '2019-09-28'--and first_date > '2020-06-07'

UPDATE:

This is SO close to working. It works when I have 1 of the filters applied, but when I have the 2nd, it's throwing the same error.

It works when I add this line:

where cast(first_date as date) <=  PARSE_DATE('%Y%m%d', @DS_END_DATE)

But throws that error again when I have this:

where cast(first_date as date) <=  PARSE_DATE('%Y%m%d', @DS_END_DATE) and cast(first_date as date) >=  PARSE_DATE('%Y%m%d', @DS_START_DATE)

Solution

  • Probably your first_date field is not a DATE but a TIMESTAMP To show you this problem I will use a public table (bigquery-public-data.covid19_italy.data_by_region)

    This table, as you can see in the images below, has a TIMESTAMP field named date. For reproducing your problem, I will try to access this table through DataStudio.

    enter image description here

    enter image description here

    In DataStudio, if I try your approach I get an error as you can see below

    1 - Query enter image description here

    2 - Error enter image description here

    However, if I change the query to the query below it works fine as you can see in the images.

    SELECT * FROM `bigquery-public-data.covid19_italy.data_by_region` WHERE cast(date as date) < PARSE_DATE('%Y%m%d',@DS_START_DATE)
    

    1 - Updating query enter image description here

    2 - Dashboard working enter image description here