So my SQL/BiqQuery skills still need some refining and I have been stuck on this for some time now. I got a table ST that marks the planned mods to some applications with their starting date, these mods only apply to those with some specific specs:
Mod_ID | App_ID | Specs | Start_Date |
---|---|---|---|
P1 | App1 | S1 | 2022-01-24 |
P1 | App2 | S2 | 2022-01-24 |
P2 | App2 | S3 S4 | 2022-03-15 |
P3 | App3 | S4 S5 | 2022-04-10 |
I got another table AQ that includes the production date for some devices, with their corresponding app and their specifications in the form of a string:
Device_ID | App_ID | Spec_string | Prod_Date |
---|---|---|---|
ID0001 | App1 | S1 S4 S6 T0 U7 | 2022-02-03 |
ID0002 | App2 | S2 S5 U9 | 2022-02-05 |
ID0003 | App2 | S1 S2 S3 | 2022-03-12 |
ID0004 | App3 | S5 S6 T0 U7 | 2022-04-18 |
I want to count the occurrences of the different spec mods for a particuar device that apply to its App_ID and only for those that AQ.Prod_Date>=ST.Start_Date.
I first unnested table ST (ST_UNN) and ended with something like this:
Mod_ID | App_ID | Specs_ind | Start_Date |
---|---|---|---|
P1 | App1 | S1 | 2022-01-24 |
P1 | App2 | S2 | 2022-01-24 |
P2 | App2 | S3 | 2022-03-15 |
P2 | App2 | S4 | 2022-03-15 |
P3 | App3 | S4 | 2022-04-10 |
P3 | App3 | S5 | 2022-04-10 |
and tried the following:
SELECT
a.* EXCEPT(Spec_String),
SUM(
CASE
WHEN s.App_ID = a.App_ID AND s.Start_Date <= a.Prod_Date THEN ARRAY_LENGTH(REGEXP_EXTRACT_ALL(Spec_String, Specs_ind))
ELSE
0
END
) OVER(PARTITION BY s.Specs_ind ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS num_changes,
FROM
AQ AS a
LEFT JOIN
ST_UNN AS s
ON
a.App_ID = s.App_ID
But it didn't work as intended. I expect something like this:
Device_ID | App_ID | Spec_string | Prod_Date | num_changes |
---|---|---|---|---|
ID0001 | App1 | S1 S4 S6 T0 U7 | 2022-02-03 | 1 |
ID0002 | App2 | S2 S5 U9 | 2022-02-05 | 1 |
ID0003 | App2 | S1 S3 U7 | 2022-03-12 | 0 |
ID0004 | App3 | S5 S6 T0 U7 | 2022-04-18 | 1 |
But I am rather lost on how to achieve this last part as a cumulative sum, so that a device will count all mod implementations that have happened before its production date and that apply based on its specifications.
Any feedback is greatly appreciated!
If I am understanding this correctly, you can simplify a little and just group by device_id, etc. instead of using a window function. I modified your input slightly just to show that this was working as intended
WITH
st AS (
SELECT
'P1' AS mod_id,
'App1' AS app_id,
'S1 S4' AS specs,
'2022-01-24' AS start_date
UNION ALL
SELECT
'P1',
'App2',
'S2',
'2022-01-24'
UNION ALL
SELECT
'P2',
'App2',
'S3 S4',
'2022-03-15'
UNION ALL
SELECT
'P3',
'App3',
'S4 S5',
'2022-04-10'
UNION ALL
SELECT
'P4',
'App3',
'S4 S5',
'2022-04-11' ),
st_unn AS (
SELECT
* EXCEPT(specs)
FROM
st,
UNNEST(SPLIT(specs, ' ')) AS spec ),
aq AS (
SELECT
-- should be 2
'ID0001' AS device_id,
'App1' AS app_id,
'S1 S4 S6 T0 U7' AS spec_string,
'2022-02-03' AS prod_date
UNION ALL
SELECT
-- should be 1
'ID0002',
'App2',
'S2 S5 U9',
'2022-02-05'
UNION ALL
SELECT
-- should be 0
'ID0003',
'App2',
'S2 S5 U9',
'2021-03-12'
UNION ALL
SELECT
-- should be 3
'ID0004',
'App3',
'S4 S5 S6 T0 U7',
'2022-04-18' )
SELECT
device_id,
aq.app_id,
spec_string,
prod_date,
SUM(COALESCE(ARRAY_LENGTH(REGEXP_EXTRACT_ALL(spec_string, spec)), 0))
FROM
aq
LEFT JOIN
st_unn st
ON
aq.prod_date >= st.start_date
AND aq.app_id = st.app_id
GROUP BY
device_id,
aq.app_id,
spec_string,
prod_date