I have cumulative counts for two groups over time in this format:
Date | Group | Cumulative Count |
---|---|---|
1/1/2020 | A | 1 |
1/2/2020 | A | 3 |
1/2/2020 | B | 1 |
1/3/2020 | B | 2 |
And I'd like to reshape this data into this format:
Date | Group | Cumulative Count |
---|---|---|
1/1/2020 | A | 1 |
1/1/2020 | B | 0 |
1/2/2020 | A | 3 |
1/2/2020 | B | 1 |
1/3/2020 | A | 3 |
1/3/2020 | B | 2 |
So that I can get it to display accurately in a stacked area chart in metabase - any advice?
You may generate all possible pairs of dates and groups with a cross join before using a left join to get the combined dataset eg
Since your dataset already has the cumulative counts, the missing values identified by null have been replaced using the most recent cumulative count with MAX
and COALESCE
.
SELECT
d."Date"::text,
d."Group",
COALESCE(m."CumulativeCount",COALESCE(MAX(m."CumulativeCount") OVER (
PARTITION BY d."Group"
ORDER BY d."Date"
),0)) as CumulativeCount
FROM (
SELECT "Date", "Group" FROM (
SELECT DISTINCT
"Date"
FROM
my_data
) t1
CROSS JOIN (
SELECT DISTINCT
"Group"
FROM
my_data
) t2
) d
LEFT JOIN my_data m ON m."Date"=d."Date" AND
m."Group" = d."Group"
ORDER BY 1,2;
Date | Group | cumulativecount |
---|---|---|
2020-01-01 | A | 1 |
2020-01-01 | B | 0 |
2020-01-02 | A | 3 |
2020-01-02 | B | 1 |
2020-01-03 | A | 3 |
2020-01-03 | B | 2 |
View working demo on DB Fiddle
If it is that you would like to generate values for dates between missing dates eg you had the next date to be 1/7/2020
and you wanted to fill the gaps for 1/3/2020
you could use generate_series
to generate the possible dates and MAX
to get the most recent value. I have included a fiddle below with additional sample data eg
Schema (PostgreSQL v13)
CREATE TABLE my_data (
"Date" DATE,
"Group" VARCHAR(1),
"CumulativeCount" INTEGER
);
INSERT INTO my_data
("Date", "Group", "CumulativeCount")
VALUES
('1/1/2020', 'A', '1'),
('1/2/2020', 'A', '3'),
('1/2/2020', 'B', '1'),
('1/3/2020', 'B', '2'),
('1/1/2020', 'C', '2'),
('1/7/2020', 'C', '3');
Query #1
SELECT
d."Date"::text,
d."Group",
COALESCE(
m."CumulativeCount",
COALESCE(MAX(m."CumulativeCount") OVER (
PARTITION BY d."Group"
ORDER BY d."Date"
),0)
) as CumulativeCount
FROM (
SELECT "Date", "Group" FROM (
SELECT
GENERATE_SERIES(
MIN("Date"),
MAX("Date"),
INTERVAL '1' DAY
) as "Date"
FROM
my_data
) t1
CROSS JOIN (
SELECT DISTINCT
"Group"
FROM
my_data
) t2
) d
LEFT JOIN my_data m ON m."Date"=d."Date" AND
m."Group" = d."Group"
ORDER BY 1,2;
Date | Group | cumulativecount |
---|---|---|
2020-01-01 00:00:00+00 | A | 1 |
2020-01-01 00:00:00+00 | B | 0 |
2020-01-01 00:00:00+00 | C | 2 |
2020-01-02 00:00:00+00 | A | 3 |
2020-01-02 00:00:00+00 | B | 1 |
2020-01-02 00:00:00+00 | C | 2 |
2020-01-03 00:00:00+00 | A | 3 |
2020-01-03 00:00:00+00 | B | 2 |
2020-01-03 00:00:00+00 | C | 2 |
2020-01-04 00:00:00+00 | A | 3 |
2020-01-04 00:00:00+00 | B | 2 |
2020-01-04 00:00:00+00 | C | 2 |
2020-01-05 00:00:00+00 | A | 3 |
2020-01-05 00:00:00+00 | B | 2 |
2020-01-05 00:00:00+00 | C | 2 |
2020-01-06 00:00:00+00 | A | 3 |
2020-01-06 00:00:00+00 | B | 2 |
2020-01-06 00:00:00+00 | C | 2 |
2020-01-07 00:00:00+00 | A | 3 |
2020-01-07 00:00:00+00 | B | 2 |
2020-01-07 00:00:00+00 | C | 3 |