I have the following table:
id | timestamp | team
----------------------------
1 | 2016-05-06 | A
2 | 2016-03-02 | A
3 | 2015-12-01 | A
4 | 2016-07-05 | B
5 | 2016-06-30 | B
6 | 2016-06-28 | B
7 | 2016-04-05 | C
8 | 2016-04-02 | C
9 | 2016-01-02 | C
I want to group by team and find the last timestamp for each team, so I did:
select team, max(timestamp) from my_table group by team
It's all working fine so far. However, now I want to find out how many distinct id in the last month of each team. For example, for team A, it would be from 2016-04-07 to 2016-05-06, so such count is 1. For team B, the last month is from 2016-06-06 to 2016-07-05, so the count is 3. And for team C, the last month is from 2016-03-06 to 2016-04-05, and the count is 2. My expected output should look like:
team | max(timestamp) | count_in_last_month
------------------------------------------------
A | 2016-05-06 | 1
B | 2016-07-05 | 3
C | 2016-04-05 | 2
Can this be derived using the Impala query? Thanks!
Join the original table with the subquery that gets the max timestamp.
SELECT t1.team, t2.month_end, COUNT(DISTINCT t1.id) AS count_in_last_month
FROM my_table AS t1
JOIN (SELECT team, MAX(timestamp) AS month_end
FROM my_table
GROUP BY team) AS t2
ON t1.team = t2.team
AND t1.timestamp BETWEEN DATE_SUB(month_end, INTERVAL 1 MONTH) AND month_end
GROUP BY t1.team, t2.month_end