First of all, sorry for bad title - I can't figure out how to write generalized formulation of my problem. I have a table in PostgreSQL with users and dates of their internships. It looks like this:
user_id | start | end |
---|---|---|
1 | December 22, 2019 | June 29, 2020 |
2 | March 8, 2020 | September 8, 2020 |
3 | May 21, 2020 | November 21, 2020 |
From this I need to calculate for each month, how many people were on internship during this month. I only need to calculate full months (if internship actually started on December 22, 2019, I will calculate from January 2022. If internship were finished at June 29, 2020, I will calculate till May 2020. Finally I need this table:
Month | Count |
---|---|
Jan-20 | 1 |
Feb-20 | 1 |
Mar-20 | 1 |
Apr-20 | 2 |
May-20 | 2 |
Jun-20 | 2 |
Jul-20 | 2 |
Aug-20 | 2 |
Sep-20 | 1 |
Oct-20 | 1 |
For making it absolutely clear, this is how I got it:
Month | user_1 | user_2 | user_3 | Count |
---|---|---|---|---|
Jan-20 | 1 | 1 | ||
Feb-20 | 1 | 1 | ||
Mar-20 | 1 | 1 | ||
Apr-20 | 1 | 1 | 2 | |
May-20 | 1 | 1 | 2 | |
Jun-20 | 1 | 1 | 2 | |
Jul-20 | 1 | 1 | 2 | |
Aug-20 | 1 | 1 | 2 | |
Sep-20 | 1 | 1 | ||
Oct-20 | 1 | 1 |
My idea is to:
user_id | date | event |
---|---|---|
1 | December 22, 2019 | start |
1 | June 29, 2020 | end |
2 | March 8, 2020 | start |
2 | September 8, 2020 | end |
3 | May 21, 2020 | start |
3 | November 21, 2020 | end |
user_id | month |
---|---|
1 | Jan-20 |
1 | Feb-20 |
1 | Mar-20 |
1 | Apr-20 |
1 | May-20 |
2 | Apr-20 |
2 | May-20 |
2 | Jun-20 |
2 | Jul-20 |
2 | Aug-20 |
3 | Jun-20 |
3 | Jul-20 |
3 | Aug-20 |
3 | Sep-20 |
3 | Oct-20 |
Unfortunately, I have problems with 1 and 2 clauses.
I don't know how to reshape the table in PostgreSQL. In Pandas I would use 'stack' function. For my case I can't find the appropriate function.
Even if I can reshape it, I don't understand how to make series of month for each user (shown above).
Please advise, what can be done here to solve my problem?
this query should do the job considering your table as test
:
SELECT to_char(d.date, 'Mon-YY') AS month, count(*) AS count
FROM
( SELECT generate_series(date_trunc('month', min(start_date)), date_trunc('month', max(end_date)), interval '1 month') :: date AS date
FROM test
) AS d
INNER JOIN test AS t
ON daterange(t.start_date, t.end_date, '[]') && daterange(d.date, (d.date + interval '1 month') :: date)
WHERE daterange(t.start_date, t.end_date, '[]') @> daterange(d.date, (d.date + interval '1 month') :: date)
GROUP BY d.date
test
.JOIN
clause calculates the intersections between the months and the date interval for each userWHERE
clause filters the rows where the date interval for a user corresponds to a full months.Result :
month | count |
---|---|
Jan-20 | 1 |
Feb-20 | 1 |
Mar-20 | 1 |
Apr-20 | 2 |
May-20 | 2 |
Jun-20 | 2 |
Jul-20 | 2 |
Aug-20 | 2 |
Sep-20 | 1 |
Oct-20 | 1 |
see results in dbfiddle