I'm looking for the number of days within several dateranges. I used the datediff function to sum the days, but now I want to exclude overlapping days. So from the oldest date untill curdate I would like to have the number of days within the dateranges and every day only counted once if it is in an overlapping range.
My table looks like this:
Person_id | Start_date | End_date | Count
83244 2014-09-01 00:00:00 2014-09-06 00:00:00 5
83244 2014-09-08 00:00:00 2015-09-07 00:00:00 364
83244 2015-01-15 00:00:00 2015-02-01 00:00:00 17
If I sum this I would get 382, but the answer I'm looking for is 369. Since the last row is completely overlapping the second row.
Does anyone have a solution?
I have padded out your example with a second Person_id
and shortened the column names a bit to make the code a little shorter:
CREATE TABLE tbl(`pid` int, `sd` datetime, `ed` datetime);
INSERT INTO tbl (`pid`, `sd`, `ed`)
VALUES
(83244, '2014-09-01', '2014-09-06'),
(83244, '2014-09-08', '2015-09-07'),
(83243, '2014-08-08', '2015-08-15'),
(83243, '2014-08-11', '2015-09-03'),
(83244, '2015-01-15', '2015-02-01');
So, working on the above data you could apply the following query:
SELECT pid,sd,ed,CASE WHEN @id!=pid THEN @id:=pid+0*(@ed:=Date('1970-1-1')) END id,
CASE WHEN sd<@ed THEN CASE WHEN ed>@ed THEN datediff(ed,@ed) ELSE 0 END
ELSE datediff(ed,sd) END days,
@ed:=CASE WHEN ed>@ed THEN ed ELSE @ed END enddt
FROM tbl,( select @id:=0 ) const
ORDER BY pid,sd
Contrary to other RDBMS MySql has a certain "procedural feel" to it when it comes to select
statements. You can actually use variables (@id
and @ed
) in them that will change their state over time (in this context the order by
clause at the end is extremely important).
The basic idea behind this query is: start with a certain pid
and list the intervals in the order of increasing start dates (sd
). Always remember the maximum value of the end date (ed
) in variable @ed
. Now, with each new interval, check whether there is an overlap with the previous interval, i. e. check whether the current start date sd
is smaller than the previous (maximum) end date (@ed
) and calculate the interval days
accordingly.
The first case
clause is necessary to reset the variables @id
and @ed
whenever the current pid
changes.
Subquery const
just sets the variable @id
in the beginning.
The query yields the following result:
pid sd ed id days enddt
83243 2014-08-08 00:00:00 2015-08-15 00:00:00 83243 372 2015-08-15 00:00:00
83243 2014-08-11 00:00:00 2015-09-03 00:00:00 19 2015-09-03 00:00:00
83244 2014-09-01 00:00:00 2014-09-06 00:00:00 83244 5 2014-09-06 00:00:00
83244 2014-09-08 00:00:00 2015-09-07 00:00:00 364 2015-09-07 00:00:00
83244 2015-01-15 00:00:00 2015-02-01 00:00:00 0 2015-09-07 00:00:00
See here for a Demo.
If you are just interested in the total sums you can of course wrap the whole query in another group
ing one like this:
SELECT pid,sum(days) FROM (
SELECT pid,sd,ed,CASE WHEN @id!=pid THEN @id:=pid+0*(@ed:=Date('1970-1-1')) END id,
CASE WHEN sd<@ed THEN CASE WHEN ed>@ed THEN datediff(ed,@ed) ELSE 0 END
ELSE datediff(ed,sd) END days,
@ed:=CASE WHEN ed>@ed THEN ed ELSE @ed END enddt
FROM tbl,( select @id:=0 ) const
ORDER BY pid,sd
) t GROUP BY pid ORDER BY pid
which will then get you
pid sum(days)
83243 391
83244 369