I am currently trying to check availability on a "space" within a date range, while this date range can be indefinitely long. The tables are as follows:
Space:
id available_spaces name 1 20 Space 1 2 40 Space 2 3 10 Space 3
Booking (end_date can be null, this meaning endless booking):
id space_id start_date end_date spaces 1 1 13/12-2017 null 9 1 1 13/12-2017 13/12-2018 10
Then i would like to be able to make a search, an example could be:
from: 11/12-2016 to: null (again meaning endless) spaces: 2
This query should return the spaces: Space 2, Space 3, since those both have enough availability for that time interval.
And by changing the amount of spaces needed in the search to 1 instead of 2 should yield the following: Search:
from: 11/12-2016 to: null (again meaning endless) spaces: 1
Space 1, Space 2, Space 3. The thing i find hard to solve is the variable amounts of spaces that can be available from month to month, and the ability to have endless bookings.
As always, SQL offers multiple ways to solve a given task. The original proposed solution (below) used a self-join, but another way would be to take advantage of window functions. The idea is to increment the used up space every time a new booking starts, and to decrement when it ends:
with bs as (
select space_id as _sid
, unnest(array[start_date,
coalesce(end_date, date 'infinity')]) as _d
, unnest(array[spaces, -spaces]) as _sp
from booking
where end_date is null or end_date >= '2016-12-11'),
cs as (
select _sid
-- The inner sum collapses starting and ending bookings on the same
-- date to a single spaces value, the outer is the running sum. This
-- avoids the problem where the order of bookings starting or ending
-- on the same date is unspecified and would produce possibly falsely
-- high values for spaces, if all starting bookings just so happen to
-- come first.
, sum(sum(_sp)) over (partition by _sid
order by _d) as _usp
from bs
group by _sid, _d)
select *
from space
where not exists (
select from cs
where cs._sid = space.id
and space.available_spaces - cs._usp < 2)
The same in Python/SQLAlchemy:
from sqlalchemy import or_
from sqlalchemy.dialects.postgresql import array
bs = session.query(
Booking.space_id,
func.unnest(array([
Booking.start_date,
func.coalesce(Booking.end_date, func.date('infinity'))
])).label('date'),
func.unnest(array([Booking.spaces, -Booking.spaces])).label('spaces')).\
filter(or_(Booking.end_date == None,
Booking.end_date >= '2016-12-11')).\
cte()
cs = session.query(bs.c.space_id,
func.sum(func.sum(bs.c.spaces)).over(
partition_by=bs.c.space_id,
order_by=bs.c.date).label('spaces')).\
group_by(bs.c.space_id, bs.c.date).\
cte()
query = session.query(Space).\
filter(~session.query(cs).
filter(cs.c.space_id == Space.id,
Space.available_spaces - cs.c.spaces < 2).
exists())
It is easier to explain the workings of the query using SQL first, and then to build up the SQLAlchemy. I'll assume that a booking and a search will always have a start, or in other words can only be unbounded in the end. Using range types and operators, you should start by finding the bookings that overlap your search.
select *
from booking
where daterange(start_date, end_date, '[)')
&& daterange('2016-12-11', null, '[)');
From found bookings you need to find intersections and sum used space. To find the intersections use the start of a booking and look up bookings that contain it. Repeat for all bookings at hand. For example:
|-------| 5
. . .
. |-------------| 2
. . .
. . |-------------------- 3
. . . .
. . . |---| 1
. . . .
5 7 10 4
and in query form:
with bs as (
select *
from booking
where daterange(start_date, end_date, '[)')
&& daterange('2016-12-11', null, '[)')
)
select distinct
b1.space_id,
sum(b2.spaces) as sum
from bs b1
join bs b2
on b1.start_date <@ daterange(b2.start_date, b2.end_date, '[)')
and b1.space_id = b2.space_id
group by b1.id, b1.space_id;
which given your example data results in
space_id | sum
----------+-----
1 | 19
(1 row)
because there are 2 bookings only and they have the same start date. The query is far from optimal and for each range has to scan through all the ranges, so O(n^2) at least. In a procedural setting you'd use an interval tree or such for lookups and maybe with some suitable indexing and changes could improve the SQL as well.
With the intersecting booking sums you can then check that there doesn't exist a sum that leaves less space than required by the search:
with bs as (
select *
from booking
where daterange(start_date, end_date, '[)')
&& daterange('2016-12-11', null, '[)')
), cs as (
select distinct
b1.space_id,
sum(b2.spaces) as sum
from bs b1
join bs b2
on b1.start_date <@ daterange(b2.start_date, b2.end_date, '[)')
and b1.space_id = b2.space_id
-- Could also use distinct & sum() over (partition by b1.id) instead
group by b1.id, b1.space_id
)
select *
from space
where not exists(
select 1
from cs
where cs.space_id = space.id
-- Check if there is not enough space
and space.available_spaces - cs.sum < 2
);
From this it is straightforward to form the SQLAlchemy version:
from functools import partial
from sqlalchemy.dialects.postgresql import DATERANGE
# Hack. Proper type for passing daterange values is
# psycopg2.extras.DateRange, but that does not have
# the comparator methods.
daterange = partial(func.daterange, type_=DATERANGE)
bs = session.query(Booking).\
filter(daterange(Booking.start_date, Booking.end_date, '[)').
overlaps(daterange('2016-12-11', None, '[)'))).\
cte()
bs1 = bs.alias()
bs2 = bs.alias()
cs = session.query(bs1.c.space_id,
func.sum(bs2.c.spaces).label('sum')).\
distinct().\
join(bs2, (bs2.c.space_id == bs1.c.space_id) &
daterange(bs2.c.start_date,
bs2.c.end_date).contains(bs1.c.start_date)).\
group_by(bs1.c.id, bs1.c.space_id).\
cte()
query = session.query(Space).\
filter(~session.query(cs).
filter(cs.c.space_id == Space.id,
Space.available_spaces - cs.c.sum < 2).
exists())