python postgresql datetime sqlalchemy availability

Date intersection and space availability

I am currently trying to check availability on a "space" within a date range, while this date range can be indefinitely long. The tables are as follows:

Space:

id  available_spaces    name
1   20                  Space 1
2   40                  Space 2
3   10                  Space 3

Booking (end_date can be null, this meaning endless booking):

id  space_id    start_date  end_date    spaces
1   1           13/12-2017  null        9
1   1           13/12-2017  13/12-2018  10

Then i would like to be able to make a search, an example could be:

from:   11/12-2016
to:     null (again meaning endless)
spaces: 2

This query should return the spaces: Space 2, Space 3, since those both have enough availability for that time interval.

And by changing the amount of spaces needed in the search to 1 instead of 2 should yield the following: Search:

from:   11/12-2016
to:     null (again meaning endless)
spaces: 1

Space 1, Space 2, Space 3. The thing i find hard to solve is the variable amounts of spaces that can be available from month to month, and the ability to have endless bookings.

Solution

Revisited

As always, SQL offers multiple ways to solve a given task. The original proposed solution (below) used a self-join, but another way would be to take advantage of window functions. The idea is to increment the used up space every time a new booking starts, and to decrement when it ends:

with bs as (
    select space_id as _sid
         , unnest(array[start_date,
                        coalesce(end_date, date 'infinity')]) as _d
         , unnest(array[spaces, -spaces]) as _sp
    from booking
    where end_date is null or end_date >= '2016-12-11'),
cs as (
    select _sid
        -- The inner sum collapses starting and ending bookings on the same
        -- date to a single spaces value, the outer is the running sum. This
        -- avoids the problem where the order of bookings starting or ending
        -- on the same date is unspecified and would produce possibly falsely
        -- high values for spaces, if all starting bookings just so happen to
        -- come first.
         , sum(sum(_sp)) over (partition by _sid
                               order by _d) as _usp
    from bs
    group by _sid, _d)
select *
from space
where not exists (
    select from cs
    where cs._sid = space.id
      and space.available_spaces - cs._usp < 2)

The same in Python/SQLAlchemy:

from sqlalchemy import or_
from sqlalchemy.dialects.postgresql import array

bs = session.query(
        Booking.space_id,
        func.unnest(array([
            Booking.start_date,
            func.coalesce(Booking.end_date, func.date('infinity'))
        ])).label('date'),
        func.unnest(array([Booking.spaces, -Booking.spaces])).label('spaces')).\
    filter(or_(Booking.end_date == None,
               Booking.end_date >= '2016-12-11')).\
    cte()

cs = session.query(bs.c.space_id,
                   func.sum(func.sum(bs.c.spaces)).over(
                       partition_by=bs.c.space_id,
                       order_by=bs.c.date).label('spaces')).\
    group_by(bs.c.space_id, bs.c.date).\
    cte()

query = session.query(Space).\
    filter(~session.query(cs).
           filter(cs.c.space_id == Space.id,
                  Space.available_spaces - cs.c.spaces < 2).
           exists())

It is easier to explain the workings of the query using SQL first, and then to build up the SQLAlchemy. I'll assume that a booking and a search will always have a start, or in other words can only be unbounded in the end. Using range types and operators, you should start by finding the bookings that overlap your search.

select *
from booking
where daterange(start_date, end_date, '[)')
   && daterange('2016-12-11', null, '[)');

From found bookings you need to find intersections and sum used space. To find the intersections use the start of a booking and look up bookings that contain it. Repeat for all bookings at hand. For example:

|-------| 5
.  .  .
.  |-------------| 2
.  .  .
.  .  |-------------------- 3
.  .  .              .
.  .  .              |---| 1
.  .  .              .
5  7  10             4

and in query form:

with bs as (
    select *
    from booking
    where daterange(start_date, end_date, '[)')
       && daterange('2016-12-11', null, '[)')
)
select distinct
       b1.space_id,
       sum(b2.spaces) as sum
from bs b1
join bs b2
  on b1.start_date <@ daterange(b2.start_date, b2.end_date, '[)')
 and b1.space_id = b2.space_id
group by b1.id, b1.space_id;

which given your example data results in

 space_id | sum 
----------+-----
        1 |  19
(1 row)

because there are 2 bookings only and they have the same start date. The query is far from optimal and for each range has to scan through all the ranges, so O(n^2) at least. In a procedural setting you'd use an interval tree or such for lookups and maybe with some suitable indexing and changes could improve the SQL as well.

With the intersecting booking sums you can then check that there doesn't exist a sum that leaves less space than required by the search:

with bs as (
        select *
        from booking
        where daterange(start_date, end_date, '[)')
           && daterange('2016-12-11', null, '[)')
), cs as (
        select distinct
               b1.space_id,
               sum(b2.spaces) as sum
        from bs b1
        join bs b2
          on b1.start_date <@ daterange(b2.start_date, b2.end_date, '[)')
         and b1.space_id = b2.space_id
        -- Could also use distinct & sum() over (partition by b1.id) instead
        group by b1.id, b1.space_id
)
select *
from space
where not exists(
        select 1
        from cs
        where cs.space_id = space.id
              -- Check if there is not enough space
          and space.available_spaces - cs.sum < 2
);

From this it is straightforward to form the SQLAlchemy version:

from functools import partial
from sqlalchemy.dialects.postgresql import DATERANGE

# Hack. Proper type for passing daterange values is
# psycopg2.extras.DateRange, but that does not have
# the comparator methods.
daterange = partial(func.daterange, type_=DATERANGE)

bs = session.query(Booking).\
    filter(daterange(Booking.start_date, Booking.end_date, '[)').
           overlaps(daterange('2016-12-11', None, '[)'))).\
    cte()

bs1 = bs.alias()
bs2 = bs.alias()

cs = session.query(bs1.c.space_id,
                   func.sum(bs2.c.spaces).label('sum')).\
    distinct().\
    join(bs2, (bs2.c.space_id == bs1.c.space_id) &
              daterange(bs2.c.start_date,
                        bs2.c.end_date).contains(bs1.c.start_date)).\
    group_by(bs1.c.id, bs1.c.space_id).\
    cte()

query = session.query(Space).\
    filter(~session.query(cs).
           filter(cs.c.space_id == Space.id,
                  Space.available_spaces - cs.c.sum < 2).
           exists())