Search code examples
mysqlsqldatabasegaps-and-islandsmariadb-10.4

Find availability in multiple calendars with relation using MySQL


I own something that I frequently rent out. It consists of several parts. It can either be rented part by part, or as a whole. If one part is rented out, you wont be able to rent it as a whole.

An example can be that I rent out a car. This car has tires which also are for rent. You can choose to rent the car with the tires («the whole»), or just rent a tire. However you won't be able to rent the car («the whole») if one or more tires are rented out.

I visualize it as a hierarchy.

         Whole
    _______|_______
   |               |
 Part 1           Part 2

I have used one Google Calendar for «the whole» thing and separate calendars for each containing part. This works, but it's tiresome and I want to be able to just send a link to those who's interested - where they can see what is available.

So I made a database (mariadb 10.4) that is plain simple, having two tables:

# tbl: part
| id | parent_id | name   |

The column parent_id simply refrences another row in the same table, here's an example of data.

| 1  | NULL      | Car    |
| 2  | 1         | Tire 1 |
| 3  | 1         | Tire 2 |

Then the following table to store dates when each part is booked (+ example data).

# tbl: booking
| id | part_id | booked_from | booked_until |
--------------------------------------------
| 1  | 1       | 2021-07-31  | 2021-08-03   |
| 2  | 2       | 2021-08-03  | 2021-08-07   |
| 3  | 3       | 2021-08-04  | 2021-08-06   |
| 4  | 3       | 2021-08-09  | 2021-08-10   |

Out from this we know that the car itself is booked from 2021-07-31 - 2021-08-03, but its only bookable from 2021-08-06 since two of the tires are rented out for this period (They can however be rented out at the same time as they're not strictly related). But just until the 2021-08-09 since a tire is booked again.

What I'm looking for is a query to get a list of dates when something is available. From the pars-table I'm able to find out which parts are related and that's not my biggest problem - I guess, since can use something like this when querying availability for:

  • The car: part_id IN(1,2,3)
  • Tire 1: part_id IN(1,2)
  • Tire 3: part_id IN(1,3)

My problem is (simply?) how I can orchestrate a query that returns only the dates when something is available, especially for the car when dates can overlap.

E.g results for the car

SELECT 
  `booked_until` AS `available-from`, 
  `booked_from` as `available-until`
FROM 
  booking
/** some JOIN magic? **
WHERE part_id IN(1,2,3)

E.g for tire 1 would be the same but with part_id IN(1,2) as tire 2 (id: 3) isn't directly related with tire 1.

Both should respectively return:

# car
| available-from | available-until |
------------------------------------
| NULL           | 2021-07-31      |
| 2021-08-06     | 2021-08-09      |
| 2021-08-10     | NULL            |
# tire 1
| available-from | available-until |
------------------------------------
| NULL           | 2021-07-31      |
| 2021-08-07     | NULL            |

Where the NULL-values just says there's nothing booked before or ahead. E.g this tire is available from now until the 2021-07-31 and from 2021-08-07 until the last day earth exists.

Hope this makes sense - and that someone is able to help.

Thank you in advance.


Solution

  • Ok, here's my attempt.

    So, if I understand correctly, additionally to the information given explicitly in the table, there is implicit unavailability for the units. So I first retrieved this explicitly:

    select unit_id, entry_start, entry_end 
      from unit_calendar_entry
    union
    select p.id, u.entry_start, u.entry_end
     from unit p
     join unit_calendar_entry u
       on  p.parent_unit_id = u.unit_id
    union
    select p.parent_unit_id as unit_id, u.entry_start, u.entry_end 
     from unit p
     join unit_calendar_entry u
       on p.id = u.unit_id
      and p.parent_unit_id is not null
    order by unit_id, entry_start;
    
    • the first select just gets the entries already in the table
    • the second one adds entries for the car, since it can be considered booked if either of its tires is booked
    • the third one adds entries for the tires, since they can be considered booked if the car is booked

    Result:

    unit_id     entry_start           entry_end
    ----------------------------------------------------
    1          2021-07-31 00:00:00   2021-08-03 00:00:00
    1          2021-08-03 00:00:00   2021-08-07 00:00:00
    1          2021-08-04 00:00:00   2021-08-06 00:00:00
    1          2021-08-09 00:00:00   2021-08-10 00:00:00
    2          2021-07-31 00:00:00   2021-08-03 00:00:00
    2          2021-08-03 00:00:00   2021-08-07 00:00:00
    3          2021-07-31 00:00:00   2021-08-03 00:00:00
    3          2021-08-04 00:00:00   2021-08-06 00:00:00
    3          2021-08-09 00:00:00   2021-08-10 00:00:00
    

    Based on that, there is a gaps-and-islands problem to solve in order to group adjacent/overlapping timespans. You can use two queries to mark the entries that belong together, like in this SO answer. If we call the above query subtab, the according statement is

    select c.*, sum(case when prev_end < entry_start then 1 else 0 end) over (order by unit_id, entry_start) as grouping   
             from (
                  select subtab.*, max(entry_end) over (partition by unit_id order by entry_start rows between unbounded preceding and 1 preceding) as prev_end 
                    from subtab
                  ) c
    
    • The inner query gets the previous end for each row.
    • The outer one assigns a grouping id that (within each unit_id) identifies all entries belonging to a continuous block (aka an island).

    Result:

    unit_id entry_start          entry_end            prev_end             grouping
    -------------------------------------------------------------------------------
    1       2021-07-31 00:00:00  2021-08-03 00:00:00  (null)               0
    1       2021-08-03 00:00:00  2021-08-07 00:00:00  2021-08-03 00:00:00  0
    1       2021-08-04 00:00:00  2021-08-06 00:00:00  2021-08-07 00:00:00  0
    1       2021-08-09 00:00:00  2021-08-10 00:00:00  2021-08-07 00:00:00  1
    2       2021-07-31 00:00:00  2021-08-03 00:00:00  (null)               1
    2       2021-08-03 00:00:00  2021-08-07 00:00:00  2021-08-03 00:00:00  1
    3       2021-07-31 00:00:00  2021-08-03 00:00:00  (null)               1
    3       2021-08-04 00:00:00  2021-08-06 00:00:00  2021-08-03 00:00:00  2
    3       2021-08-09 00:00:00  2021-08-10 00:00:00  2021-08-06 00:00:00  3
    

    From this (let's call it tab), you can either obtain the unavailable timespans by grouping on unit_id and grouping (see this SO answer or db<>fiddle below) or calculate the free times as follows:

    select distinct unit_id
                  , NULLIF((min(ifnull(prev_end,'1000-01-01')) over (partition by unit_id, grouping)),'1000-01-01') as available_from
                  , min(entry_start) over (partition by unit_id, grouping) as available_til
       from tab
    union 
    select distinct unit_id
                    , max(entry_end) over (partition by unit_id) as available_from
                    , null as available_til
     from tab
    order by unit_id, available_from
    
    • the first query gets available_from as the minimum of the prev_end for each unit_id/grouping. In order to get NULL values from MIN(), I used a workaround similar to this SO answer.
    • the second query adds a row for each unit_id with the maximum entry_end as start and NULL as end

    Result:

    unit_id  available_from         available_til
    ---------------------------------------------------
    1        (null)                 2021-07-31 00:00:00
    1        2021-08-07 00:00:00    2021-08-09 00:00:00
    1        2021-08-10 00:00:00    (null)
    2        (null)                 2021-07-31 00:00:00
    2        2021-08-07 00:00:00    (null)
    3        (null)                 2021-07-31 00:00:00
    3        2021-08-03 00:00:00    2021-08-04 00:00:00
    3        2021-08-06 00:00:00    2021-08-09 00:00:00
    3        2021-08-10 00:00:00    (null)
    

    Putting it all together in one query:

    with tab as (
                select c.*, sum(case when prev_end < entry_start then 1 else 0 end) over (order by unit_id, entry_start) as grouping   
                  from (
                       select d.*, max(entry_end) over (partition by unit_id order by entry_start rows between unbounded preceding and 1 preceding) as prev_end 
                         from (
                          select unit_id, entry_start, entry_end 
                            from unit_calendar_entry
                          union
                          select p.id, u.entry_start, u.entry_end
                           from unit p
                           join unit_calendar_entry u
                             on p.parent_unit_id = u.unit_id
                          union
                          select p.parent_unit_id as unit_id, u.entry_start, u.entry_end 
                           from unit p
                           join unit_calendar_entry u
                             on p.id = u.unit_id
                            and p.parent_unit_id is not null
                              ) d
                       ) c
                ) 
     select distinct unit_id, NULLIF((min(ifnull(prev_end,'1000-01-01')) over (partition by unit_id, grouping)),'1000-01-01') as available_from, min(entry_start) over (partition by unit_id, grouping) as available_til
       from tab
       union 
      select distinct unit_id, max(entry_end) over (partition by unit_id) as available_from, null as available_til
       from tab
     order by unit_id, available_from
    

    See also this db<>fiddle.