mysql sql database gaps-and-islands mariadb-10.4

Find availability in multiple calendars with relation using MySQL

I own something that I frequently rent out. It consists of several parts. It can either be rented part by part, or as a whole. If one part is rented out, you wont be able to rent it as a whole.

An example can be that I rent out a car. This car has tires which also are for rent. You can choose to rent the car with the tires («the whole»), or just rent a tire. However you won't be able to rent the car («the whole») if one or more tires are rented out.

I visualize it as a hierarchy.

         Whole
    _______|_______
   |               |
 Part 1           Part 2

I have used one Google Calendar for «the whole» thing and separate calendars for each containing part. This works, but it's tiresome and I want to be able to just send a link to those who's interested - where they can see what is available.

So I made a database (mariadb 10.4) that is plain simple, having two tables:

# tbl: part
| id | parent_id | name   |

The column parent_id simply refrences another row in the same table, here's an example of data.

| 1  | NULL      | Car    |
| 2  | 1         | Tire 1 |
| 3  | 1         | Tire 2 |

Then the following table to store dates when each part is booked (+ example data).

# tbl: booking
| id | part_id | booked_from | booked_until |
--------------------------------------------
| 1  | 1       | 2021-07-31  | 2021-08-03   |
| 2  | 2       | 2021-08-03  | 2021-08-07   |
| 3  | 3       | 2021-08-04  | 2021-08-06   |
| 4  | 3       | 2021-08-09  | 2021-08-10   |

Out from this we know that the car itself is booked from 2021-07-31 - 2021-08-03, but its only bookable from 2021-08-06 since two of the tires are rented out for this period (They can however be rented out at the same time as they're not strictly related). But just until the 2021-08-09 since a tire is booked again.

What I'm looking for is a query to get a list of dates when something is available. From the pars-table I'm able to find out which parts are related and that's not my biggest problem - I guess, since can use something like this when querying availability for:

The car: part_id IN(1,2,3)
Tire 1: part_id IN(1,2)
Tire 3: part_id IN(1,3)

My problem is (simply?) how I can orchestrate a query that returns only the dates when something is available, especially for the car when dates can overlap.

E.g results for the car

SELECT 
  `booked_until` AS `available-from`, 
  `booked_from` as `available-until`
FROM 
  booking
/** some JOIN magic? **
WHERE part_id IN(1,2,3)

E.g for tire 1 would be the same but with part_id IN(1,2) as tire 2 (id: 3) isn't directly related with tire 1.

Both should respectively return:

# car
| available-from | available-until |
------------------------------------
| NULL           | 2021-07-31      |
| 2021-08-06     | 2021-08-09      |
| 2021-08-10     | NULL            |

# tire 1
| available-from | available-until |
------------------------------------
| NULL           | 2021-07-31      |
| 2021-08-07     | NULL            |

Where the NULL-values just says there's nothing booked before or ahead. E.g this tire is available from now until the 2021-07-31 and from 2021-08-07 until the last day earth exists.

Hope this makes sense - and that someone is able to help.

Thank you in advance.

Solution

Ok, here's my attempt.

So, if I understand correctly, additionally to the information given explicitly in the table, there is implicit unavailability for the units. So I first retrieved this explicitly:

select unit_id, entry_start, entry_end 
  from unit_calendar_entry
union
select p.id, u.entry_start, u.entry_end
 from unit p
 join unit_calendar_entry u
   on  p.parent_unit_id = u.unit_id
union
select p.parent_unit_id as unit_id, u.entry_start, u.entry_end 
 from unit p
 join unit_calendar_entry u
   on p.id = u.unit_id
  and p.parent_unit_id is not null
order by unit_id, entry_start;

the first select just gets the entries already in the table
the second one adds entries for the car, since it can be considered booked if either of its tires is booked
the third one adds entries for the tires, since they can be considered booked if the car is booked

Result:

unit_id     entry_start           entry_end
----------------------------------------------------
1          2021-07-31 00:00:00   2021-08-03 00:00:00
1          2021-08-03 00:00:00   2021-08-07 00:00:00
1          2021-08-04 00:00:00   2021-08-06 00:00:00
1          2021-08-09 00:00:00   2021-08-10 00:00:00
2          2021-07-31 00:00:00   2021-08-03 00:00:00
2          2021-08-03 00:00:00   2021-08-07 00:00:00
3          2021-07-31 00:00:00   2021-08-03 00:00:00
3          2021-08-04 00:00:00   2021-08-06 00:00:00
3          2021-08-09 00:00:00   2021-08-10 00:00:00

Based on that, there is a gaps-and-islands problem to solve in order to group adjacent/overlapping timespans. You can use two queries to mark the entries that belong together, like in this SO answer. If we call the above query subtab, the according statement is

select c.*, sum(case when prev_end < entry_start then 1 else 0 end) over (order by unit_id, entry_start) as grouping   
         from (
              select subtab.*, max(entry_end) over (partition by unit_id order by entry_start rows between unbounded preceding and 1 preceding) as prev_end 
                from subtab
              ) c

The inner query gets the previous end for each row.
The outer one assigns a grouping id that (within each unit_id) identifies all entries belonging to a continuous block (aka an island).

Result:

unit_id entry_start          entry_end            prev_end             grouping
-------------------------------------------------------------------------------
1       2021-07-31 00:00:00  2021-08-03 00:00:00  (null)               0
1       2021-08-03 00:00:00  2021-08-07 00:00:00  2021-08-03 00:00:00  0
1       2021-08-04 00:00:00  2021-08-06 00:00:00  2021-08-07 00:00:00  0
1       2021-08-09 00:00:00  2021-08-10 00:00:00  2021-08-07 00:00:00  1
2       2021-07-31 00:00:00  2021-08-03 00:00:00  (null)               1
2       2021-08-03 00:00:00  2021-08-07 00:00:00  2021-08-03 00:00:00  1
3       2021-07-31 00:00:00  2021-08-03 00:00:00  (null)               1
3       2021-08-04 00:00:00  2021-08-06 00:00:00  2021-08-03 00:00:00  2
3       2021-08-09 00:00:00  2021-08-10 00:00:00  2021-08-06 00:00:00  3

From this (let's call it tab), you can either obtain the unavailable timespans by grouping on unit_id and grouping (see this SO answer or db<>fiddle below) or calculate the free times as follows:

select distinct unit_id
              , NULLIF((min(ifnull(prev_end,'1000-01-01')) over (partition by unit_id, grouping)),'1000-01-01') as available_from
              , min(entry_start) over (partition by unit_id, grouping) as available_til
   from tab
union 
select distinct unit_id
                , max(entry_end) over (partition by unit_id) as available_from
                , null as available_til
 from tab
order by unit_id, available_from

the first query gets available_from as the minimum of the prev_end for each unit_id/grouping. In order to get NULL values from MIN(), I used a workaround similar to this SO answer.
the second query adds a row for each unit_id with the maximum entry_end as start and NULL as end

Result:

unit_id  available_from         available_til
---------------------------------------------------
1        (null)                 2021-07-31 00:00:00
1        2021-08-07 00:00:00    2021-08-09 00:00:00
1        2021-08-10 00:00:00    (null)
2        (null)                 2021-07-31 00:00:00
2        2021-08-07 00:00:00    (null)
3        (null)                 2021-07-31 00:00:00
3        2021-08-03 00:00:00    2021-08-04 00:00:00
3        2021-08-06 00:00:00    2021-08-09 00:00:00
3        2021-08-10 00:00:00    (null)

Putting it all together in one query:

with tab as (
            select c.*, sum(case when prev_end < entry_start then 1 else 0 end) over (order by unit_id, entry_start) as grouping   
              from (
                   select d.*, max(entry_end) over (partition by unit_id order by entry_start rows between unbounded preceding and 1 preceding) as prev_end 
                     from (
                      select unit_id, entry_start, entry_end 
                        from unit_calendar_entry
                      union
                      select p.id, u.entry_start, u.entry_end
                       from unit p
                       join unit_calendar_entry u
                         on p.parent_unit_id = u.unit_id
                      union
                      select p.parent_unit_id as unit_id, u.entry_start, u.entry_end 
                       from unit p
                       join unit_calendar_entry u
                         on p.id = u.unit_id
                        and p.parent_unit_id is not null
                          ) d
                   ) c
            ) 
 select distinct unit_id, NULLIF((min(ifnull(prev_end,'1000-01-01')) over (partition by unit_id, grouping)),'1000-01-01') as available_from, min(entry_start) over (partition by unit_id, grouping) as available_til
   from tab
   union 
  select distinct unit_id, max(entry_end) over (partition by unit_id) as available_from, null as available_til
   from tab
 order by unit_id, available_from