Search code examples
sqlpostgresqlgroup-byaggregate-functionsdate-range

Sum() up CASE values from joined tables in PostgreSQL


I'm creating a hotel booking tool for our desk/reception. My current SELECT doesn't compute the "final" sum()/CASE, meaning that as long as there are more then one overlap (hb.booking_period && cp.valid_period) per booking_ID, the CASE values in the data output only represent shares of the total price. If you sum up those shares, it matches the total price (of the stay). How can I calculate the sums of all these CASE results/shares (that fit the related booking_id)?

SELECT
    hb.booking_id,
    hb.guest_name,
    hb.room_category_id,
    hb.booking_period,
    cp.base_price,
    CASE
        WHEN upper(hb.booking_period * cp.valid_period) = upper(hb.booking_period) THEN
            (upper(hb.booking_period * cp.valid_period) - lower(hb.booking_period * cp.valid_period)) * cp.base_price
        ELSE
            (upper(hb.booking_period * cp.valid_period) - lower(hb.booking_period * cp.valid_period) + 1) * cp.base_price
    END

FROM
    hotel_bookings hb
JOIN
    category_prices cp ON hb.room_category_id = cp.room_category_id AND hb.booking_period && cp.valid_period
GROUP BY
    hb.booking_id, hb.guest_name, hb.room_category_id, hb.booking_period, cp.base_price, cp.valid_period;

booking_period and valid_period are daterange datatypes.

The CASE ensures that the intersection * base_price is calculated correctly, since otherwise for each intersection 1 stay would be lost.

Here are the table definitions:

CREATE TABLE room_categories 
(
    category_id SERIAL PRIMARY KEY,
    category_name VARCHAR(25)
);


CREATE TABLE category_prices 
(
    category_price_id SERIAL PRIMARY KEY,
    room_category_id INTEGER REFERENCES room_categories(category_id),
    valid_period daterange,
    base_price DECIMAL(10, 2)
);

CREATE TABLE hotel_bookings 
(
    booking_id SERIAL PRIMARY KEY,
    guest_name VARCHAR(255),
    room_category_id INTEGER REFERENCES room_categories(category_id),
    booking_period daterange
);

The data:

INSERT INTO room_categories (category_name) VALUES
    ('single room'),
    ('double room');

INSERT INTO category_prices (room_category_id, valid_period, base_price) VALUES
    (1, '[2023-01-01, 2023-01-31)', 80.00),
    (1, '[2023-02-01, 2023-02-28)', 85.00),
    (1, '[2023-03-01, 2023-03-31)', 88.00),
    (2, '[2023-01-01, 2023-01-31)', 100.00),
    (2, '[2023-02-01, 2023-02-28)', 105.00),
    (2, '[2023-03-01, 2023-03-31)', 108.00);

INSERT INTO hotel_bookings (guest_name, room_category_id, booking_period) VALUES
    ('John Doe', 1, '[2023-01-15, 2023-01-20)'), --correct calced, 1 intersection
    ('Jane Smith', 1, '[2023-01-30, 2023-02-02)'), -- 2 shares (need/want a sumup)
    ('Jane Smith', 1, '[2023-02-25, 2023-03-03)'),
    ('Jordan Miller', 2, '[2023-01-30, 2023-03-02)'); -- 3 shares (need/want a sumup)

Solution

  • sum(case when a then b else c end) is perfectly allowed. Looking at your edits with examples, my guess is that you're grouping too deep: the periods and their prices multiply your booking rows, spawning a separate entry for each hb.booking_period && cp.valid_period intersection. To collapse them into one row per booking with one valid sum, you can aggregate those, e.g. with jsonb_object_agg(). Demo at db<>fiddle:

    SELECT
        hb.booking_id,
        hb.guest_name,
        hb.room_category_id,
        hb.booking_period,
        jsonb_pretty(jsonb_object_agg(cp.valid_period,cp.base_price)),
        SUM(CASE WHEN  upper(hb.booking_period * cp.valid_period) = upper(hb.booking_period) 
                 THEN (upper(hb.booking_period * cp.valid_period) - lower(hb.booking_period * cp.valid_period)) * cp.base_price
                 ELSE (upper(hb.booking_period * cp.valid_period) - lower(hb.booking_period * cp.valid_period) + 1) * cp.base_price
            END )
    FROM hotel_bookings hb
    JOIN category_prices cp 
    ON hb.room_category_id = cp.room_category_id 
    AND hb.booking_period && cp.valid_period
    GROUP BY hb.booking_id, 
      hb.guest_name, 
      hb.room_category_id, 
      hb.booking_period;
    
    booking_id guest_name room_category_id booking_period jsonb_pretty sum
    1 John Doe 1 [2023-01-15,2023-01-20) {"[2023-01-01,2023-01-31)": 80.00} 400.00
    2 Jane Smith 1 [2023-01-30,2023-02-02) {"[2023-01-01,2023-01-31)": 80.00,"[2023-02-01,2023-02-28)": 85.00
    }
    245.00
    3 Jane Smith 1 [2023-02-25,2023-03-03) {"[2023-02-01,2023-02-28)": 85.00,
    "[2023-03-01,2023-03-31)": 88.00}
    516.00
    4 Jordan Miller 2 [2023-01-30,2023-03-02) {"[2023-01-01,2023-01-31)": 100.00,
    "[2023-02-01,2023-02-28)": 105.00,
    "[2023-03-01,2023-03-31)": 108.00}
    3248.00

    The reason for that case statement is probably that you unintentionally used upper bound exclusive ranges for your category prices. If you make them upper bound inclusive, you can get rid of that whole thing and just multiply the booking_period*valid_period intersection length by the base_price: demo2

    UPDATE category_prices 
    SET valid_period=daterange(lower(valid_period),upper(valid_period),'[]');
    
    SELECT
        hb.booking_id,
        hb.guest_name,
        hb.room_category_id,
        hb.booking_period,
        jsonb_pretty(jsonb_object_agg(cp.valid_period,cp.base_price)),
        SUM((  upper(hb.booking_period * cp.valid_period) 
             - lower(hb.booking_period * cp.valid_period)) 
            * cp.base_price )
    FROM hotel_bookings hb
    JOIN category_prices cp 
    ON hb.room_category_id = cp.room_category_id 
    AND hb.booking_period && cp.valid_period
    GROUP BY hb.booking_id, 
      hb.guest_name, 
      hb.room_category_id, 
      hb.booking_period;
    

    See the cheatsheet:

    select raw,
           dr AS canonicalized,
           lower(dr),
           lower_inc(dr),
           upper(dr),
           upper_inc(dr)
    from (values ('(2023-01-01, 2023-01-31)'),
                 ('(2023-01-01, 2023-01-31]'),
                 ('[2023-01-01, 2023-01-31)'),
                 ('[2023-01-01, 2023-01-31]'))AS a(raw),
         lateral (select raw::daterange)AS b(dr);
    
    raw canonicalized lower lower_inc upper upper_inc
    (2023-01-01, 2023-01-31) [2023-01-02,2023-01-31) 2023-01-02 t 2023-01-31 f
    (2023-01-01, 2023-01-31] [2023-01-02,2023-02-01) 2023-01-02 t 2023-02-01 f
    [2023-01-01, 2023-01-31) [2023-01-01,2023-01-31) 2023-01-01 t 2023-01-31 f
    [2023-01-01, 2023-01-31] [2023-01-01,2023-02-01) 2023-01-01 t 2023-02-01 f

    fiddle