Search code examples
sqlcountsql-updatedata-warehouse

Count number of previous transactions per customer in SQL (ER to DW)


I have to count the number of previous bookings a customer has made up to the current booking. See below for desired output.

HotelID RoomID BookingID PrevBookingCount  
H6  R8  G3  B1  0  
H6  R2  G5  B2  0  
H6  R7  G1  B3  0  
H10 R4  G7  B4  0  
H10 R9  G2  B5  0  
H13 R1  G11 B6  0  
H13 R8  G1  B7  1  
H13 R5  G5  B8  1  
H13 R3  G1  B9  2  

The third column indicates the GuestID. This is the field that must be counted to determine previous bookings. For example, see Booking B3 (row 3) has GuestID G1. This customer has first booking B1. Previous bookings must be zero. Booking B7 is again made by Guest G1, but now this customer has one previous booking. Similarly for Booking B9.

The table has been created with PrevBookingCount set to NULL, and an update or alter table statement must be used to deliver the results.

create table bookings
(HotelID varchar(4),
RoomID varchar(4),
BookingID varchar(4),
PrevBookingCount int)

insert into bookings
values('H6','R8','G3','B1',NULL)
values('H6','R2','G5','B2',NULL)
...
values('H13','R3','G1','B9',NULL)

My following (poor) attempt yields the same value (6) for each record.

update bookings
set PrevBookingCount = count(distinct GuestID)

Is there some iterative approach or a subquery that can deliver the results?


Solution

  • SQL tables represent unordered sets. There is no such thing as a "previous booking", unless a column provides that information. I speculate that bookingid does this.

    Then, I recommend using row_number():

    select b.*,
           row_number() over (partition by guestid order by bookingid) - 1 as PrevBookingCount
    from bookings b;
    

    I see no reason to keep this as a separate column in the table, because it can be easily calculated.

    However, if you really wanted to do this, then the standard SQL syntax would be:

    update bookings
        set PrevBookingCount = (select count(*)
                                from bookings b2
                                where b2.GuestID = b.GuestId and
                                      b2.BookingId < b.BookingId
                               );
    

    This seems needlessly expensive and will be out-of-date with the next database change.