Tags: sql, hive, hiveql

Get active weeks from start and end date


I have subscription data as shown below. Each row records a subscription bought by a user, with the columns user (the user ID), subscription_id, start (start date) and end (end date).

From the start and end dates I have derived the week numbers wk_start and wk_end.

+------+-----------------+------------+------------+----------+--------+
| user | subscription_id |   start    |    end     | wk_start | wk_end |
+------+-----------------+------------+------------+----------+--------+
|    1 | 1A              | 2019-06-01 | 2019-06-30 |       22 |     27 |
|    2 | 2A              | 2019-06-01 | 2019-06-21 |       22 |     25 |
|    3 | 3A              | 2019-06-01 | 2019-06-21 |       22 |     24 |
|    1 | 1B              | 2019-07-04 | 2019-08-04 |       27 |     29 |
|    2 | 2B              | 2019-07-01 | 2019-07-31 |       27 |     31 |
+------+-----------------+------------+------------+----------+--------+

Now I want to transform the data so that, for each user and subscription_id, I get one row for every week in which that subscription was active.

The desired output is:

+------+-----------------+------+
| user | subscription_id | Week |
+------+-----------------+------+
|    1 | 1A              |   22 |
|    1 | 1A              |   23 |
|    1 | 1A              |   24 |
|    1 | 1A              |   25 |
|    1 | 1A              |   26 |
|    1 | 1A              |   27 |
|    2 | 2A              |   22 |
|    2 | 2A              |   23 |
|    2 | 2A              |   24 |
|    2 | 2A              |   25 |
|    3 | 3A              |   22 |
|    3 | 3A              |   23 |
|    3 | 3A              |   24 |
|    1 | 1B              |   27 |
|    1 | 1B              |   28 |
|    1 | 1B              |   29 |
|    2 | 2B              |   27 |
|    2 | 2B              |   28 |
|    2 | 2B              |   29 |
|    2 | 2B              |   30 |
|    2 | 2B              |   31 |
+------+-----------------+------+

In other words, each subscription row is expanded into one row per active week, from wk_start through wk_end inclusive.

For example, user 1 with subscription_id 1A was active from week 22 through week 27.

Similarly, user 2 with subscription_id 2A was active from week 22 through week 25.

User 2 was then active again from week 27 through week 31 with subscription_id 2B.
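Because the range is inclusive, each subscription should expand to wk_end - wk_start + 1 rows. A minimal query to check the expected row count per subscription is sketched below; it assumes the data lives in a table named table1, which is a hypothetical name here.

```sql
-- Expected number of output rows per subscription (inclusive week range).
-- table1 is a hypothetical name for the subscriptions table shown above;
-- `user` is back-quoted because it is a keyword in several SQL dialects.
SELECT `user`,
       subscription_id,
       wk_end - wk_start + 1 AS active_weeks
FROM table1;
```

For the sample rows this gives 6, 4, 3, 3 and 5 weeks for 1A, 2A, 3A, 1B and 2B, i.e. 21 rows in total, which matches the desired output.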

I hope this clarifies the desired result.

I have made a db-fiddle here. It is in MySQL, but it should still help.


Solution

  • You can create a table that contains every week number and cross join against it, keeping only the weeks that fall between wk_start and wk_end, something like this:

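    -- MySQL syntax, as in the question's db-fiddle.
    -- Helper table holding every possible week number (1-53).
    CREATE TABLE weeks (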
      `week` INTEGER 
    );
    
    INSERT INTO weeks
      (`week`)
    VALUES
             (1),  (2),  (3),  (4),  (5),  (6),  (7),  (8),  (9)
    , (10), (11), (12), (13), (14), (15), (16), (17), (18), (19)
    , (20), (21), (22), (23), (24), (25), (26), (27), (28), (29)
    , (30), (31), (32), (33), (34), (35), (36), (37), (38), (39)
    , (40), (41), (42), (43), (44), (45), (46), (47), (48), (49)
    , (50), (51), (52), (53)
    ;
    
    -- Expand each subscription into one row per week from wk_start to wk_end (inclusive).
    SELECT t1.user, t1.subscription_id, w.week
    FROM table1 t1
    CROSS JOIN weeks w
    WHERE w.week BETWEEN t1.wk_start AND t1.wk_end
    ORDER BY t1.user, t1.subscription_id, w.week;
    

    https://www.db-fiddle.com/f/mn4towXxscmLsibQjwtNEH/0

    The weeks table seems like a bit of a nuisance, but it just has to be created once and then you can forget about it.
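The query above is written for MySQL to match the fiddle. Since the question is about Hive, here is a sketch of a Hive-only variant that avoids the weeks table altogether. It assumes the source table is named table1 (as in the query above) with the columns from the question, and it relies on a common Hive idiom: space(n) returns n spaces, split(..., ' ') turns that into n + 1 empty strings, and posexplode() emits each element together with its 0-based position, so wk_start + pos runs from wk_start to wk_end. Treat it as a starting point rather than a tested solution.

```sql
-- Hive sketch (assumption: source table is table1, as in the MySQL answer).
-- space(wk_end - wk_start) builds a string of that many spaces; splitting it
-- on ' ' yields wk_end - wk_start + 1 empty strings, and posexplode emits each
-- one with its 0-based position (pos = 0 .. wk_end - wk_start).
-- The CAST guards against BIGINT week columns, since space() expects an INT.
-- `user` is back-quoted because USER is a reserved keyword in recent Hive versions.
SELECT t.`user`             AS `user`,
       t.subscription_id    AS subscription_id,
       t.wk_start + pe.pos  AS `week`
FROM table1 t
LATERAL VIEW posexplode(split(space(CAST(t.wk_end - t.wk_start AS INT)), ' ')) pe AS pos, val
ORDER BY `user`, subscription_id, `week`;
```

Like the BETWEEN join, this assumes wk_end >= wk_start, so a subscription that crosses a year boundary (for example week 52 to week 2) would need extra handling in either approach.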