Search code examples
sqlmysqljoingreatest-n-per-group

Trying to return data into columns from 1st, 2nd, 3rd, 4th (etc.) previous dates


Table 1

+--------+------------+-------+
|  URI   |    Date    | Name  |
+--------+------------+-------+
| Fred4  | 2023-04-05 | Fred  |
| Fred3  | 2023-04-01 | Fred  |
| Fred2  | 2023-03-15 | Fred  |
| Fred1  | 2023-03-06 | Fred  |
| Dave3  | 2023-05-22 | Dave  |
| Dave2  | 2023-05-11 | Dave  |
| Dave1  | 2023-05-03 | Dave  |
| Simon6 | 2023-05-20 | Simon |
| Simon5 | 2023-05-11 | Simon |
| Simon4 | 2023-04-21 | Simon |
| Simon3 | 2023-04-19 | Simon |
| Simon2 | 2023-04-12 | Simon |
| Simon1 | 2023-03-25 | Simon |
+--------+------------+-------+

Table 2

+--------+------------+--------+
|  URI   |    Date    |  Item  |
+--------+------------+--------+
| Fred4  | 2023-04-05 | Top    |
| Fred3  | 2023-04-01 | Shorts |
| Fred2  | 2023-03-15 | Band   |
| Fred1  | 2023-03-06 | Top    |
| Dave3  | 2023-05-22 | Shorts |
| Dave2  | 2023-05-11 | Shoes  |
| Dave1  | 2023-05-03 | Top    |
| Simon6 | 2023-05-20 | Shorts |
| Simon5 | 2023-05-11 | Band   |
| Simon4 | 2023-04-21 | Shorts |
| Simon3 | 2023-04-19 | Top    |
| Simon2 | 2023-04-12 | Shoes  |
| Simon1 | 2023-03-25 | Shoes  |
+--------+------------+--------+

For each URI I am trying to pull in each previous item so that Item1 = last bought, item2 = 2nd last bought etc:

+--------+------------+--------+--------+-------+-------+-------+
|  URI   |    Date    | Item1  | Item2  | Item3 | Item4 | Item5 |
+--------+------------+--------+--------+-------+-------+-------+
| Fred4  | 2023-04-05 | Shorts | Band   | Top   |       |       |
| Fred3  | 2023-04-01 | Band   | Top    |       |       |       |
| Fred2  | 2023-03-15 | Top    |        |       |       |       |
| Fred1  | 2023-03-06 | NULL   |        |       |       |       |
| Dave3  | 2023-05-22 | Shoes  | Top    |       |       |       |
| Dave2  | 2023-05-11 | Top    |        |       |       |       |
| Dave1  | 2023-05-03 | NULL   |        |       |       |       |
| Simon6 | 2023-05-20 | Band   | Shorts | Top   | Shoes | Shoes |
| Simon5 | 2023-05-11 | Shorts | Top    | Shoes | Shoes |       |
| Simon4 | 2023-04-21 | Top    | Shoes  | Shoes |       |       |
| Simon3 | 2023-04-19 | Shoes  | Shoes  |       |       |       |
| Simon2 | 2023-04-12 | Shoes  |        |       |       |       |
| Simon1 | 2023-03-25 | NULL   |        |       |       |       |
+--------+------------+--------+--------+-------+-------+-------+

Currently I can get the most recent last item but I am struggling with the 2nd, 3rd etc. despite trying a few different things.

with all_data as (
    SELECT t1.URI, t1.date, t1.name
    ,t2.Item as Item1, t2.Item as Item2, t2.Item as Item3, t2.Item as Item4, t2.Item as Item5
    FROM Table1 t1
    LEFT OUTER JOIN Table2 t2 on t1.name = t2.name and t1.date > t2.date
)
SELECT URI, date, name, Item1, Item2, Item3, Item4, Item5
from all_data
group by URI, date, name
order by URI, date, name
; 

Any help would be great :) I'm pretty sure the query needs to be quite a lot bigger unless someone knows a few tricks.

Using MySQL and HeidiSQL client.


Solution

  • with t as (
      select  t1.uri, t1.date, t1.name, t2.Item,
               row_number() over (partition by t1.name order by t2.Date desc ) rn
      from Table1 t1 join Table2 t2 on t1.URI = t2.URI and t1.date >= t2.date),
    joined as (
      select a.uri, a.date, a.name, b.item, b.rn - a.rn rn 
      from t a left join t b on a.name = b.name and a.rn < b.rn )
    select uri, date, 
           max(case rn when 1 then item end) item1,
           max(case rn when 2 then item end) item2,
           max(case rn when 3 then item end) item3,
           max(case rn when 4 then item end) item4,
           max(case rn when 5 then item end) item5
    from joined group by uri, date, name order by name, date desc
    

    dbfiddle demo

    Steps:

    • number the rows in the subquery,
    • perform a self-join of this subquery based on name and a.rn < b.rn, which multiplies entries including desired empty rows (Simon1, Fred1),
    • subtract row numbers a.rn - b.rn, this gives correct value for pivot to operate,
    • make classic pivot,
    • sort rows by undisplayed column name and date desc, if you want it this way.

    You win an award in the category of strange pivots ;-)