Search code examples
mysqlsqlwindow-functions

Mysql - Simulate lag with multiple previous rows (not just one)


After several hours struggling with this, simulating lag (I'm on MySQL 5.6 without option to upgrade), and discover a pair of functional problems, I come here for a little help, I'm now totally lost here.

After a long proccess, I get a temporary table in a Stored Process with the following data order by userid and purchaseDate (sample):

The sample is get by:

SELECT 
purchaseId
,userId
,purchaseDate 
FROM tempPurchases
WHERE userId > 69
ORDER BY userId, purchaseDate;


+------------+--------+--------------+
| purchaseId | userId | purchaseDate |
+------------+--------+--------------+
|       2074 |     70 | 2018-11-12   |
|       2072 |     70 | 2018-11-30   |
|       2108 |     71 | 2018-01-23   |
|       2114 |     71 | 2018-02-27   |
|       2130 |     71 | 2018-03-07   |
|       2115 |     71 | 2018-04-17   |
|       2105 |     71 | 2018-11-12   |
|       2112 |     71 | 2018-11-30   |
|       2145 |     72 | 2018-01-21   |
|       2141 |     72 | 2018-01-23   |
|       2136 |     72 | 2018-02-01   |
|       2148 |     72 | 2018-03-02   |
|       2158 |     72 | 2018-06-06   |
|       2137 |     72 | 2018-07-04   |
|       2139 |     72 | 2018-07-11   |
|       2153 |     72 | 2018-10-18   |
|       2135 |     72 | 2018-10-25   |
|       2157 |     72 | 2018-11-12   |
|       2147 |     72 | 2018-12-30   |
|       2184 |     73 | 2018-02-21   |
|       2166 |     73 | 2018-03-14   |
|       2176 |     73 | 2018-05-12   |
|       2186 |     73 | 2018-07-09   |
|       2173 |     73 | 2018-08-25   |
|       2162 |     73 | 2018-11-12   |
|       2165 |     73 | 2018-11-30   |
+------------+--------+--------------+

The specification is make a "bonus" by a purchase in 30 days (I used TIMESTAMPDIFF(DAYS) and lag to calculate the difference in days between one and the previous date. Problem here is that the bonus is also compared to multiple previous rows, not just the row above. (Logic for A column)

For example, in the next sample, for userid 72, purchaseId 2136 matches with 2141 but also 2145, and of course this is variable, like a person that make 10 purchase today and 20 yestarday.

So the desired result is:

+------------+--------+--------------+-----------+------------+--------------------------+
| purchaseId | userId | purchaseDate | days diff | A (streak) | B (streak without reset) |
+------------+--------+--------------+-----------+------------+--------------------------+
|       2074 |     70 | 2018-11-12   |         0 |          0 |                        0 |
|       2072 |     70 | 2018-11-30   |        18 |          1 |                        1 |
|       2108 |     71 | 2018-01-23   |         0 |          0 |                        0 |
|       2114 |     71 | 2018-02-27   |        35 |          0 |                        0 |
|       2130 |     71 | 2018-03-07   |         8 |          1 |                        1 |
|       2115 |     71 | 2018-04-17   |        41 |          0 |                        0 |
|       2105 |     71 | 2018-11-12   |       209 |          0 |                        0 |
|       2112 |     71 | 2018-11-30   |        18 |          1 |                        1 |
|       2145 |     72 | 2018-01-21   |         0 |          0 |                        0 |
|       2141 |     72 | 2018-01-23   |         2 |          1 |                        1 |
|       2136 |     72 | 2018-02-01   |         9 |          2 |                        2 |
|       2148 |     72 | 2018-03-02   |        29 |          1 |                        3 |
|       2158 |     72 | 2018-06-06   |        96 |          0 |                        0 |
|       2137 |     72 | 2018-07-04   |        28 |          1 |                        1 |
|       2139 |     72 | 2018-07-11   |         7 |          1 |                        2 |
|       2153 |     72 | 2018-10-18   |        99 |          0 |                        0 |
|       2135 |     72 | 2018-10-25   |         7 |          1 |                        1 |
|       2157 |     72 | 2018-11-12   |        18 |          2 |                        2 |
|       2147 |     72 | 2018-12-30   |        48 |          0 |                        0 |
|       2184 |     73 | 2018-02-21   |         0 |          0 |                        0 |
|       2166 |     73 | 2018-03-14   |        21 |          1 |                        1 |
|       2176 |     73 | 2018-05-12   |        59 |          0 |                        0 |
|       2186 |     73 | 2018-07-09   |        58 |          0 |                        0 |
|       2173 |     73 | 2018-08-25   |        47 |          0 |                        0 |
|       2162 |     73 | 2018-11-12   |        79 |          0 |                        0 |
|       2165 |     73 | 2018-11-30   |        18 |          1 |                        1 |
+------------+--------+--------------+-----------+------------+--------------------------+

Totally lost with a multiple join/left join and lag simulations, any tip, comment or answer will be preciated. Please don't doubt in comment or ask anything.

QUESTION: How can I get this calculation for A streak taking care the "lag" of several rows above? My actual approach is with cursors, but not working neither at moment.

UPDATE:

The expected result is getting a "bonus streak" taking care the above rows for the same userid. For example:

userId 72 make a bought (purchase 1) on 01/jan (first row value is 0). Then again (purchase 2) at 10/jan, so is less than 30 days: 1 point (10 days). Again at 15/01 (purchase 3), so 1 point for diff between purchase 2 (5 days) and another 1 point for the purchase 1 (15 days). Totally bonus: 3 (0 + 1 + (1+1))


Solution

  • I believe it is as simple as this, the subquery inside select counts the number of purchases within past 30 days:

    SELECT t.*, (
        SELECT COUNT(*)
        FROM t AS x
        WHERE userId = t.userId 
        AND purchaseDate <  t.purchaseDate
        AND purchaseDate >= t.purchaseDate - INTERVAL 30 DAY
    ) AS purchases_within_30days
    FROM t