Search code examples
pythonpython-polars

Polars shift by date


Say I have this:

numpy.random.seed(1)

df = (polars
  .DataFrame(dict(
    dt=numpy.random.randint(datetime.datetime(2023, 1, 1).timestamp(), datetime.datetime(2023, 1, 31).timestamp(), 100)
    ))
  .select(
    polars.from_epoch('dt').sort()
    )
  .filter(
    (polars.col('dt').dt.day().cos() * 17).floor() % 3 == 0
    )
  .pipe(lambda df: df.with_columns(
    j=polars.lit(numpy.random.randint(10, 99, df.height)),
    ))
  .with_columns(
    k=polars.col('j').last().over(polars.col('dt').dt.date()),
    )
  )

which produces:

 dt (datetime[μs])    j (i64)  k (i64)
 2023-01-01 12:48:34  30       30
 2023-01-04 09:37:05  42       75
 2023-01-04 15:13:42  22       75
 2023-01-04 22:58:20  75       75
 2023-01-07 00:18:27  70       20
 2023-01-07 02:42:28  34       20
 2023-01-07 06:32:09  92       20
 2023-01-07 09:38:43  12       20
 2023-01-07 20:59:16  20       20
 2023-01-08 05:25:04  64       76
 2023-01-08 09:10:17  92       76
 2023-01-08 10:53:40  96       76
 2023-01-08 14:29:28  80       76
 2023-01-08 16:11:37  76       76
 2023-01-10 12:59:38  81       58
 2023-01-10 21:21:29  58       58
 2023-01-11 14:33:55  64       52
 2023-01-11 18:54:01  25       52
 2023-01-11 22:00:55  15       52
 2023-01-11 22:34:28  27       52
 2023-01-11 23:41:27  52       52
 2023-01-13 04:07:50  30       23
 2023-01-13 08:20:19  58       23
 2023-01-13 09:44:07  32       23
 2023-01-13 14:18:54  23       23
 2023-01-20 08:42:19  63       94
 2023-01-20 17:07:37  94       94
shape: (27, 3)  

where I've calculated k to be the last value of j for each of the dates. I'd like to add another column l which would have the value of k for the previous available date, i.e.:

 dt (date)   j (i64)  k (i64)  l (i64)
 2023-01-01  30       30       null
 2023-01-04  42       75       30
 2023-01-04  22       75       30
 2023-01-04  75       75       30
 2023-01-07  70       20       75
 2023-01-07  34       20       75
 2023-01-07  92       20       75
 2023-01-07  12       20       75
 2023-01-07  20       20       75
 2023-01-08  64       76       20
 2023-01-08  92       76       20
 2023-01-08  96       76       20
 2023-01-08  80       76       20
 2023-01-08  76       76       20
 2023-01-10  81       58       76
 2023-01-10  58       58       76
 2023-01-11  64       52       58
 2023-01-11  25       52       58
 2023-01-11  15       52       58
 2023-01-11  27       52       58
 2023-01-11  52       52       58
 2023-01-13  30       23       52
 2023-01-13  58       23       52
 2023-01-13  32       23       52
 2023-01-13  23       23       52
 2023-01-20  63       94       23
 2023-01-20  94       94       23
shape: (27, 4)

That is, instead of "shift by a constant number of rows" which the standard shift function provides, I'm looking for a "shift by date" functionality. How would I go about doing that?


Solution

  • Can you .last().over(day).shift() + .first().over(day)?

    df.with_columns(
       l = pl.col('k').last().over(pl.col('dt').dt.truncate('1d')).shift()
    ).with_columns(
       pl.first('l').over(pl.col('dt').dt.truncate('1d'))
    )
    
    shape: (27, 4)
    ┌─────────────────────┬─────┬─────┬──────┐
    │ dt                  ┆ j   ┆ k   ┆ l    │
    │ ---                 ┆ --- ┆ --- ┆ ---  │
    │ datetime[μs]        ┆ i64 ┆ i64 ┆ i64  │
    ╞═════════════════════╪═════╪═════╪══════╡
    │ 2023-01-01 12:48:34 ┆ 30  ┆ 30  ┆ null │
    │ 2023-01-04 09:37:05 ┆ 42  ┆ 75  ┆ 30   │
    │ 2023-01-04 15:13:42 ┆ 22  ┆ 75  ┆ 30   │
    │ 2023-01-04 22:58:20 ┆ 75  ┆ 75  ┆ 30   │
    │ 2023-01-07 00:18:27 ┆ 70  ┆ 20  ┆ 75   │
    │ 2023-01-07 02:42:28 ┆ 34  ┆ 20  ┆ 75   │
    │ 2023-01-07 06:32:09 ┆ 92  ┆ 20  ┆ 75   │
    │ 2023-01-07 09:38:43 ┆ 12  ┆ 20  ┆ 75   │
    │ 2023-01-07 20:59:16 ┆ 20  ┆ 20  ┆ 75   │
    │ 2023-01-08 05:25:04 ┆ 64  ┆ 76  ┆ 20   │
    │ 2023-01-08 09:10:17 ┆ 92  ┆ 76  ┆ 20   │
    │ 2023-01-08 10:53:40 ┆ 96  ┆ 76  ┆ 20   │
    │ 2023-01-08 14:29:28 ┆ 80  ┆ 76  ┆ 20   │
    │ 2023-01-08 16:11:37 ┆ 76  ┆ 76  ┆ 20   │
    │ 2023-01-10 12:59:38 ┆ 81  ┆ 58  ┆ 76   │
    │ 2023-01-10 21:21:29 ┆ 58  ┆ 58  ┆ 76   │
    │ 2023-01-11 14:33:55 ┆ 64  ┆ 52  ┆ 58   │
    │ 2023-01-11 18:54:01 ┆ 25  ┆ 52  ┆ 58   │
    │ 2023-01-11 22:00:55 ┆ 15  ┆ 52  ┆ 58   │
    │ 2023-01-11 22:34:28 ┆ 27  ┆ 52  ┆ 58   │
    │ 2023-01-11 23:41:27 ┆ 52  ┆ 52  ┆ 58   │
    │ 2023-01-13 04:07:50 ┆ 30  ┆ 23  ┆ 52   │
    │ 2023-01-13 08:20:19 ┆ 58  ┆ 23  ┆ 52   │
    │ 2023-01-13 09:44:07 ┆ 32  ┆ 23  ┆ 52   │
    │ 2023-01-13 14:18:54 ┆ 23  ┆ 23  ┆ 52   │
    │ 2023-01-20 08:42:19 ┆ 63  ┆ 94  ┆ 23   │
    │ 2023-01-20 17:07:37 ┆ 94  ┆ 94  ┆ 23   │
    └─────────────────────┴─────┴─────┴──────┘