Polars syntax for Pandas complex queries


I am trying to benchmark Polars but I am stuck on how to replicate the following Pandas expression in Polars.

df['ll_lat'] = (df['lat'] // 0.1 * 0.1).round(1)
df['ll_lon'] = (df['lon'] // 0.1 * 0.1).round(1)
df['temporalBasket'] = df['eventtime'].astype(str).str[:13]
df = df.groupby(['ll_lat', 'll_lon', 'temporalBasket']).agg(strikes=('lat', 'count'))
df

Can someone help me translate this and explain how I should think about column creation in Polars?

Here is a df.head() output to make things a little clearer.

[image: df.head() output]


Solution

  • You can do something similar in Polars to what you are doing in Pandas. However, you can use dt.truncate to extract the day + hour instead of slicing the string. This should be faster, and it is also easier to read.

    I did not find a dedicated Polars method for rounding down to a given number of decimals, so I kept your floor-division logic.

    from datetime import datetime

    import polars as pl

    # Sample data
    data = {
        'lat': [45.123, 45.155, 45.171, 45.191, 45.123],
        'lon': [12.321, 12.322, 12.345, 12.366, 12.321],
        'eventtime': [
            datetime(2023, 4, 1, 10, 20),
            datetime(2023, 4, 1, 12, 30),
            datetime(2023, 4, 1, 10, 45),
            datetime(2023, 4, 2, 9, 15),
            datetime(2023, 4, 2, 11, 50),
        ],
    }
    
    df_pl = pl.DataFrame(data)
    
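    # Group directly on expressions: each aliased expression becomes a key column.
    # .round(1) mirrors the Pandas version and strips floating-point noise from the bins.
    df_pl.group_by(
        (pl.col('lat') // 0.1 * 0.1).round(1).alias('ll_lat'),
        (pl.col('lon') // 0.1 * 0.1).round(1).alias('ll_lon'),
        # Truncate each timestamp to the start of its hour
        pl.col('eventtime').dt.truncate('1h').alias('temporalBasket')
    ).agg(
        # Count the non-null 'lat' values in each group
        strikes=pl.col('lat').count()
    )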
    
    # Output (row order may vary: group_by does not preserve order unless maintain_order=True)
    ┌────────┬────────┬─────────────────────┬─────────┐
    │ ll_lat ┆ ll_lon ┆ temporalBasket      ┆ strikes │
    │ ---    ┆ ---    ┆ ---                 ┆ ---     │
    │ f64    ┆ f64    ┆ datetime[μs]        ┆ u32     │
    ╞════════╪════════╪═════════════════════╪═════════╡
    │ 45.1   ┆ 12.3   ┆ 2023-04-01 12:00:00 ┆ 1       │
    │ 45.1   ┆ 12.3   ┆ 2023-04-02 09:00:00 ┆ 1       │
    │ 45.1   ┆ 12.3   ┆ 2023-04-01 10:00:00 ┆ 2       │
    │ 45.1   ┆ 12.3   ┆ 2023-04-02 11:00:00 ┆ 1       │
    └────────┴────────┴─────────────────────┴─────────┘