Search code examples
pythonpandaspaddingresampling

Resample each ID of a dataframe with a given date range


I have a dataframe like the one below. Each week, different IDs receive different tests.

date    id  test    received
2023-01-02  a1  a   1
2023-01-02  c3  a   1
2023-01-02  e5  a   1
2023-01-02  b2  b   1
2023-01-02  d4  b   1
2023-01-09  a1  c   1
2023-01-09  b2  c   1
2023-01-09  c3  c   1
d = {
    "date": [
        "2023-01-02",
        "2023-01-02",
        "2023-01-02",
        "2023-01-02",
        "2023-01-02",
        "2023-01-09",
        "2023-01-09",
        "2023-01-09",
    ],
    "id": ["a1", "c3", "e5", "b2", "d4", "a1", "b2", "c3"],
    "test": ["a", "a", "a", "b", "b", "c", "c", "c"],
    "received": [1, 1, 1, 1, 1, 1, 1, 1],
}
df = pd.DataFrame(data=d)

I want to resample it so that every ID is listed beside all the tests administered that week, and received = 1 or 0 depending on if they received it.

week_starting   id  test    received
02/01/2023  a1  a   1
02/01/2023  b2  a   0
02/01/2023  c3  a   1
02/01/2023  d4  a   0
02/01/2023  e5  a   1
02/01/2023  a1  b   0
02/01/2023  b2  b   1
02/01/2023  c3  b   0
02/01/2023  d4  b   1
02/01/2023  e5  b   0
09/01/2023  a1  c   1
09/01/2023  b2  c   1
09/01/2023  c3  c   1
09/01/2023  d4  c   0
09/01/2023  e5  c   0

Resampling by date is covered on StackOverflow, but resampling / padding by ID is not. Help?


Solution

  • Is this what you're looking for?

    import itertools
    all_combs = itertools.product(df['date'].unique(),\
    df['id'].unique(),df['test'].unique())
    dff = pd.DataFrame(all_combs, columns=['date', 'id', \'test']).sort_values('test')
    dff = pd.merge(dff, df, how='outer').fillna(0)
    

    Output:

              date  id test  received
     0   2023-01-02  a1    a       1.0
     1   2023-01-09  c3    a       0.0
     2   2023-01-09  a1    a       0.0
     3   2023-01-02  d4    a       0.0
     4   2023-01-09  b2    a       0.0
     5   2023-01-02  b2    a       0.0
     6   2023-01-09  e5    a       0.0
     7   2023-01-02  c3    a       1.0
     8   2023-01-09  d4    a       0.0
     9   2023-01-02  e5    a       1.0
     10  2023-01-02  e5    b       0.0
     11  2023-01-09  b2    b       0.0
     12  2023-01-09  e5    b       0.0
     13  2023-01-02  c3    b       0.0
     14  2023-01-02  d4    b       1.0
     15  2023-01-09  d4    b       0.0
     16  2023-01-09  a1    b       0.0
     .......