I want to compare values (in this case, names) between the dates in a DataFrame. Assume the following test frame:
import pandas as pd
# Create a sample DataFrame with 'Date' and 'Name' columns
data = {
'Date': ['2023-10-03', '2023-10-03', '2023-10-02', '2023-10-02', '2023-10-01', '2023-10-01'],
'Name': ['A', 'B', 'A', 'B', 'A', 'C']
}
df = pd.DataFrame(data)
I want to create a new column, e.g. "Check", that shows 0 for every date whose name composition is unchanged from the previous date, and 1 where it changed. The desired outcome would be:
         Date Name  Check
0  2023-10-03    A      0  # A/B, as previously
1  2023-10-03    B      0
2  2023-10-02    A      1  # A/B, changed
3  2023-10-02    B      1
4  2023-10-01    A      1  # A/C
5  2023-10-01    C      1
For '2023-10-03' the value is 0 because the names are exactly the same as for '2023-10-02', namely 'A' and 'B'. For the other two dates, either the name composition changed or it is the first date observed.
Thanks!
If I understand correctly, you can aggregate as set (with groupby.agg) and check if the composition changed with shift:
# ensure datetime
df['Date'] = pd.to_datetime(df['Date'])
# get composition per Date
s = df.groupby('Date')['Name'].agg(set)
# flag dates whose composition differs from the previous calendar day
df['Check'] = df['Date'].map(s.ne(s.shift(1, freq='D')).astype(int))
Output:
         Date Name  Check
0  2023-10-03    A      0
1  2023-10-03    B      0
2  2023-10-02    A      1
3  2023-10-02    B      1
4  2023-10-01    A      1
5  2023-10-01    C      1
Intermediate s:
Date
2023-10-01    {C, A}
2023-10-02    {B, A}
2023-10-03    {B, A}
Name: Name, dtype: object
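To make the shift step easier to follow, here is a minimal sketch that splits the one-liner into its parts, using the sample data from the question. The intermediate names prev, changed and flags are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2023-10-03', '2023-10-03', '2023-10-02',
                            '2023-10-02', '2023-10-01', '2023-10-01']),
    'Name': ['A', 'B', 'A', 'B', 'A', 'C'],
})

# one set of names per date, sorted by date
s = df.groupby('Date')['Name'].agg(set)

# same values, but every index label is moved forward by one day,
# so the label 2023-10-02 now carries the set that belonged to 2023-10-01
prev = s.shift(1, freq='D')

# ne aligns both Series on the union of their indexes; a date without
# a counterpart is compared against NaN and therefore counts as changed
changed = s.ne(prev)
flags = changed.astype(int)

df['Check'] = df['Date'].map(flags)
```

Because the comparison is aligned on the index, the result also contains an extra label one day after the last date; map simply ignores it.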
If you want to compare each date with the next group in frame order (here the frame is sorted newest first, so that is the previous available date) rather than with exactly one calendar day earlier:
s = df.groupby('Date', sort=False)['Name'].agg(set)
df['Check'] = df['Date'].map(s.ne(s.shift(-1)).astype(int))
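The two variants only differ when dates are missing. A small sketch with a made-up frame (gap, with no rows for 2023-10-02 and 2023-10-03) shows the difference; the frame and the names by_calendar and by_position are assumptions for illustration.

```python
import pandas as pd

# hypothetical data: same names on both dates, but two days are missing
gap = pd.DataFrame({
    'Date': pd.to_datetime(['2023-10-04', '2023-10-04',
                            '2023-10-01', '2023-10-01']),
    'Name': ['A', 'B', 'A', 'B'],
})

# calendar-based: 2023-10-04 is compared with 2023-10-03, which does not exist
s_cal = gap.groupby('Date')['Name'].agg(set)
by_calendar = gap['Date'].map(s_cal.ne(s_cal.shift(1, freq='D')).astype(int))

# position-based: 2023-10-04 is compared with the next group in frame order
s_pos = gap.groupby('Date', sort=False)['Name'].agg(set)
by_position = gap['Date'].map(s_pos.ne(s_pos.shift(-1)).astype(int))
```

With the calendar shift every row is flagged, since no date has a neighbour exactly one day earlier. With the positional shift only the oldest date is flagged.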
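Or for the previous available date (note that set difference is one-sided: a date that only loses names gives an empty set and is not flagged):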
df['Check'] = df['Date'].map(df.groupby('Date', sort=True)['Name']
.agg(set).diff().ne(set()).astype(int))
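Since `a - b` is empty whenever `a` is a subset of `b`, a variant that also flags dates which only lose names could compare the whole sets with shift instead of diff. This is a sketch on a made-up frame (shrink), not part of the question's data.

```python
import pandas as pd

# hypothetical data: 'C' disappears between 2023-10-01 and 2023-10-02
shrink = pd.DataFrame({
    'Date': pd.to_datetime(['2023-10-02', '2023-10-02',
                            '2023-10-01', '2023-10-01', '2023-10-01']),
    'Name': ['A', 'B', 'A', 'B', 'C'],
})

s = shrink.groupby('Date', sort=True)['Name'].agg(set)

# set difference of the later date minus the earlier one is empty,
# although the composition did change
one_sided = s.iloc[1] - s.iloc[0]

# comparing each set with the previous available one catches both
# added and removed names; the first date is compared with NaN
shrink['Check'] = shrink['Date'].map(s.ne(s.shift()).astype(int))
```

Here every row gets 1: the first date has nothing to compare against, and the second date lost 'C'.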