python pandas dataframe time-series data-analysis

Calculate the unique employee count over the last 3 months for each month using pandas

I want to calculate the number of unique employee IDs over the last 3 months using pandas. I can calculate the unique employee ID count for the current month, but I am not sure how to do it for the last 3 months.

df['DateM'] = df['Date'].dt.to_period('M')
df.groupby("DateM")["EmpId"].nunique().reset_index().rename(columns={"EmpId":"One Month Unique EMP count"}).sort_values("DateM",ascending=False).reset_index(drop=True)

Test data (testdata.xlsx, Google Drive link): https://docs.google.com/spreadsheets/d/1Kaguf72YKIsY7rjYfctHop_OLIgOvIaS/edit?usp=sharing&ouid=117123134308310688832&rtpof=true&sd=true

After running the groupby command above, I get the output for one-month groups based on the DateM column, which is correct.

Similarly, I am looking for another column that holds the 3-month unique active user count based on EmpId.

Sample output:

I tried calculating this with a rolling window, but it did not help. I also tried creating a period for the last 3 months, and I searched before asking this question. Thanks in advance for your help; otherwise I will have to calculate it manually.

Solution

I don't know whether you are looking for 3 consecutive months or something else, because your dates are not continuous around 2022-09 to 2022-10.

I also don't know your purpose, so I give a general solution here. If you only want the count of unique IDs for every 3 consecutive months, it is much easier. The solution here gives you the list of unique empid values for every 3 consecutive months. Note that the window starts at the given month and runs forward: for 2022-08, I count the 3 consecutive months 2022-08, 2022-09, and 2022-10, and so on.
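To make the window convention concrete, here is a small sketch of what `pd.period_range` returns for a single month. The forward window is the one used in this answer; the trailing window is shown only as a possible alternative if "last 3 months" is meant literally, and the variable names are just for illustration.

```python
import pandas as pd

p = pd.Period("2022-08", freq="M")

# Forward-looking window, as used in this answer: p and the two months after it.
forward = pd.period_range(start=p, periods=3, freq="M")

# Trailing window (possible alternative): p and the two months before it.
trailing = pd.period_range(end=p, periods=3, freq="M")

print(list(forward.astype(str)))   # ['2022-08', '2022-09', '2022-10']
print(list(trailing.astype(str)))  # ['2022-06', '2022-07', '2022-08']
```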

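import pandas as pd

# Assumption: the column names have been lower-cased ('date', 'empid', 'datem'),
# unlike the 'Date', 'EmpId', 'DateM' used in the question.
# Sort data:
df.sort_values(by='datem', inplace=True, ignore_index=True)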

# Create `dfu` which is `df` with unique `empid` for each `datem` only:
dfu = df.groupby(['datem', 'empid']).count().reset_index()
dfu.rename(columns={'date':'count'}, inplace=True)
dfu.sort_values(by=['datem', 'empid'], inplace=True, ignore_index=True)
dfu

# Obtain the list of unique periods:
unique_period = dfu['datem'].unique()

# Create empty dataframe:
dfe = pd.DataFrame(columns=['datem', 'empid', 'start_period'])

for p in unique_period:
    # Create 3 consecutive range:
    tem_range = pd.period_range(start=p, freq='M', periods=3)
    
    # Extract dataframe from `dfu` with period in range wanted:
    tem_dfu = dfu.loc[dfu['datem'].isin(tem_range),:].copy()
    
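    # Some cleaning (keep each `empid` once per window; assign the result back,
    # because `drop_duplicates` does not modify the dataframe in place by default):
    tem_dfu = tem_dfu.drop_duplicates(subset='empid', keep='first')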
    tem_dfu.drop(columns='count', inplace=True)
    tem_dfu['start_period'] = p
    
    # Concat and obtain desired output:
    dfe = pd.concat([dfe, tem_dfu])

dfe
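If only the counts are needed, they can be read off the result with `dfe.groupby('start_period')['empid'].nunique()`. Since the question asks about the last 3 months, a trailing window may be closer to what is wanted. Below is a minimal sketch of that variant, assuming the same lower-case columns `datem` (monthly period) and `empid`; the function name `trailing_unique_count`, the output column `unique_emp`, and the sample data are made up for illustration.

```python
import pandas as pd

def trailing_unique_count(df, window=3):
    """Count distinct `empid` over each month and the (window - 1) months before it."""
    # One row per (month, employee) pair is enough for counting.
    pairs = df[["datem", "empid"]].drop_duplicates()
    rows = []
    for p in sorted(pairs["datem"].unique()):
        # Trailing window ending at p, e.g. 2022-08 covers 2022-06 .. 2022-08.
        first = p - (window - 1)
        in_window = (pairs["datem"] >= first) & (pairs["datem"] <= p)
        rows.append({"datem": p,
                     "unique_emp": pairs.loc[in_window, "empid"].nunique()})
    return pd.DataFrame(rows)

# Hypothetical sample data, not taken from testdata.xlsx.
sample = pd.DataFrame({
    "date": pd.to_datetime(["2022-06-05", "2022-07-10", "2022-07-11",
                            "2022-08-01", "2022-08-20"]),
    "empid": [1, 1, 2, 3, 1],
})
sample["datem"] = sample["date"].dt.to_period("M")

result = trailing_unique_count(sample)
print(result)  # unique_emp is 1, 2, 3 for 2022-06, 2022-07, 2022-08
```

Months that are missing from the data simply contribute no employees to the window, so a gap in the dates does not need special handling here.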

I hope this is what you are looking for.