python-3.x pandas dataframe pandas-groupby data-preprocessing

Changing Shape and Structure of Data

I have the following dataset

Office	Employee ID	Joining Date	Attrition Date
AA	700237	27-11-2017
AA	700238	11-01-2018
AA	700252	14-02-2018	08-04-2018
AB	700287	18-01-2014
AB	700449	28-02-2014	17-04-2014

The idea is to add in Active Column if somebody joins and deduct if somebody resigns in any month and therefore want to change it in the following format using python

Office	Month & Year	Active
AA	Jan-17	0
AA	Feb-17	0
AA	Mar-17	0
AA	Apr-17	0
AA	May-17	0
AA	Jun-17	0
AA	Jul-17	0
AA	Aug-17	0
AA	Sep-17	0
AA	Oct-17	0
AA	Nov-17	1
AA	Dec-17	1
AA	Jan-18	2
AA	Feb-18	3
AA	Mar-18	3
AA	Apr-18	2
AB	Jan-14	1
AB	Feb-14	2
AB	Mar-14	2
AB	Apr-14	1

Please help.

Solution

Use:

#convert columns to datetimes
df['Joining Date'] = pd.to_datetime(df['Joining Date'], dayfirst=True)
df['Attrition Date'] = pd.to_datetime(df['Attrition Date'], dayfirst=True)

#add new rows by first January of minimal year per groups
df1 = df.groupby('Office')['Joining Date'].min() - pd.offsets.DateOffset(month=1, day=1)
df = df.append(df1.reset_index()).sort_values(['Office','Joining Date'])


#replace missing values in Attrition Date by maximal date with next month
#replace missing values in Joining  Date by maximal date with next month
next_month = (df.groupby('Office')['Attrition Date'].transform('max') + 
               pd.offsets.DateOffset(months=1))
next_month1 = (df.groupby('Office')['Joining Date'].transform('max') + 
               pd.offsets.DateOffset(months=1))

df['Attrition Date'] = df['Attrition Date'].fillna(next_month).fillna(next_month1)


#explode start and end datetimes converted to months with years
f = lambda x: pd.date_range(x['Joining Date'], 
                            x['Attrition Date'], freq='M').strftime('%b-%y')
df['Month & Year'] = df.apply(f, axis=1)

#count number of Employee ID with omit missing values
df = (df.explode('Month & Year')
        .groupby(['Office','Month & Year'], sort=False)['Employee ID']
        .count()
        .reset_index(name='Active'))

print (df)
   Office Month & Year  Active
0      AA       Jan-17       0
1      AA       Feb-17       0
2      AA       Mar-17       0
3      AA       Apr-17       0
4      AA       May-17       0
5      AA       Jun-17       0
6      AA       Jul-17       0
7      AA       Aug-17       0
8      AA       Sep-17       0
9      AA       Oct-17       0
10     AA       Nov-17       1
11     AA       Dec-17       1
12     AA       Jan-18       2
13     AA       Feb-18       3
14     AA       Mar-18       3
15     AA       Apr-18       2
16     AB       Jan-14       1
17     AB       Feb-14       2
18     AB       Mar-14       2
19     AB       Apr-14       1