I have a pandas dataframe of panel data where each row is a time series for an individual, each column is a day in the time series. On a daily basis I would like to count the number of day on day changes so I can determine what percentage of individuals change each day.
indiv = ['Tom', 'Mike', 'Dave']
date_one = ['yes', 'yes', 'no']
date_two = ['no', 'yes', 'no']
date_three = ['yes', 'yes', 'yes']
df = pd.DataFrame(zip(indiv, date_one, date_two, date_three), columns = ['name', '2020-01-29', '2020-01-30', '2020-01-31'])
Set the index to the name then transpose the df.
df = df.set_index('name').T
name 2020-01-29 2020-01-30 2020-01-31
0 Tom yes no yes
1 Mike yes yes yes
2 Dave no no yes
Then replace the 'yes' and 'no' values with True and False
df.replace({'yes': True, 'no': False}, inplace=True)
name 2020-01-29 2020-01-30 2020-01-31
0 Tom True False True
1 Mike True True True
2 Dave False False True
Now sum across the columns...
df['changes'] = df.sum(axis=1)
Resulting in
name Tom Mike Dave changes
2020-01-29 True True False 2
2020-01-30 False True False 1
2020-01-31 True True True 3