My data set is basically a set people and dates that give data about that persons assignment on a given day. Each person, however, gets several rows as they may have multiple assignments on a given day. I'm trying to pivot/group this data such that the new dataframe has the people with a single row and columns representing a count of each assignment. See below for an example. What's throwing me off is that each person could have more than one row in the initial data set. Thanks in advance.
Name | 1/1/23 | 1/2/23 | 1/3/23 |
---|---|---|---|
Person 1 | Assignment A | Assignment B | Assignment C |
Person 1 | Assignment D | "EMPTY" | "EMPTY" |
Person 1 | "Empty" | "Empty" | "Empty" |
Person 2 | Assignment E | "Empty" | "Empty" |
Person 2 | "Empty" | "Empty" | Assignment E |
Etc....
Would turn into
Name | Assignment A | Assignment B | Assignment C | Assignment D | Assignment E | "Empty" |
---|---|---|---|---|---|---|
Person 1 | 1 | 1 | 1 | 1 | 0 | 5 |
Person 2 | 0 | 0 | 0 | 0 | 2 | 4 |
Example
i think 'EMPTY' is typo of 'Empty', so recreat example code
import pandas as pd
data1 = {'Name': ['Person 1', 'Person 1', 'Person 1', 'Person 2', 'Person 2'],
'1/1/23': ['Assignment A', 'Assignment D', 'Empty', 'Assignment E', 'Empty'],
'1/2/23': ['Assignment B', 'Empty', 'Empty', 'Empty', 'Empty'],
'1/3/23': ['Assignment C', 'Empty', 'Empty', 'Empty', 'Assignment E']}
df = pd.DataFrame(data1)
df
Name 1/1/23 1/2/23 1/3/23
0 Person 1 Assignment A Assignment B Assignment C
1 Person 1 Assignment D Empty Empty
2 Person 1 Empty Empty Empty
3 Person 2 Assignment E Empty Empty
4 Person 2 Empty Empty Assignment E
Code
melt & crosstab
pd.crosstab(df.melt('Name')['Name'], df.melt('Name')['value'])
output:
value Assignment A Assignment B Assignment C Assignment D Assignment E Empty
Name
Person 1 1 1 1 1 0 5
Person 2 0 0 0 0 2 4