My input data is this-
"Name" : ["Alice", "Bob", "Jake", "Jake", "Steve" , "Bob"] ,
"City" : ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
"CityID" : ["1", "1", "2", "1", "1", "2"]
} )
Which when printed appears as this:
City Name CityID
0 Seattle Alice 1
1 Seattle Bob 1
2 Portland Jake 2
3 Seattle Jake 1
4 Seattle Steve 1
5 Portland Bob 2
I am trying to get something like this -
City CityID Alice Bob Jake Steve
0 Seattle 1 True True True True
1 Portland 2 False True True False
I am not sure how to get here. I did try the groupby function
df.groupby(['City','CityID', 'Name']).first()
which got me to this-
City CityID Name
Seattle 1 Alice
Bob
Jake
Portland 2 Bob
Jake
Steve
I am not sure how to proceed after this, any hints would be of great help.
Use get_dummies
with dtype=bool
parameter with unique values in MultiIndex
by max
:
df = (pd.get_dummies(df.set_index(['City','CityID'])['Name'], dtype=bool)
.max(level=[0,1])
.reset_index())
print (df)
City CityID Alice Bob Jake Steve
0 Seattle 1 True True True True
1 Portland 2 False True True False
If need groupby
solution create new column filled by True
with DataFrame.assign
, use GroupBy.first
, reshape by Series.unstack
and last convert MultiIndex
to columns by DataFrame.reset_index
, DataFrame.rename_axis
is for remove column name a
:
df = (df.assign(a=True)
.groupby(['City','CityID', 'Name'], sort=False)['a']
.first()
.unstack(fill_value=False)
.reset_index()
.rename_axis(None, axis=1))
print (df)
City CityID Alice Bob Jake Steve
0 Seattle 1 True True True True
1 Portland 2 False True True False