I have a pandas dataframe that includes a "Name" column. Strings in the Name column may contain "Joe", "Bob", or "Joe Bob". I want to add a column for the type of person: just Joe, just Bob, or Both.
I was able to do this by creating boolean columns, turning them into strings, combining the strings, and then replacing the values. It just...didn't feel very elegant! I am new to Python...is there a better way to do this?
My original dataframe:
df = pd.DataFrame(data= [['Joe Biden'],['Bobby Kennedy'],['Joe Bob Briggs']], columns = ['Name'])
0 | Name |
---|---|
1 | Joe Biden |
2 | Bobby Kennedy |
3 | Joe Bob Briggs |
I added two boolean columns to find names:
df['Joe'] = df.Name.str.contains('Joe')
df['Joe'] = df.Joe.astype('int')
df['Bob'] = df.Name.str.contains('Bob')
df['Bob'] = df.Bob.astype('int')
Now my dataframe looks like this:
df = pd.DataFrame(data= [['Joe Biden',1,0],['Bobby Kennedy',0,1],['Joe Bob Briggs',1,1]], columns = ['Name','Joe', 'Bob'])
0 | Name | Joe | Bob |
---|---|---|---|
1 | Joe Biden | 1 | 0 |
2 | Bobby Kennedy | 0 | 1 |
3 | Joe Bob Briggs | 1 | 1 |
But what I really want is one "Type" column with categorical values: Joe, Bob, or Both.
To do that, I added a column to combine the booleans, then I replaced the values:
df["Type"] = df["Joe"].astype(str) + df["Bob"].astype(str)
0 | Name | Joe | Bob | Type |
---|---|---|---|---|
1 | Joe Biden | 1 | 0 | 10 |
2 | Bobby Kennedy | 0 | 1 | 1 |
3 | Joe Bob Briggs | 1 | 1 | 11 |
df['Type'] = df.Type.astype('str') df['Type'].replace({'11': 'Both', '10': 'Joe','1': 'Bob'}, inplace=True)
0 | Name | Joe | Bob | Type |
---|---|---|---|---|
1 | Joe Biden | 1 | 0 | Joe |
2 | Bobby Kennedy | 0 | 1 | Bob |
3 | Joe Bob Briggs | 1 | 1 | Both |
This feels clunky. Anyone have a better way?
Thanks!
You can use np.select
to create the column Type
.
You need to ordered correctly your condlist
from the most precise to the widest.
df['Type'] = np.select([df['Name'].str.contains('Joe') & df['Name'].str.contains('Bob'),
df['Name'].str.contains('Joe'),
df['Name'].str.contains('Bob')],
choicelist=['Both', 'Joe', 'Bob'])
Output:
>>> df
Name Type
0 Joe Biden Joe
1 Bobby Kennedy Bob
2 Joe Bob Briggs Both