I have a Python dictionary with company information that is structured like this:
co_dict = {'0': {'co_name': 'A',
'company_type': 'Public',
'global_name': 'A PARENT',
'sales': '1000'},
'1': {'co_name': 'B',
'company_type': 'Public',
'global_name': 'B PARENT',
'sales': '1000'}}
And a Pandas dataframe that looks like this (the real df is much longer):
df = pd.DataFrame({'co-name': ['M','A','B','F'], 'co-number': [1,2,3,4]})
co-name co-number
0 M 1
1 A 2
2 B 3
3 F 4
I would like to find matches for each df["co-name"] in the dictionary and append the corresponding dictionary values for "global_name" and "sales" to a new column in the df. When there is no match, the columns should both read "n/a". So the end result would look like this:
co-name co-number global_name sales
0 M 1 n/a n/a
1 A 2 A PARENT 1000
2 B 3 B PARENT 2000
3 F 4 n/a n/a
I tried to do this as follows:
def find_global_name(x):
for key1 in co_dict.keys():
if (x['co-name'] == co_dict[key1]['co_name']):
return co_dict[key1]['global_name']
else:
return "n/a"
df['global_name'] = df.apply(find_global_name, axis=1)
But the for loop seems to stop after the first company "A" match. I'm stuck trying to figuring out why this. Why is this method not working? How do I accomplish this task? Many, many thanks for your help.
Create a dataframe from the dictionary and merge with df
:
(df.merge(pd.DataFrame(co_dict).T,
left_on = 'co-name',
right_on = 'co_name',
how = 'left')
.drop(columns=['co_name', 'company_type'])
)
co-name co-number global_name sales
0 M 1 NaN NaN
1 A 2 A PARENT 1000
2 B 3 B PARENT 1000
3 F 4 NaN NaN
Your code can be much clearer and faster if you trim the dictionary to only the values you need, and combine with map
which is quite fast ... this also allows you to avoid writing code that is probably not necessary.
Create dictionaries for the values you need:
global_name = {val['co_name']: val['global_name']
for _, val in co_dict.items()}
sales = {val['co_name']: val['sales']
for _, val in co_dict.items()}
global_name
{'A': 'A PARENT', 'B': 'B PARENT'}
sales
{'A': '1000', 'B': '1000'}
Map the dictionaries to the dataframe to create the new columns (the assumption here is that values in the column are unique; if not, this won't work and merge is a better option, as it can handle duplicates):
df.assign(global_name = df['co-name'].map(global_name),
sales = df['co-name'].map(sales))
Out[722]:
co-name co-number global_name sales
0 M 1 NaN NaN
1 A 2 A PARENT 1000
2 B 3 B PARENT 1000
3 F 4 NaN NaN