I have a dataset containing almost 400 counties from Poland. I need to merge several files containing data related to these counties. Each county is identified by its name and a code. However, there's a challenge: there are 10 pairs of counties with the same name. For example, there is a county called 'powiat brzeski' in both the Małopolskie and Opolskie provinces. To differentiate between these counties, I want to add the name of the province next to the county name using Python. I attempted to do this with the below method, but encountered an issue: after the manipulation, the values in the 'County' column, except for those affected by the manipulation, turned into NaN. Could you assist me with resolving this issue?
import pandas as pd
# Sample DataFrame
data = {'Code': [1202000, 2402000, 802000, 3017000, 3005000, 9999999], # Added an unmatched code 9999999
'County': ['Powiat brzeski', 'Powiat bielski', 'Powiat krośnieński', 'Powiat ostrowski', 'Powiat grodziski', 'Powiat ciechanowski']} # Added an unmatched county
df = pd.DataFrame(data)
# Print updated DataFrame
print(df)
Result:
Code County
0 1202000 Powiat brzeski
1 2402000 Powiat bielski
2 802000 Powiat krośnieński
3 3017000 Powiat ostrowski
4 3005000 Powiat grodziski
5 9999999 Powiat ciechanowski
data = {'Code': [1202000, 2402000, 802000, 3017000, 3005000, 9999999], # Added an unmatched code 9999999
'County': ['Powiat brzeski', 'Powiat bielski', 'Powiat krośnieński', 'Powiat ostrowski', 'Powiat grodziski', 'Powiat ciechanowski']} # Added an unmatched county
df = pd.DataFrame(data)
# Dictionary mapping codes to county names
code_to_county = {
1202000: "Powiat brzeski_Malopolskie",
2402000: "Powiat bielski_Slaskie",
802000: "Powiat krośnieński_Lubuskie",
3017000: "Powiat ostrowski_Wielkopolskie",
3005000: "Powiat grodziski_Wielkopolskie"
}
# Update values in "County" column based on values in "Code" column
df['County'] = df['Code'].map(code_to_county)
# Print updated DataFrame
print(df)
Result:
Code County
0 1202000 Powiat brzeski_Malopolskie
1 2402000 Powiat bielski_Slaskie
2 802000 Powiat krośnieński_Lubuskie
3 3017000 Powiat ostrowski_Wielkopolskie
4 3005000 Powiat grodziski_Wielkopolskie
5 9999999 NaN
You can use combine_first
:
In [79]: df['County'] = df['Code'].map(code_to_county).combine_first(df['County'])
In [80]: df
Out[80]:
Code County
0 1202000 Powiat brzeski_Malopolskie
1 2402000 Powiat bielski_Slaskie
2 802000 Powiat krośnieński_Lubuskie
3 3017000 Powiat ostrowski_Wielkopolskie
4 3005000 Powiat grodziski_Wielkopolskie
5 9999999 Powiat ciechanowski