I am new to Python. I have a dataframe with the following columns: State, City, Lat and Long. Some of the cities have no value for Lat or Long, so I want to fill those NaN values with the mean Lat and Long of the State where the city is located. I created two columns that hold those per-State means.
import pandas as pd

# Per-State mean of each coordinate (each result is a Series indexed by State)
grouped_State = df.groupby("State")
long_State = grouped_State["Long"].mean()
lat_State = grouped_State["Lat"].mean()
# Keep the per-city values under new column names
data = df["State"], df["Lat"], df["Long"]
headers = ["State", "Lat_city", "Long_city"]
df_x = pd.concat(data, axis=1, keys=headers)
# Attach the State means as new columns "Long" and "Lat"
df_x = pd.merge(left=df_x, right=long_State, how="left",
                left_on="State", right_on="State")
df_x = pd.merge(left=df_x, right=lat_State, how="left",
                left_on="State", right_on="State")
The result would be something like this:
Index State Lat_city Long_city Lat Long
0 A -34 -56 -34.6 -56.1
1 B nan nan -33 -54.2
2 A nan nan -34.6 -56.1
3 B -35.3 -55.5 -33 -54.2
The output I am trying to get would be like this:
Index State Lat_city Long_city Lat Long
0 A -34 -56 -34.6 -56.1
1 B -33 -54.2 -33 -54.2
2 A -34.6 -56.1 -34.6 -56.1
3 B -35.3 -55.5 -33 -54.2
I have been trying with different kinds of loops and experimented with lambda functions, but nothing worked as expected.
According to the DataFrame.fillna documentation (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html), .fillna accepts a Series as well. So if you were to do -
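# df_x is the merged frame from the question; fillna aligns the two Series on their index
df_x['Lat_city'] = df_x['Lat_city'].fillna(df_x['Lat'])
df_x['Long_city'] = df_x['Long_city'].fillna(df_x['Long'])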
You would get the expected output -
Index State Lat_city Long_city Lat Long
0 A -34 -56 -34.6 -56.1
1 B -33 -54.2 -33 -54.2
2 A -34.6 -56.1 -34.6 -56.1
3 B -35.3 -55.5 -33 -54.2
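As a possible shortcut, the merge step can be skipped: groupby(...).transform("mean") returns the State mean already aligned to each row, and fillna can use it directly. This is only a sketch of a different technique from the one above. The sample frame and the names state_means and filled are made up for illustration, and the means it yields come from this tiny sample, so they differ from the figures in the tables above.

```python
import numpy as np
import pandas as pd

# Illustrative sample data (not the asker's real dataset)
df = pd.DataFrame({
    "State": ["A", "B", "A", "B"],
    "Lat": [-34.0, np.nan, np.nan, -35.3],
    "Long": [-56.0, np.nan, np.nan, -55.5],
})

# transform("mean") keeps df's index, so every row carries its own State's mean
# (NaN values are skipped when the mean is computed)
state_means = df.groupby("State")[["Lat", "Long"]].transform("mean")

# fillna aligns on index and column labels; only the missing cells are replaced
filled = df.copy()
filled[["Lat", "Long"]] = df[["Lat", "Long"]].fillna(state_means)

print(filled)
```

Note that a State with no known coordinates at all has a NaN mean, so its rows would stay NaN and need a separate fallback.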