
How can I loop through some columns and all rows and, if a value is NaN, fill it with the value from another column?


I am new to Python. I have a dataframe with the following columns: State, City, Lat and Long. Some of the cities have no value for Lat or Long, so I want to fill those NaN values with the mean Lat and Long of the State where the city is located. To do that, I created two columns that hold the per-State mean of those two fields.

import pandas as pd

# Mean Lat/Long per State (mean() skips NaN values)
grouped_State = df.groupby(["State"])
long_State = grouped_State["Long"].mean()
lat_State = grouped_State["Lat"].mean()

# Keep the per-city values under new names so they do not clash with the means
data = df["State"], df["Lat"], df["Long"]
headers = ["State", "Lat_city", "Long_city"]

df_x = pd.concat(data, axis=1, keys=headers)
# Attach each State's mean to every row of that State
df_x = pd.merge(left=df_x, right=long_State, how="left",
                left_on="State", right_on="State")
df_x = pd.merge(left=df_x, right=lat_State, how="left",
                left_on="State", right_on="State")

The result would be something like this:

Index  State  Lat_city  Long_city  Lat     Long
  0      A      -34       -56     -34.6    -56.1
  1      B      nan       nan     -33      -54.2
  2      A      nan       nan     -34.6    -56.1
  3      B      -35.3     -55.5   -33      -54.2

The output I am trying to get would be like this:

Index  State  Lat_city  Long_city  Lat     Long
  0      A      -34       -56     -34.6    -56.1
  1      B      -33       -54.2   -33      -54.2
  2      A      -34.6     -56.1   -34.6    -56.1
  3      B      -35.3     -55.5   -33      -54.2

I have been trying with different kinds of loops and experimented with lambda functions, but nothing worked as expected.


Solution

  • According to the DataFrame.fillna documentation (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html), .fillna accepts a Series as well, and each NaN is filled with the value at the same index label. So, using df_x (the merged frame from the question), if you were to do -

    df_x['Lat_city'] = df_x['Lat_city'].fillna(df_x['Lat'])
    df_x['Long_city'] = df_x['Long_city'].fillna(df_x['Long'])
    

    You would get the expected output -

     Index  State  Lat_city  Long_city  Lat     Long
      0      A      -34       -56     -34.6    -56.1
      1      B      -33       -54.2   -33      -54.2
      2      A      -34.6     -56.1   -34.6    -56.1
      3      B      -35.3     -55.5   -33      -54.2
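
If the helper mean columns are only needed for the fill, one possible shortcut is to skip the merge and let groupby(...).transform("mean") produce the per-State means already aligned to the original rows. The sketch below is not the approach from the answer above, just a variation on it. The sample values and city names are made up for illustration, and `filled` is a hypothetical variable name.

```python
import numpy as np
import pandas as pd

# Made-up sample data (not from the question), chosen so the
# per-State means are easy to verify by hand.
df = pd.DataFrame({
    "State": ["A", "B", "A", "B", "A", "B"],
    "City": ["a1", "b1", "a2", "b2", "a3", "b3"],  # hypothetical names
    "Lat": [-34.0, -33.0, -35.0, np.nan, np.nan, -35.0],
    "Long": [-56.0, -54.0, -57.0, np.nan, np.nan, -55.0],
})

filled = df.copy()
for col in ["Lat", "Long"]:
    # transform("mean") returns a Series with the same index as df,
    # holding the mean of the row's State (NaN values are skipped).
    state_mean = df.groupby("State")[col].transform("mean")
    # fillna with a Series fills each NaN from the matching index label.
    filled[col] = df[col].fillna(state_mean)

print(filled)
```

Here State A has known latitudes -34.0 and -35.0, so its missing row receives -34.5. Rows that already had coordinates are left unchanged.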