Tags: python, pandas, dataframe, reindex

How to reindex a pandas dataframe within a function?


I'm trying to add column headers with empty values to my dataframe (just like this answer), but within a function that is already modifying it, like so:

import numpy as np
import pandas as pd

mydf = pd.DataFrame()

def myfunc(df):
  df['newcol1'] = np.nan  # this works

  list_of_newcols = ['newcol2', 'newcol3']
  df = df.reindex(columns=df.columns.tolist() + list_of_newcols)  # this does not
  return
myfunc(mydf)

If I run the lines individually in an IPython console, the columns are added. But when I run this as a script, newcol1 is added while newcol2 and newcol3 are not. Setting copy=False does not work either. What am I doing wrong here?
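A minimal, self-contained reproduction of the behaviour described above (a sketch using the question's own myfunc, run against an empty DataFrame). Only the in-place item assignment is visible to the caller afterwards:

```python
import numpy as np
import pandas as pd


def myfunc(df):
    # Item assignment modifies the very object the caller passed in.
    df["newcol1"] = np.nan
    # reindex() builds a new DataFrame; assigning it to `df` only rebinds
    # the local name, so the caller's object never receives these columns.
    df = df.reindex(columns=df.columns.tolist() + ["newcol2", "newcol3"])


mydf = pd.DataFrame()
myfunc(mydf)
print(mydf.columns.tolist())  # ['newcol1']
```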


Solution

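  • Assigning to df inside the function only rebinds the local name df; it does not change the DataFrame that mydf refers to. df['newcol1'] = np.nan works because it modifies the existing DataFrame in place, whereas df.reindex() returns a new object (per the pandas docs, a new object is produced unless the new index is equivalent to the current one and copy=False; adding columns is never equivalent, which is why copy=False does not help). So return the new object from your function and assign it back to mydf: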

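    import numpy as np
    import pandas as pd

    def myfunc(df):
      df['newcol1'] = np.nan  # modifies the caller's DataFrame in place

      list_of_newcols = ['newcol2', 'newcol3']
      # reindex() returns a new DataFrame (note: tolist() must be called)
      df = df.reindex(columns=df.columns.tolist() + list_of_newcols)
      return df

    mydf = pd.DataFrame()
    mydf = myfunc(mydf)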
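If you would rather keep the function's in-place style (no return value), one alternative, not part of the original answer, is to add each new column by item assignment, which mutates the caller's DataFrame just like the newcol1 line does. This is a sketch; add_empty_columns is a hypothetical helper name:

```python
import numpy as np
import pandas as pd


def add_empty_columns(df, new_cols):
    """Hypothetical helper: add each name in new_cols as an all-NaN column, in place."""
    for col in new_cols:
        if col not in df.columns:
            # Item assignment mutates the caller's object, so nothing is returned.
            df[col] = np.nan


mydf = pd.DataFrame({"a": [1, 2]})
add_empty_columns(mydf, ["newcol1", "newcol2", "newcol3"])
print(mydf.columns.tolist())  # ['a', 'newcol1', 'newcol2', 'newcol3']
```

The trade-off: the returning version built on reindex() leaves the input untouched apart from newcol1 and makes the data flow explicit, while the in-place version avoids the reassignment but relies on callers expecting their DataFrame to be modified.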