Search code examples
python-3.xpandasreplacekeyerror

Pandas File Update/Replace values from another reference file


Please help me with the updation of a file, based on values from another file.

The file I received is "todays_file1.csv" and has below table:

name    day a_col   b_col   c_col
alex    22-05   rep 68  67
stacy   22-05   sme 79  81
penny   22-05   rep 74  77
gabbi   22-05   rep 59  61

And so, I need to update the values from only ['day', 'b_col', 'c_col'] into the second file "my_file.csv" which has too many other columns.

name    day a_col   a_foo   b_col   b_foo   c_col
penny   21-May  rep 2   69  31  69
alex    21-May  rep 2   71  34  62
gabbi   21-May  rep 1   62  32  66
stacy   21-May  sme 3   73  38  78

The code I have so far is below:

df1 = pd.read_csv("todays_file1.csv")
df2 = pd.read_csv("my_file.csv")
df2.replace(to_replace=df2['day', 'b_col', 'c_col'], value= df1['day', 'b_col', 'c_col'], inplace=True)

Please help, with how to replace the 3 columns based on the 'name' column which is common in both, but may be jumbled.

I get the error below:

Traceback (most recent call last):
  File "D:\TESTING\Trial.py", line 93, in <module>
    df2.replace(to_replace=df2['day', 'b_col', 'c_col'], value= df1['day', 'b_col', 'c_col'], inplace=True)
  File "C:\Winpy\WPy64-3770\python-3.7.7.amd64\lib\site-packages\pandas\core\frame.py", line 2800, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:\Winpy\WPy64-3770\python-3.7.7.amd64\lib\site-packages\pandas\core\indexes\base.py", line 2648, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: ('day', 'b_col', 'c_col')

Solution

  • "anky" has provided the solution through the comments, and I am ever grateful.

    The code below helps solve the problem.

    df1 = pd.read_csv("todays_file1.csv")
    df2 = pd.read_csv("my_file.csv")
    df1.set_index('name')
    df2.set_index('name')
    df2.update(df1)
    df2.to_csv("my_file.csv", index=False)
    

    Thank you again Anky :)