Please help me with the updation of a file, based on values from another file.
The file I received is "todays_file1.csv" and has below table:
name day a_col b_col c_col
alex 22-05 rep 68 67
stacy 22-05 sme 79 81
penny 22-05 rep 74 77
gabbi 22-05 rep 59 61
And so, I need to update the values from only ['day', 'b_col', 'c_col'] into the second file "my_file.csv" which has too many other columns.
name day a_col a_foo b_col b_foo c_col
penny 21-May rep 2 69 31 69
alex 21-May rep 2 71 34 62
gabbi 21-May rep 1 62 32 66
stacy 21-May sme 3 73 38 78
The code I have so far is below:
df1 = pd.read_csv("todays_file1.csv")
df2 = pd.read_csv("my_file.csv")
df2.replace(to_replace=df2['day', 'b_col', 'c_col'], value= df1['day', 'b_col', 'c_col'], inplace=True)
Please help, with how to replace the 3 columns based on the 'name' column which is common in both, but may be jumbled.
I get the error below:
Traceback (most recent call last):
File "D:\TESTING\Trial.py", line 93, in <module>
df2.replace(to_replace=df2['day', 'b_col', 'c_col'], value= df1['day', 'b_col', 'c_col'], inplace=True)
File "C:\Winpy\WPy64-3770\python-3.7.7.amd64\lib\site-packages\pandas\core\frame.py", line 2800, in __getitem__
indexer = self.columns.get_loc(key)
File "C:\Winpy\WPy64-3770\python-3.7.7.amd64\lib\site-packages\pandas\core\indexes\base.py", line 2648, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: ('day', 'b_col', 'c_col')
"anky" has provided the solution through the comments, and I am ever grateful.
The code below helps solve the problem.
df1 = pd.read_csv("todays_file1.csv")
df2 = pd.read_csv("my_file.csv")
df1.set_index('name')
df2.set_index('name')
df2.update(df1)
df2.to_csv("my_file.csv", index=False)
Thank you again Anky :)