I am comparing two dfs
using pandas compare()
updated_records = out['df1'].compare(out['df2'])
userid amount
self other self other
0 122 121 NaN NaN
2 NaN NaN 3.0 4.0
how to efficiently iterate over those rows to have output some thing like this (there are more columns than those two it is only example) i want to point which columns to show in report [userid, amount, but not others ]
:
First row:
UserId: 122 -> 121
Second row:
amount: 3.0 -> 4.0
I need output like for report:
"For item XXX_XXX those fields where updated:
UserId: 122 -> 121,
xyz: a -> b,
"
"For item XXX_YYY those fields where updated:
amount: 3.0 -> 4.0,
xyz: 45 -> ert
Basicly to to show changes
i was trying iterrows()
but df get pivoted to something like this
other 2024-06-06T11:23:35
SaveTS self 2024-06-06T10:36:53
other 2024-06-06T11:23:40
A self UP4
other UP5
B self 2025-02-07T00:00:00
other 2025-02-08T00:00:00
C self 7 FEB 2025
other 8 FEB 2025
D self 2025-02-07T00:00:00
other 2025-02-08T00:00:00
E self 7.0
other 8.0
and got stuck, as above i want to select A,B,C,E only to show in changes report
thanks
Example Code
It wasn't clear what exact output style you wanted, so I provided it as a new example.
import pandas as pd
data = {
('userid', 'self'): [122, None],
('userid', 'other'): [121, None],
('amount', 'self'): [None, 3.0],
('amount', 'other'): [2, 4.0]
}
df = pd.DataFrame(data, index=[0, 2])
df:
Code
for idx, row in df.iterrows():
print('For item {} those fields where updated:'.format(idx))
colums_lv0 = row[row.notna()].index.get_level_values(0).drop_duplicates()
for col in colums_lv0:
print('{} : {} -> {}'.format(col, row[(col, 'self')], row[(col, 'other')]))
print()
print:
For item 0 those fields where updated:
userid : 122.0 -> 121.0
amount : nan -> 2.0
For item 2 those fields where updated:
amount : 3.0 -> 4.0