I have a dataframe with 1000 columns that looks like this
Item Value1 Value2 Value3 Value4 Value..n
A 12 24 14 50 43
B 14 27 13 48 47
C 10 32 17 89 49
D 17 30 15 23 40
I want to calculate the difference between the values based on the Item variable, something like this
Difference Value1 Value2 Value3 Value4 Value..n
A-B 2 3 1 2 4
A-C 2 ... ...... ...
A-D
B-C
B-D
C-D
and exclude the rows where if an item has more entries than the others. For example of item A has 100 entries and item B has 150, then the difference would be calculated of the first 100 entries and the next 50 entries of item B would be excluded from calculations.
We can do itertools.combinations
, then reindex
and create the dataframe
import itertools
idx = pd.MultiIndex.from_tuples([*itertools.combinations(df.Item,2)])
df = df.set_index('Item')
out = pd.DataFrame(df.reindex(idx.get_level_values(0)).values-df.reindex(idx.get_level_values(1)).values,
index=idx,
columns=df.columns)
out.index=out.index.map('-'.join)
out
Value1 Value2 Value3 Value4 Value
A-B -2 -3 1 2 -4
A-C 2 -8 -3 -39 -6
A-D -5 -6 -1 27 3
B-C 4 -5 -4 -41 -2
B-D -3 -3 -2 25 7
C-D -7 2 2 66 9