I have a pandas Dataframe containing traders' positions over time, that I created like this:
history = pd.read_csv(r"history.csv")
history = DataFrame(history, columns=['Symbol', 'Size', 'Entry Price',
'Mark Price', 'PNL (ROE %)', 'Last Position Update'])
frames = [historylast, history]
history = pd.concat(frames)
positions = historylast['Symbol'].tolist()
historylast_symbol_set = set(positions)
where historylast is the last scraped database containing current positions, and history is the local copy with previous positions. This is the outcome:
history = history.sort_values('Symbol')
print (history)
Symbol Size ... PNL (ROE %) Last Position Update
0 BNBUSDT 250.800 ... 7702.095588 2021-05-01 03:12:09
5 BNBUSDT 1000.800 ... 43351.359565 2021-04-29 03:51:41
0 BTCUSDT 54.422 ... 513277.155788 2021-04-25 21:03:13
0 BTCUSDT 54.422 ... 328896.563684 2021-04-25 21:03:13
1 DOGEUSDT 2600000.000 ... 46896.408000 2021-05-01 08:24:51
This dataframe has been created by putting toghether traders' positions over time. What I would like to do is to see if the last available 'Size' for each coin has changed with respect to the previous one. For example, for BNBUSDT, the last size is 250, reduced by 75% with respect to the previous size which was 1000. For BTCUSDT the Size has not changed since the last time. While for DOGEUSDT there are no previous data to compare, so it s still 100% of bought position.
To achieve this I though I should split the dataframe into different dataframes, one for each symbol, and compute and save the percentage change with a for loop, but I have difficulties, and wonder if there is not a better way. Any help would be much appreciated
Considering the following df as example (will use the column names Symbol
and Size
as well)
import pandas as pd
d = {'Symbol': ["A", "C", "A", "B", "A", "B", "A"], 'Size': [1, 1, 2, 3, 4, 5, 4]}
df = pd.DataFrame(data=d)
print(df)
>>>> Symbol Size
0 A 1
1 C 1
2 A 2
3 B 3
4 A 4
5 B 5
6 A 4
To retrieve the last two rows for each Symbol, do the following
g = df.groupby('Symbol').head(2)
g = g.sort_values('Symbol').reset_index(drop=True)
print(g)
>>> Symbol Size
0 A 1
1 A 2
2 B 3
3 B 5
4 C 1
After that, in order to compute the difference between each Size for the respective group, assuming the values for that change are relevant, create a new column to display that difference with
g['Difference'] = g.groupby('Symbol').diff()
print(g)
>>> Symbol Size Difference
0 A 1 NaN
1 A 2 1.0
2 B 3 NaN
3 B 5 2.0
4 C 1 NaN
Note that the first element appears a NaN as it wasn't changed.