I have a large datarframe with 1739 rows and 1455 columns. I want to find the 150 lowest values for each row (Not the the 150 th value but 150 values).
I iterate over rows with a basic for loop.
I tried df.min(axis=1)
but it only gives out one min. And also the rolling_min
function without success.
Is there any existing function where i can enter the number of values i want to find witn .min?
My ultimate goal is to take the 150 lowest values and create a slope then calculate the area under the curve. Do this for each row and add the areas to obtain a volume.
Example of the the dataframe, I have a df that looks like this:
-218.7 -218.4 ... 217.2 217.5
0 56.632706 13.638315 ... 76.543000 76.543000
1 56.633455 13.576762 ... 76.543000 76.543000
2 -18.432203 -18.384091 ... 76.543000 76.543000
3 -18.476594 -18.439804 ... 76.543000 76.543000
The header is the '-218.7 ...' which are the coordinates in the x axis of a scan. The data is the height of the scan the y axis. What i need is the 150 lowest values for each rows and there associated column header as i want to make a curve for each row then calculate the area under the curve.
So i need for each line something like this :
-218.7 -218.4 ... for 150 columns
4 -18.532035 -18.497517 ... for 150 values
I don't think i need to store the header info for each line, a for loop would go trough each row one at a time.
Use .argsort
to get the indices of the underlying array sorted. Slice the values and the column Index to get all of the information you need. We'll create a MultiIndex so we can store both the column headers and values in the same DataFrame. The first level will be your nth lowest indicator.
import pandas as pd
import numpy as np
np.random.seed(1)
df = pd.DataFrame(np.random.randint(1,100000, (1739, 26)))
df.columns = list('ABCDEFGHIJKLMNOPQRSTUVWXYZ')
N = 7 # 150 in your case
idx = np.argsort(df.values, 1)[:, 0:N]
pd.concat([pd.DataFrame(np.take_along_axis(df.to_numpy(), idx, axis=1), index=df.index),
pd.DataFrame(df.columns.to_numpy(), index=df.index)],
keys=['Value', 'Columns'], axis=1)
Value Columns
0 1 2 3 4 5 6 0 1 2 3 4 5 6
0 5193 7752 8445 19947 20610 21441 21759 C K U V I G P
1 432 3607 16278 17138 19434 26104 33879 R J W C B D G
2 16 1047 1845 9553 12314 13784 19432 K S E F M O U
3 244 5272 10836 13682 29237 33230 34448 K Q A S X W G
4 9765 11275 13160 22808 30870 33484 42760 K T L U C D M
5 2034 2179 4980 7184 14826 15238 22807 Z H F Q L R X
...