DataFrame df:
no
column has a single int value 1-28 in increasing order, until the next group starts (dataframe is presorted). Some values are missing (e.g 1,2,3,5,7,1,2,3,28,1 ...)intensity
column has a single float value 0.0-1.0.target
has a single numpy array with indexes 0-27, which should contain all intensities of the group. Intensity with no 1 should be at 0th index of this numpy array, intensity with no 2 at 1st and so on. This array is filled with 0.0 at firstindex | no | intensity | target |
---|---|---|---|
0 | 2 | 0.027350 | np.arange(0,27) |
1 | 4 | 0.074639 | np.arange(0,27) |
2 | 5 | 0.056452 | np.arange(0,27) |
3 | 2 | 0.011223 | np.arange(0,27) |
The result should be:
Let's say we look at row x: in this row numpy array of df['target']
column should be rewritten so that at it's position (df['no']-1)[x]
0.0 value should be rewritten as df['intensity'][x]
value,
e.g. if df['no'][0] = 2
and df['intensity'][0] = 0.027350
then
df['target'][0] = [0.0, 0.027350, 0.0 ... ]
If there's no such no
value in this group intensity should stay 0.0
target
column from example table above should look like this:
array([[0.0, 0.027350, 0.0, 0.074639, 0.056452, 0.0(at indexes 5-27)],[0.0, 0.011223, ...]])
I got this result using for loop
for i in df.index:
df['target'][i][(df['no'][i])-1] = df['intensity'][i]
if i >= df.index.max():
break
if df['no'][i] < df['no'][i+1]:
df['target'][i+1] = df['target'][i] + df['target'][i+1]
df.drop(i, inplace=True)
It's too slow though, because I have millions of rows. Is there a way to use .apply() function or anything else to speed up the process?
You do not need an extra intensity column and modify row by row. You could do one operation for each group:
df['group'] = (df['no'].shift(1) > df['no']).cumsum()
def map_intensity(arr):
new_arr = np.zeros(27)
new_arr[arr['no'].values] = arr['intensity'].values
return new_arr
df.groupby('group').apply(map_intensity)