This is my DataFrame:
import pandas as pd
df = pd.DataFrame(
{
'a': [100, 100, 105, 106, 106, 107, 108, 109],
'b': [99, 100, 110, 107, 100, 110, 120, 106],
}
)
Expected output is creating column x
:
a b x
0 100 99 0
1 100 100 1
2 105 110 2
3 106 107 3
4 106 100 1
5 107 110 4
6 108 120 5
7 109 106 3
Logic:
This is somehow an extension to this answer. I explain the logic by examples and I start from row 1
:
For row 1
, the b
column value is 100. Then in order to get x
for row 1
this value should be compared with all UNIQUE values in a
that are on the same row or before it to find out how many values in a
are less than or equal to it. The only unique value that is on the same row or before it is 100, so 1 is chosen for x
.
For row 2
, there are two unique values in a
that are less than or equal to 110 which are 100, 105.
The logic is the same for the rest of rows.
This is my attempt based on the linked answer but it does not work:
t = df.a.unique()
m1 = np.arange(len(t))[:,None] >= np.arange(len(t))
h = df['b'].to_numpy()
m2 = t <= h[:, None]
IIUC, use numpy broadcasting, tril
, and a mask
to only keep the unique values:
a = df['a'].mask(df['a'].duplicated()).to_numpy()
b = df['b'].to_numpy()
df['x'] = np.tril(b[:, None]>=a).sum(axis=1)
As a one-liner:
df['x'] = np.tril(df[['b']].to_numpy() >=
df['a'].mask(df['a'].duplicated()).to_numpy()).sum(axis=1)
Output:
a b x
0 100 99 0
1 100 100 1
2 105 110 2
3 106 107 3
4 106 100 1
5 107 110 4
6 108 120 5
7 109 106 3
Intermediates:
# a
array([100., nan, 105., 106., nan, 107., 108., 109.])
# b
array([ 99, 100, 110, 107, 100, 110, 120, 106])
# np.tril(b[:, None]>=a)
array([[False, False, False, False, False, False, False, False],
[ True, False, False, False, False, False, False, False],
[ True, False, True, False, False, False, False, False],
[ True, False, True, True, False, False, False, False],
[ True, False, False, False, False, False, False, False],
[ True, False, True, True, False, True, False, False],
[ True, False, True, True, False, True, True, False],
[ True, False, True, True, False, False, False, False]])