I have this DF:
A | B | C | D | COLUMNS |
---|---|---|---|---|
1 | ABC | ABC | 2,3 | |
DEF | ABC | DEF | 2,4 | |
LSD | LSD | LSD | 2,4 | |
ERR | ERR | 2,3 | ||
ERR | ABC | LSD | 2,4 |
Expected Output
A-1 | B-2 | C-3 | D-4 | F-5 | COLUMNS INDEX | ABC | DEF | ERR | LSD |
---|---|---|---|---|---|---|---|---|---|
1 | ABC | ABC | 2,3 | 2 | 0 | 0 | 0 | ||
DEF | ABC | DEF | DEF | 2,4 | 1 | 2 | 0 | 0 | |
LSD | LSD | LSD | LSD | 2,3 | 0 | 0 | 0 | 2 | |
ERR | ERR | 2,3 | 0 | 0 | 2 | 0 | |||
ERR | ABC | LSD | LSD | 2,4 | 1 | 0 | 1 | 1 |
I HAVE CREATED THE COLUMN INDEX FOR SUMMING BUT I COULDN'T ABLE TO LOOP EACH COLUMNS INDED AND GET THE INTENDED OUTPUT.
stack
the relevant columns, then use a crosstab
and join
back to the original DataFrame:
s = df.drop(columns=['A', 'COLUMNS']).stack().droplevel(1)
out = df.join(pd.crosstab(s.index, s))
If you want to use the boundaries from the "COLUMNS" column, you can combine it with numpy to build a mask:
import numpy as np
# build the mask
limits = df['COLUMNS'].str.split(',', expand=True).astype(int)
a = np.arange(df.shape[1])
m1 = (a+1>=limits[0].to_numpy()[:, None])
m2 = (a<limits[1].to_numpy()[:, None])
# apply the mask and stack
s = df.where(m1 & m2).drop(columns=['A', 'COLUMNS']).stack().droplevel(1)
# count
out = df.join(pd.crosstab(s.index, s))
Output:
A B C D COLUMNS ABC DEF ERR LSD
0 1.0 ABC ABC NaN 2,3 2 0 0 0
1 NaN DEF ABC DEF 2,4 1 2 0 0
2 NaN LSD LSD LSD 2,4 0 0 0 3
3 NaN ERR ERR NaN 2,3 0 0 2 0
4 NaN ERR ABC LSD 2,4 1 0 1 1