I have a pandas dataframe with 2 columns "type" and "sign" as follows
type sign
0 open A
1 open B
2 open D
3 close B
4 close D
5 open B
6 close B
7 close A
"A" + "open" means that event A has started happening. "A" + "close" means that event A has stopped. I need to vectorize it, when time is a major issue (the list is actually about 40 million lines long). Kind of like one-hot-encoding, but i need "1" to exist if and only if an event is "active". for example, for this case the results should be:
A B C D type sign
0 1 0 0 0 open A
1 1 1 0 0 open B
2 1 1 0 1 open D
3 1 0 0 1 close B
4 1 0 0 0 close D
5 1 1 0 0 open B
6 1 0 0 0 close B
7 0 0 0 0 close A
Ideas? thanks
For your data, you can just do a for
loop:
for sign in 'ABCD':
df[sign] = (df['type'].where(df['sign'].eq(sign))
.ffill()
.eq('open')
.astype(int)
)
Output:
type sign A B C D
0 open A 1 0 0 0
1 open B 1 1 0 0
2 open D 1 1 0 1
3 close B 1 0 0 1
4 close D 1 0 0 0
5 open B 1 1 0 0
6 close B 1 0 0 0
7 close A 0 0 0 0