I have a DataFrame as follows:
import pandas as pd
df = pd.DataFrame({
    'text': ['I go to school', 'open the green door', 'go out and play'],
    'pos': [['PRON', 'VERB', 'ADP', 'NOUN'], ['VERB', 'DET', 'ADJ', 'NOUN'], ['VERB', 'ADP', 'CCONJ', 'VERB']],
    'info': ['school', 'door', 'play'],
})
I would like to repeat each word in the text column whose corresponding 'pos' is 'VERB'. So far I have done the following:
df['text'] = df['text'].str.split()
df_new = df.apply(pd.Series.explode)
Then I tried to repeat the matching rows like this:
print(df_new.loc[df_new.index.repeat(df_new['pos']=='VERB')].reset_index(drop=True))
but it does not return anything. My desired output would be:
new_df
      text    pos    info
0        I   PRON  school
1       go   VERB  school
2       go   VERB  school
3       to    ADP  school
4   school   NOUN  school
5     open   VERB    door
6     open   VERB    door
7      the    DET    door
8    green    ADJ    door
9     door   NOUN    door
10      go   VERB    play
11      go   VERB    play
12     out    ADP    play
13     and  CCONJ    play
14    play   VERB    play
15    play   VERB    play
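If the index is not important, you can use:
# exploding several columns at once requires pandas >= 1.3
df2 = (df.assign(text=df['text'].str.split())
         .explode(['text', 'pos'], ignore_index=True)
      )
# append a copy of the VERB rows, then sort by index so each copy follows its original
df_new = (pd.concat([df2, df2[df2['pos'].eq('VERB')]])
            .sort_index().reset_index(drop=True)
         )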
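To see why the concat/sort_index combination interleaves the copies correctly, here is a minimal sketch on a three-row frame. The name `toy` and its values are made up for illustration and only stand in for `df2`.

```python
import pandas as pd

# Hypothetical stand-in for df2: one row per token.
toy = pd.DataFrame({'text': ['I', 'go', 'to'],
                    'pos': ['PRON', 'VERB', 'ADP']})

# The filtered rows keep their original index labels (here: 1).
verbs = toy[toy['pos'].eq('VERB')]

# After concat the index is [0, 1, 2, 1]; sorting by index
# places each copy next to the row it was copied from.
stacked = pd.concat([toy, verbs])
result = stacked.sort_index().reset_index(drop=True)

print(stacked.index.tolist())
print(result)
```

Since a copy is identical to its original, the relative order of the two rows sharing a label does not affect the result.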
Alternative using repeat (and df2 from above):
df_new = (df2.loc[df2.index.repeat(df2['pos'].eq('VERB').add(1))]
             .reset_index(drop=True)
         )
Output:
      text    pos    info
0        I   PRON  school
1       go   VERB  school
2       go   VERB  school
3       to    ADP  school
4   school   NOUN  school
5     open   VERB    door
6     open   VERB    door
7      the    DET    door
8    green    ADJ    door
9     door   NOUN    door
10      go   VERB    play
11      go   VERB    play
12     out    ADP    play
13     and  CCONJ    play
14    play   VERB    play
15    play   VERB    play
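The repeat variant depends on the per-row counts, which the small sketch below makes explicit. The frame `toy` is a made-up stand-in for `df2`, and the explicit `astype(int)` is only there to make the counts visible. A count of 1 keeps a row once, 2 duplicates it, and 0 drops it, so a mask used as 0/1 counts would only filter rows rather than duplicate them.

```python
import pandas as pd

# Hypothetical stand-in for df2: one row per token.
toy = pd.DataFrame({'text': ['I', 'go', 'to'],
                    'pos': ['PRON', 'VERB', 'ADP']})

mask = toy['pos'].eq('VERB')

# Mask as 0/1 counts: non-verbs get 0 repeats and disappear.
only_verbs = toy.loc[toy.index.repeat(mask.astype(int))]

# Mask plus one: 2 repeats for verbs, 1 for every other row.
counts = mask.astype(int).add(1)
repeated = toy.loc[toy.index.repeat(counts)].reset_index(drop=True)

print(counts.tolist())
print(repeated)
```

Replacing `add(1)` with a larger number, or with a per-row integer column, would give a different number of copies per verb.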