
pandas: repeat a row if a column contains a certain value


I have a DataFrame as follows:

import pandas as pd
df = pd.DataFrame({'text': ['I go to school', 'open the green door', 'go out and play'],
                   'pos': [['PRON', 'VERB', 'ADP', 'NOUN'],
                           ['VERB', 'DET', 'ADJ', 'NOUN'],
                           ['VERB', 'ADP', 'CCONJ', 'VERB']],
                   'info': ['school', 'door', 'play']})

I would like to repeat the verbs in the text column whenever the corresponding 'pos' tag is 'VERB'. So far I have done the following:

df['text'] = df['text'].str.split()
df_new = df.apply(pd.Series.explode)

Then I tried to repeat the specific rows like this:

print(df_new.loc[df_new.index.repeat(df_new['pos']=='VERB')].reset_index(drop=True))

but it does not return anything. My desired output would be:

    new_df 
       text    pos    info
0        I   PRON  school
1       go   VERB  school
2       go   VERB  school
3       to    ADP  school
4   school   NOUN  school
5     open   VERB    door
6     open   VERB    door
7      the    DET    door
8    green    ADJ    door
9     door   NOUN    door
10       go   VERB    play
11       go   VERB    play
12      out    ADP    play
13     and  CCONJ    play
14    play   VERB    play
15    play   VERB    play

Solution

  • If the index is not important, you can use the following (exploding a list of columns at once requires pandas 1.3 or later):

    df2 = (df.assign(text=df['text'].str.split())
             .explode(['text', 'pos'], ignore_index=True)
          )
    
    df_new = (pd.concat([df2, df2[df2['pos'].eq('VERB')]])
                .sort_index().reset_index(drop=True)
              )
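Why the concat version works: rows selected with a boolean mask keep their original index labels, so after pd.concat each copied VERB row shares a label with its original, and sort_index brings the pair together before reset_index renumbers everything. The toy frame below is hypothetical data (not from the question) that only illustrates this label bookkeeping; kind='mergesort' is an optional addition that makes the sort stable, although the duplicated rows are identical anyway.

```python
import pandas as pd

# Hypothetical toy frame standing in for df2
small = pd.DataFrame({'text': ['I', 'go', 'to'],
                      'pos': ['PRON', 'VERB', 'ADP']})

verbs = small[small['pos'].eq('VERB')]   # the selected row keeps label 1
combined = pd.concat([small, verbs])
print(combined.index.tolist())           # [0, 1, 2, 1]

# Sorting by label moves the copy next to its original row;
# a stable sort also keeps the original ahead of its copy
result = combined.sort_index(kind='mergesort').reset_index(drop=True)
print(result['text'].tolist())           # ['I', 'go', 'go', 'to']
```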
    

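    An alternative uses Index.repeat with df2 from above, where eq('VERB').add(1) gives a repeat count of 2 for each VERB row and 1 for every other row: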

    df_new = (df2.loc[df2.index.repeat(df2['pos'].eq('VERB').add(1))]
                 .reset_index(drop=True)
              )
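This also shows what went wrong in the question's attempt: there the boolean mask itself is passed as the repeat count, and since True counts as 1 and False as 0, non-verb rows are repeated zero times (dropped) while verb rows appear only once. The sketch below uses a hypothetical three-row frame to compare the raw 0/1 counts with the shifted 1/2 counts used above.

```python
import pandas as pd

# Hypothetical toy frame with a unique 0..n-1 index
small = pd.DataFrame({'text': ['I', 'go', 'to'],
                      'pos': ['PRON', 'VERB', 'ADP']})

is_verb = small['pos'].eq('VERB')

# The mask as counts (True -> 1, False -> 0): only the verb survives, once
print(small.index.repeat(is_verb.astype(int)).tolist())   # [1]

# Adding 1 gives 2 for verbs and 1 for everything else
counts = is_verb.add(1)
print(counts.tolist())                                     # [1, 2, 1]
print(small.index.repeat(counts).tolist())                 # [0, 1, 1, 2]

out = small.loc[small.index.repeat(counts)].reset_index(drop=True)
print(out['text'].tolist())                                # ['I', 'go', 'go', 'to']
```

Selecting with .loc on repeated labels is only safe because the index is unique at this point, which is what ignore_index=True in the explode step guarantees.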
    

    Output (identical for both approaches):

          text    pos    info
    0        I   PRON  school
    1       go   VERB  school
    2       go   VERB  school
    3       to    ADP  school
    4   school   NOUN  school
    5     open   VERB    door
    6     open   VERB    door
    7      the    DET    door
    8    green    ADJ    door
    9     door   NOUN    door
    10      go   VERB    play
    11      go   VERB    play
    12     out    ADP    play
    13     and  CCONJ    play
    14    play   VERB    play
    15    play   VERB    play
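A self-contained sketch for checking the two approaches against each other and against the desired output, assuming pandas 1.3 or later (needed for exploding a list of columns). The length check before exploding is an added guard, since a multi-column explode requires every row's lists to have matching lengths.

```python
import pandas as pd

# Data from the question
df = pd.DataFrame({
    'text': ['I go to school', 'open the green door', 'go out and play'],
    'pos': [['PRON', 'VERB', 'ADP', 'NOUN'],
            ['VERB', 'DET', 'ADJ', 'NOUN'],
            ['VERB', 'ADP', 'CCONJ', 'VERB']],
    'info': ['school', 'door', 'play'],
})

words = df['text'].str.split()
# Guard: each sentence must have exactly one tag per word
assert (words.str.len() == df['pos'].str.len()).all()

# One row per word with a fresh 0..n-1 index
df2 = df.assign(text=words).explode(['text', 'pos'], ignore_index=True)
assert df2.shape == (12, 3)

is_verb = df2['pos'].eq('VERB')
via_concat = (pd.concat([df2, df2[is_verb]])
                .sort_index(kind='mergesort')
                .reset_index(drop=True))
via_repeat = df2.loc[df2.index.repeat(is_verb.add(1))].reset_index(drop=True)

# Both approaches should yield the same 16-row frame
pd.testing.assert_frame_equal(via_concat, via_repeat)

expected_text = ['I', 'go', 'go', 'to', 'school',
                 'open', 'open', 'the', 'green', 'door',
                 'go', 'go', 'out', 'and', 'play', 'play']
assert via_repeat['text'].tolist() == expected_text
print(via_repeat)
```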