
How to add specific rows to a pandas DataFrame at specific positions


I'm working on a project where I need to add marker rows around each tagged sentence. Whenever the 'N' column equals 1, a new sentence starts. I want to add two rows for each sentence: a row with 'Pos' = START at the beginning of the sentence, and a row with 'Pos' = END at the end of the sentence. This is what the DataFrame looks like:

import pandas as pd

POSTAG = {
        'N': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,10,11,1,2,3,4,5,6,7,8,9],
        'Name': ['ἐρᾷ','μὲν','ἁγνὸς','οὐρανὸς','τρῶσαι','χθόνα',',','ἔρως','δὲ','γαῖαν','λαμβάνει','γάμου','τυχεῖν','.','ὄμβρος','δ̓','ἀπ̓','εὐνάοντος','οὐρανοῦ','πεσὼν','ἔκυσε','γαῖαν','.','ἡ','δὲ','τίκτεται','βροτοῖς','μήλων','τε','βοσκὰς','καὶ','βίον','Δημήτριον','.','δενδρῶτις','ὥρα','δ̓','ἐκ','νοτίζοντος','γάμου','τέλειος','ἐστί','.'],
        'Pos': ['VERB','ADV','ADJ','NOUN','VERB','NOUN','PUNCT','NOUN','CCONJ','NOUN','VERB','NOUN','VERB','PUNCT','NOUN','ADV','ADP','ADJ','NOUN','VERB','VERB','NOUN','PUNCT','DET','ADV','VERB','NOUN','NOUN','ADV','NOUN','CCONJ','NOUN','ADJ','PUNCT','NOUN','NOUN','ADV','ADP','VERB','NOUN','ADJ','VERB','PUNCT']
        }

df = pd.DataFrame(POSTAG, columns=['N', 'Name', 'Pos'])
print(df)

In this case I need a [NaN, NaN, START] row at indexes 0 and 15, and a [NaN, NaN, END] row at index 14. I need to do this for the whole DataFrame. How can I do this?


Solution

  • Looking at your DataFrame, I assume you want to insert START before each value 1 in column N and END after the last value of each consecutive run in column N. If so, you could do the following:

    First, create two one-row dummy DataFrames, start_df and end_df (this needs numpy for np.nan):

    import numpy as np

    start_df = pd.DataFrame({'N': [np.nan], 'Name': [np.nan], 'Pos': ['->START']})
    end_df = pd.DataFrame({'N': [np.nan], 'Name': [np.nan], 'Pos': ['END<-']})
    

    Then split the DataFrame into groups wherever the values in column N stop increasing by 1:

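    # True on the first row of each run, i.e. wherever N does not increase by exactly 1
    mask = ~df['N'].diff().fillna(0).eq(1)
    
    # The cumulative sum of the mask gives every run its own group id
    gb = df.groupby(mask.cumsum())
    groups = [gb.get_group(x) for x in gb.groups]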
    

    Next, insert the dummy DataFrames before and after each group:

    res = []
    
    for group in groups:
        res.append(start_df)
        res.append(group)
        res.append(end_df)
    

    Finally, build the result by concatenating the DataFrames in the list:

    df_ = pd.concat(res).reset_index(drop=True)
    
    # print(df_)
    
           N        Name      Pos
    0    NaN         NaN  ->START
    1    1.0         ἐρᾷ     VERB
    2    2.0         μὲν      ADV
    3    3.0       ἁγνὸς      ADJ
    4    4.0     οὐρανὸς     NOUN
    5    5.0      τρῶσαι     VERB
    6    6.0       χθόνα     NOUN
    7    7.0           ,    PUNCT
    8    8.0        ἔρως     NOUN
    9    9.0          δὲ    CCONJ
    10  10.0       γαῖαν     NOUN
    11  11.0    λαμβάνει     VERB
    12  12.0       γάμου     NOUN
    13  13.0      τυχεῖν     VERB
    14  14.0           .    PUNCT
    15   NaN         NaN    END<-
    16   NaN         NaN  ->START
    17   1.0      ὄμβρος     NOUN
    18   2.0          δ̓      ADV
    19   3.0         ἀπ̓      ADP
    20   4.0   εὐνάοντος      ADJ
    21   5.0     οὐρανοῦ     NOUN
    22   6.0       πεσὼν     VERB
    23   7.0       ἔκυσε     VERB
    24   8.0       γαῖαν     NOUN
    25   9.0           .    PUNCT
    26   NaN         NaN    END<-
    27   NaN         NaN  ->START
    28   1.0           ἡ      DET
    29   2.0          δὲ      ADV
    30   3.0    τίκτεται     VERB
    31   4.0     βροτοῖς     NOUN
    32   5.0       μήλων     NOUN
    33   6.0          τε      ADV
    34   7.0      βοσκὰς     NOUN
    35   8.0         καὶ    CCONJ
    36   9.0        βίον     NOUN
    37  10.0   Δημήτριον      ADJ
    38  11.0           .    PUNCT
    39   NaN         NaN    END<-
    40   NaN         NaN  ->START
    41   1.0   δενδρῶτις     NOUN
    42   2.0         ὥρα     NOUN
    43   3.0          δ̓      ADV
    44   4.0          ἐκ      ADP
    45   5.0  νοτίζοντος     VERB
    46   6.0       γάμου     NOUN
    47   7.0     τέλειος      ADJ
    48   8.0        ἐστί     VERB
    49   9.0           .    PUNCT
    50   NaN         NaN    END<-
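
The steps above can be collected into one helper. What follows is a minimal sketch rather than part of the original answer: the function name `add_sentence_markers`, its `start`/`end` parameters, and the toy DataFrame are made up for illustration, and the default labels are the plain START/END from the question instead of `->START`/`END<-`. It assumes, as the answer does, that a new sentence begins wherever column N does not increase by exactly 1. The mask is written as `diff().ne(1)`, which should select the same rows as `~diff().fillna(0).eq(1)` because NaN never compares equal to 1.

```python
import numpy as np
import pandas as pd


def add_sentence_markers(df, start="START", end="END"):
    """Return a new frame with a marker row before and after each sentence.

    Hypothetical helper: a new sentence is assumed to begin wherever
    column 'N' does not increase by exactly 1 from the previous row.
    """
    # True on the first row of every sentence (the first diff is NaN, and NaN != 1).
    new_sentence = df["N"].diff().ne(1)

    def marker(label):
        # One-row frame: NaN in every column except 'Pos'.
        row = {col: [np.nan] for col in df.columns}
        row["Pos"] = [label]
        return pd.DataFrame(row)

    pieces = []
    # cumsum() turns the boolean flags into a running sentence id.
    for _, sentence in df.groupby(new_sentence.cumsum()):
        pieces.extend([marker(start), sentence, marker(end)])

    # ignore_index=True renumbers the rows from 0, like reset_index(drop=True).
    return pd.concat(pieces, ignore_index=True)


# Toy data (not from the question): two short "sentences".
toy = pd.DataFrame({
    "N": [1, 2, 3, 1, 2],
    "Name": ["a", "b", ".", "c", "."],
    "Pos": ["NOUN", "VERB", "PUNCT", "NOUN", "PUNCT"],
})
tagged = add_sentence_markers(toy)
print(tagged["Pos"].tolist())
# ['START', 'NOUN', 'VERB', 'PUNCT', 'END', 'START', 'NOUN', 'PUNCT', 'END']
```

As in the output above, column N becomes float after the markers are added, because the marker rows hold NaN there.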