
Duplicate row and add string


I want to duplicate each row of a pandas DataFrame and append a suffix to the end of its id, while keeping the rest of the data intact:

import pandas as pd

I_have = pd.DataFrame({'id': ['a', 'b', 'c'], 'my_data': [1, 2, 3]})

I want:

id     my_data
a      1
a_dup1 1
a_dup2 1
b      2
b_dup1 2
b_dup2 2
c      3
c_dup1 3
c_dup2 3

I could do this with 1) iterrows() or 2) making three copies of the existing data and concatenating them, but I'm hoping there is a more pythonic way.

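This seems to work, although pd.concat returns the rows grouped by copy (a, b, c, a_dup1, ...) rather than by id, so a stable sort on the index is needed to get the order above:

tmp1 = I_have.copy()  # copy() is deep by default
tmp2 = I_have.copy()

tmp1['id'] = tmp1['id'] + '_dup1'
tmp2['id'] = tmp2['id'] + '_dup2'

# each copy keeps the labels 0, 1, 2; a stable (mergesort) sort on the index
# interleaves them as a, a_dup1, a_dup2, b, ...
pd.concat([I_have, tmp1, tmp2]).sort_index(kind='mergesort').reset_index(drop=True)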
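The copy-and-concatenate approach above can be generalized to any number of duplicates with a loop. This is a minimal sketch: the function name duplicate_with_suffix and its parameters are illustrative, not part of pandas, and it assumes the input has a unique, already sorted index such as the default RangeIndex, because the stable index sort is what places each row next to its copies.

```python
import pandas as pd


def duplicate_with_suffix(df: pd.DataFrame, n_copies: int, col: str = "id") -> pd.DataFrame:
    """Hypothetical helper: repeat every row n_copies times in total.

    Copy k (k >= 1) gets '_dup{k}' appended to `col`. Assumes a unique,
    sorted index (e.g. the default RangeIndex).
    """
    frames = [df]
    for k in range(1, n_copies):
        dup = df.copy()
        dup[col] = dup[col] + f"_dup{k}"
        frames.append(dup)
    # concat keeps each frame's original index labels, so a stable sort on the
    # index groups every row with its copies in original -> _dup1 -> _dup2 order
    out = pd.concat(frames).sort_index(kind="mergesort")
    return out.reset_index(drop=True)


I_have = pd.DataFrame({"id": ["a", "b", "c"], "my_data": [1, 2, 3]})
print(duplicate_with_suffix(I_have, 3))
```

Building a list of frames and calling pd.concat once avoids repeated appends, which copy the whole frame each time.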

Solution

  • Use Index.repeat with DataFrame.loc to duplicate the rows, then build a per-row counter with numpy.tile, and finally use Series.mask to append the _dup suffix wherever the counter is not 0:

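    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'id': ['a', 'b', 'c'], 'my_data': [1, 2, 3]})

    N = 3  # rows per original row: the original plus N - 1 duplicates
    n_rows = len(df)

    # repeat each index label N times, keeping the copies of a row adjacent
    df = df.loc[df.index.repeat(N)].reset_index(drop=True)

    # counter 0, 1, ..., N-1, tiled once per ORIGINAL row (not N times)
    a = np.tile(np.arange(N), n_rows)

    # keep the original id where the counter is 0, otherwise append '_dup<k>'
    df['id'] = df['id'].mask(a != 0, df['id'] + '_dup' + a.astype(str))

    # alternative to mask:
    # df.loc[a != 0, 'id'] = df['id'] + '_dup' + a.astype(str)

    print(df)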
           id  my_data
    0       a        1
    1  a_dup1        1
    2  a_dup2        1
    3       b        2
    4  b_dup1        2
    5  b_dup2        2
    6       c        3
    7  c_dup1        3
    8  c_dup2        3
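The counter can also be produced with GroupBy.cumcount instead of numpy.tile, which removes the need to track how many times to tile. This is a sketch of that swapped-in technique, not the answer's method; the function name duplicate_rows is hypothetical, and like the Index.repeat approach it assumes the input index has unique labels.

```python
import pandas as pd


def duplicate_rows(df: pd.DataFrame, n_copies: int, col: str = "id") -> pd.DataFrame:
    """Hypothetical helper: repeat each row n_copies times and append
    _dup1, _dup2, ... to `col` on the repeats (assumes a unique index)."""
    # Index.repeat keeps all copies of a row adjacent
    rep = df.loc[df.index.repeat(n_copies)]
    # cumcount numbers the copies of each original label 0, 1, ..., n_copies - 1
    # in their current row order
    k = rep.groupby(level=0).cumcount().to_numpy()
    out = rep.reset_index(drop=True)
    suffix = "_dup" + pd.Series(k, index=out.index).astype(str)
    # keep the original value where k == 0, otherwise use value + suffix
    out[col] = out[col].mask(k > 0, out[col] + suffix)
    return out


I_have = pd.DataFrame({"id": ["a", "b", "c"], "my_data": [1, 2, 3]})
print(duplicate_rows(I_have, 3))
```

Because the counter comes from the grouped labels, the row count and N are independent: four rows with N = 2 still yield a, a_dup1, b, b_dup1, and so on.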