Search code examples
pythonpandasdataframerepeat

Copy dataframe n times, assign new IDs, keeping the original


I have a dataframe that looks like this:

df = pd.DataFrame({'id':[1,3,500, 53, 1, 500], 'code1':['a0', 'b0', 'b0', 'c0', 'b0', 'a0'], 'code2':['aa', 'bb', 'cc', 'bb', 'cc', 'bb'], 'date':['2022-10-01', '2022-09-01', '2022-10-01', '2022-11-01', '2022-09-01', '2022-11-01']})

I want to expand (copy) this dataframe N times, but each time has to have a different IDs, keeping the original ID and the original combination (e.g., id=1 has code1=[a0,b0], code2=[aa, cc], date=['2022-10-01', 2022-08-01'], the new id replacing id=1 should have the same).

For N=1, I can do:

df1 = df.loc[df.index.repeat(1)]  # repeat 1 time my dataframe, I can also just copy

ids = df.id.unique() # Get the original IDs
new_ids = df.id.max()+1  # Create new IDs starting from the max ID
nids = df.id.nunique() # Get the number of unique IDs
new_ids = new_ids + range(0,nids) # Create a list with the new IDs

df1['id'] = df1['id'].replace(ids, new_ids) # Replace the old IDs with the new ones

df_final = pd.concat(df, df1, axis=0)  # Concacatenate

For N=2 and larger, I thought of doing a for loop, but I guess there is a better way? Important thing is to keep the combinations for each IDs with code1/code2/date and keep the original IDs.

Thank you!


Solution

  • You can use the key parameter of concat to increment a step based on the max id in the original DataFrame:

    N = 4
    
    step = df['id'].max()
    out = pd.concat([df]*N, keys=range(N))
    out['id'] += out.index.get_level_values(0)*step
    out = out.droplevel(0)
    

    More simple variant with :

    import numpy as np
    
    N = 4
    
    step = df['id'].max()
    a = np.repeat(np.arange(N), len(df))
    out = pd.concat([df]*N)
    out['id'] += a*step
    

    output:

         id code1 code2        date
    0     1    a0    aa  2022-10-01
    1     3    b0    bb  2022-09-01
    2   500    b0    cc  2022-10-01
    3    53    c0    bb  2022-11-01
    4     1    b0    cc  2022-09-01
    5   500    a0    bb  2022-11-01
    0   501    a0    aa  2022-10-01 # new id starts at 501
    1   503    b0    bb  2022-09-01
    2  1000    b0    cc  2022-10-01
    3   553    c0    bb  2022-11-01
    4   501    b0    cc  2022-09-01
    5  1000    a0    bb  2022-11-01
    0  1001    a0    aa  2022-10-01 # new id starts at 1001
    1  1003    b0    bb  2022-09-01
    2  1500    b0    cc  2022-10-01
    3  1053    c0    bb  2022-11-01
    4  1001    b0    cc  2022-09-01
    5  1500    a0    bb  2022-11-01
    0  1501    a0    aa  2022-10-01 # new id starts at 1501
    1  1503    b0    bb  2022-09-01
    2  2000    b0    cc  2022-10-01
    3  1553    c0    bb  2022-11-01
    4  1501    b0    cc  2022-09-01
    5  2000    a0    bb  2022-11-01