Search code examples
pythonpandasdataframedatatablenested-loops

Duplicate pairs of rows side by side in pandas given certain condition


I have the following code:

import pandas as pd

data = {
    'Col1': ['John 1', 'John 2', 'John 3', 'Kyle 1', 'Kyle 3', 'Kyle 2'],
    'Col2': ['B', 'C', 'E', 'F', 'F', 'S'],
    'Col3': ['1', '1', '1', '1', '1', '2']
}

df = pd.DataFrame(data)

print(df)

Which gives this dataframe in which the numbers in Col1 (column 1) repeat only twice (e.g. John 1 and Kyle 1):

     Col1 Col2 Col3
0  John 1    B    1
1  John 2    C    1
2  John 3    E    1
3  Kyle 1    F    1
4  Kyle 3    F    1
5  Kyle 2    S    2

I want to make a function that iterates through the names of Col1 and duplicates the row with same number to be beside the original row. My desired output should be something like this:

     Col1 Col2 Col3 New Col1 New Col2 New Col3
0  John 1    B    1  Kyle 1        F        1
1  John 2    C    1  Kyle 2        S        2
2  John 3    E    1  Kyle 3        F        1
3  Kyle 1    F    1  John 1        B        1
4  Kyle 3    F    1  John 3        E        1
5  Kyle 2    S    2  John 2        C        1

Notice how all the values in the original Kyle 1 (row 3) have been duplicated since the '1' matches with 'John 1' in row 0. I have tried using pd.concat within nested for loops but I can't get it to work.


Solution

  • Make self-merge on the numeric suffix of Col1 and filter out duplicate entries.
    The column names are arranged with df.add_suffix:

    res = (df.merge(df.add_suffix('_new'), 
                    on=df['Col1'].str.replace(r'\D+', '', regex=True), how='left')
           .drop('key_0', axis=1))
    res = res[res['Col1'] != res['Col1_new']]
    print(res)
    

          Col1 Col2 Col3 Col1_new Col2_new Col3_new
    1   John 1    B    1   Kyle 1        F        1
    3   John 2    C    1   Kyle 2        S        2
    5   John 3    E    1   Kyle 3        F        1
    6   Kyle 1    F    1   John 1        B        1
    8   Kyle 3    F    1   John 3        E        1
    10  Kyle 2    S    2   John 2        C        1