Tags: python, pandas, stack, edge-list

Stacking a number of columns into one column in Python


I have a pandas dataframe of 100 rows x 7 columns like this:

[image: screenshot of the input dataframe, not reproduced here]

Values in the source column are connected to the values in the other columns. For example, a is connected to contact_1, contact_2, ..., contact_5. In the same way, b is connected to contact_6, contact_7, ..., contact_10.

I want to stack these columns into just two columns (source and destination) so that I can build a graph from the result in edge-list format.

The expected output data format is:

[image: screenshot of the expected two-column output, not reproduced here]

I tried df.stack() but did not get the desired result. I got the following:

[image: screenshot of the df.stack() output, not reproduced here]

Any suggestions?


Solution

  • You're looking for pd.wide_to_long. This should do:

    pd.wide_to_long(df, stubnames='destination_', i=['source'], j='number')
    

    The resulting destination_ column holds the values you're looking for; source and number become the index.

    Example:

    import pandas as pd

    # Small stand-in for the real dataframe
    d = {'source': ['a', 'b'],
         'destination_1': ['contact_1', 'contact_6'],
         'destination_2': ['contact_2', 'contact_7']}
    df = pd.DataFrame(d)
    print(pd.wide_to_long(df, stubnames='destination_', i=['source'], j='number'))
    

    Output:

                  destination_
    source number             
    a      1         contact_1
    b      1         contact_6
    a      2         contact_2
    b      2         contact_7
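
The result above still has source and number in the index, while the question asks for a flat two-column frame. Below is a minimal sketch of one way to get there. It assumes the wide columns really are named destination_1, destination_2, and so on, as in the example; the original screenshot is not available, so the real names may differ. Passing stubnames="destination" with sep="_" is only a convenience so that the value column comes out named destination. Note that pd.wide_to_long raises a ValueError if source does not uniquely identify each row; df.melt has no such restriction, so it is shown as an alternative.

```python
import pandas as pd

# Toy data mirroring the answer's example. The destination_N column names
# are an assumption, since the question's screenshot is not available.
df = pd.DataFrame({
    "source": ["a", "b"],
    "destination_1": ["contact_1", "contact_6"],
    "destination_2": ["contact_2", "contact_7"],
})

# Option 1: finish the wide_to_long result.
# With stubnames="destination" and sep="_", the value column is named
# "destination" rather than "destination_".
long_df = pd.wide_to_long(
    df, stubnames="destination", i="source", j="number", sep="_"
)
edges = (
    long_df.reset_index()            # move source/number back into columns
    .drop(columns="number")          # the column suffix is not needed for edges
    .sort_values(["source", "destination"])
    .reset_index(drop=True)
)

# Option 2: melt, which also works when "source" contains duplicate values.
edges_melt = (
    df.melt(id_vars="source", value_name="destination")
    .drop(columns="variable")        # "variable" holds the old column names
    .dropna(subset=["destination"])  # skip rows with no contact in that slot
    .sort_values(["source", "destination"])
    .reset_index(drop=True)
)

# Plain (source, destination) tuples, a shape most graph libraries accept.
edge_list = list(edges.itertuples(index=False, name=None))

print(edges)
```

Both options give the same two columns, source and destination, with one row per connection. The edge_list variable holds the same pairs as plain tuples in case the graph library expects a list rather than a dataframe.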