Search code examples
pythonpandassortinggroup-bymultiple-columns

How to find common edges from a binary dataframe?


This is my dataset:

Dept Cell culture Bioinfo Immunology Trigonometry Algebra Biotech Optics
Biotech 1 1 1 0 0 1 0
Math 0 0 0 1 1 0 0
Physics 0 0 0 0 0 0 1

How I want my result:

Dept 0
Biotech Cell culture
Biotech Bioinfo
Biotech Immunology
Math Trigonometry
Math Algebra
Physics Optics

I need to form pairs that have the value one, but I also need to rid of those values which are the same in both column and row index - such as biotech here. Is there an easy way to do this?


Solution

  • Try this:

    #df = df.set_index('Dept') if needed move dept into the index
    df.dot(df.columns+',').str.strip(',').str.split(',').explode().reset_index()
    

    Output:

          Dept             0
    0  Biotech  Cell culture
    1  Biotech       Bioinfo
    2  Biotech    Immunology
    3  Biotech       Biotech
    4     Math  Trigonometry
    5     Math       Algebra
    6  Physics        Optics