Search code examples
pythonpandasstack

Creating a stack from pandas column of lists


I'm having an issue building a stack from the following df:

import pandas as pd

df = pd.DataFrame(
    {
        "Student": ['Daphne','Scooby','Daphne','Shaggy','Fred','Daphne'],
        "window": [['Fred'],['Daphne'], [''], ['Daphne'],['Velma','Scrappy'],['Velma','Fred']]
    }
)

which gives you:

  Student            window
0  Daphne            [Fred]
1  Scooby          [Daphne]
2  Daphne                []
3  Shaggy          [Daphne]
4    Fred  [Velma, Scrappy]
5  Daphne     [Velma, Fred]

I wanted to create a stack table that would show how many times each student has seen the other students in the window. I wanted the end result to be like this:

        Daphne  Fred  Scrappy  Velma
Daphne       0     2        0      1
Fred         0     0        1      1
Scrappy      0     0        0      0 
Velma        0     0        0      0

I was thinking I could do something like this:

dfd = pd.get_dummies(df["window"].dropna().apply(pd.Series).stack())

which gives you:

     Daphne  Fred  Scrappy  Velma
1 0       1     0        0      0
2 0       0     0        0      0
3 0       1     0        0      0
4 0       0     0        0      1
  1       0     0        1      0
5 0       0     0        0      1
  1       0     1        0      0

and that is not going to give me the results I want once I implement:

xrf_df = dfd.T.dot(dfd)

which gives you

            Daphne  Fred  Scrappy  Velma
         1       0     0        0      0
Daphne   0       2     0        0      0
Fred     0       0     2        0      1
Scrappy  0       0     0        1      1
Velma    0       0     1        1      2

I was thinking of using this to remove the empty spaces:

dfd = dfd.drop(0)
del dfd['']

but even that won't give me the right numbers in the right columns.

Could someone tell me what I'm doing wrong?


Solution

  • one way of solving i this,

    df.explode('window').pivot_table(index='Student', columns='window', aggfunc='size', fill_value=0)
    

    O/P;

    window      Daphne  Fred  Scrappy  Velma
    Student                                 
    Daphne   1       0     2        0      1
    Fred     0       0     0        1      1
    Scooby   0       1     0        0      0
    Shaggy   0       1     0        0      0
    
    • Explode window column to remove list
    • perform pivot with column as window, index as student and count as aggrigate function and finally fill nan with 0 ​