Tags: python, pandas, dask

How to generate array column with values from other columns using Dask Dataframe


I am trying to convert some Pandas code to Dask.

I have a dataframe that looks like the following:

   ListView_Lead_MyUnreadLeads  ListView_Lead_ViewCustom2 
0                            1                          1   
1                            1                          0   
2                            1                          1   
3                            1                          1   
4                            1                          1   

In Pandas, I can create a Lists column that contains, for each row, the names of the columns whose value is 1, like so:

df['Lists'] = df.dot(df.columns+",").str.rstrip(",").str.split(",")

So the Lists column looks like:

                                               Lists
0  [ListView_Lead_MyUnreadLeads, ListView_Lead_Vi...
1                      [ListView_Lead_MyUnreadLeads]
2  [ListView_Lead_MyUnreadLeads, ListView_Lead_Vi...
3  [ListView_Lead_MyUnreadLeads, ListView_Lead_Vi...
4  [ListView_Lead_MyUnreadLeads, ListView_Lead_Vi...
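
For context, the one-liner works because multiplying a string by 1 keeps it and multiplying it by 0 gives an empty string, so the dot product concatenates the names of the columns whose value is 1. Below is a plain-Python sketch of the same per-row arithmetic. The helper name `row_to_list` is made up for illustration and is not part of Pandas or Dask.

```python
columns = ["ListView_Lead_MyUnreadLeads", "ListView_Lead_ViewCustom2"]


def row_to_list(row):
    """Mimic, for a single row of 0/1 flags, what the dot-product one-liner computes."""
    # flag * "name," is "name," when flag is 1 and "" when flag is 0,
    # so joining the pieces keeps only the names of the active columns.
    joined = "".join(flag * (name + ",") for flag, name in zip(row, columns))
    # Drop the trailing comma, then split back into a list of names.
    return joined.rstrip(",").split(",")


first = row_to_list([1, 1])
second = row_to_list([1, 0])
```

One side effect of this approach: a row with no 1s produces `['']` rather than an empty list, because `"".split(",")` returns `['']`.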

In Dask, the dot function doesn't seem to work the same way. How can I get the same behavior / output?

Any help would be appreciated. Thanks!

Related question in Pandas: How to return headers of columns that match a criteria for every row in a pandas dataframe?


Solution

  • Here are two alternative ways to do it in Pandas. You can check whether they work equally well in Dask.

    cols = df.columns.values
    df['Lists'] = [list(cols[x]) for x in df.eq(1).values]
    

    or try:

    df['Lists'] = df.eq(1).apply(lambda x: list(x.index[x]), axis=1)
    

    The first solution, which uses a list comprehension, generally performs better on large datasets.

    Result:

    print(df)
    
       ListView_Lead_MyUnreadLeads  ListView_Lead_ViewCustom2                                                     Lists
    0                            1                          1  [ListView_Lead_MyUnreadLeads, ListView_Lead_ViewCustom2]
    1                            1                          0                             [ListView_Lead_MyUnreadLeads]
    2                            1                          1  [ListView_Lead_MyUnreadLeads, ListView_Lead_ViewCustom2]
    3                            1                          1  [ListView_Lead_MyUnreadLeads, ListView_Lead_ViewCustom2]
    4                            1                          1  [ListView_Lead_MyUnreadLeads, ListView_Lead_ViewCustom2]