I am trying to convert some Pandas code to Dask.
I have a dataframe that looks like the following:
   ListView_Lead_MyUnreadLeads  ListView_Lead_ViewCustom2
0                            1                          1
1                            1                          0
2                            1                          1
3                            1                          1
4                            1                          1
In Pandas, I can create a Lists column that contains, for each row, the names of the columns whose value is 1, like so:
df['Lists'] = df.dot(df.columns+",").str.rstrip(",").str.split(",")
So the Lists column looks like this:
Lists
0 [ListView_Lead_MyUnreadLeads, ListView_Lead_Vi...
1 [ListView_Lead_MyUnreadLeads]
2 [ListView_Lead_MyUnreadLeads, ListView_Lead_Vi...
3 [ListView_Lead_MyUnreadLeads, ListView_Lead_Vi...
4 [ListView_Lead_MyUnreadLeads, ListView_Lead_Vi...
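For reference, the pandas one-liner works because multiplying a Python int by a string repeats the string (1 * "a," is "a,", 0 * "a," is ""), and df.dot sums those products across each row, which for strings means concatenation. A minimal sketch of the intermediate step, rebuilding the sample data from the question:

```python
import pandas as pd

# Same 0/1 data as in the question
df = pd.DataFrame(
    {
        "ListView_Lead_MyUnreadLeads": [1, 1, 1, 1, 1],
        "ListView_Lead_ViewCustom2": [1, 0, 1, 1, 1],
    }
)

# Each row becomes the concatenation of "<column>," for every column holding 1,
# e.g. row 1 -> "ListView_Lead_MyUnreadLeads,"
joined = df.dot(df.columns + ",")

# Drop the trailing comma, then split back into a list of column names
lists = joined.str.rstrip(",").str.split(",")
print(lists.tolist())
```

One caveat of this trick: a row that is all zeros produces an empty string, which splits into [''] rather than an empty list.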
In Dask, however, the dot function doesn't seem to work the same way. How can I get the same behavior and output?
Any help would be appreciated. Thanks!
Related question in Pandas: How to return headers of columns that match a criteria for every row in a pandas dataframe?
Here are some alternative ways to do it in Pandas. You can check whether they work equally well in Dask.
cols = df.columns.values
df['Lists'] = [list(cols[x]) for x in df.eq(1).values]
or try:
df['Lists'] = df.eq(1).apply(lambda x: list(x.index[x]), axis=1)
The first solution, a list comprehension over the underlying NumPy boolean array, performs better on large datasets because it avoids the per-row function-call overhead of apply(axis=1).
Result:
print(df)
   ListView_Lead_MyUnreadLeads  ListView_Lead_ViewCustom2  Lists
0                            1                          1  [ListView_Lead_MyUnreadLeads, ListView_Lead_ViewCustom2]
1                            1                          0  [ListView_Lead_MyUnreadLeads]
2                            1                          1  [ListView_Lead_MyUnreadLeads, ListView_Lead_ViewCustom2]
3                            1                          1  [ListView_Lead_MyUnreadLeads, ListView_Lead_ViewCustom2]
4                            1                          1  [ListView_Lead_MyUnreadLeads, ListView_Lead_ViewCustom2]