python pandas sorting apply dask

Build a combined column in dask dataframe for sorting


Sorting in Dask

Based on this answer, I want to build the combined column dynamically:

df_post['sort_column'] = df_post.apply(lambda r:str([r[col1],r[col2],r[col3]]), axis=1)
df_post = df_post.set_index('sort_column')
df_post = df_post.map_partitions(lambda x: x.sort_index())

I am not able to figure out a way to make this

[r[col1],r[col2],r[col3]]

dynamic, based on a list of columns provided by a config file.


Solution

  • It is hard to tell exactly what the question is after, but assuming it is "I would like to apply the solution in the linked answer, but for a list of column names", this can look like:

    # columns: list of column names, loaded from the config file beforehand
    df_post['sort_column'] = df_post.apply(lambda r: str([r[c] for c in columns]), axis=1)
    df_post = df_post.set_index('sort_column')
    df_post = df_post.map_partitions(lambda x: x.sort_index())
    

    where columns is a list of column names obtained from the config file beforehand.
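
As a rough illustration of the "obtained from the config file beforehand" step, here is a minimal sketch in plain Python. The JSON format, the `sort_columns` key, the column names, and the `make_sort_key` helper are all assumptions made for the example; none of them come from the question. The helper only needs `r[name]` to work, so a plain dict stands in for a dataframe row here.

```python
import json

# Hypothetical config contents: the real file format and key name
# are not given in the question.
config_text = '{"sort_columns": ["col_a", "col_b", "col_c"]}'
columns = json.loads(config_text)["sort_columns"]


def make_sort_key(columns):
    """Return a row -> string function that combines the given columns."""
    def sort_key(r):
        # r only needs to support r[name], so a dataframe row or a dict works
        return str([r[c] for c in columns])
    return sort_key


sort_key = make_sort_key(columns)

# Stand-in for a single dataframe row; "other" is not in the config list
row = {"col_a": 1, "col_b": "x", "col_c": 2.5, "other": "ignored"}
combined = sort_key(row)
print(combined)  # [1, 'x', 2.5]
```

With the dataframe from the answer, the same function could presumably be passed as `df_post.apply(make_sort_key(columns), axis=1)`. Note that the combined key is a string, so rows are ordered by string comparison: a key starting with `[10` sorts before one starting with `[9`.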