
How to apply value_counts() to multiple columns in polars python?


I am trying to apply a simple value_counts() to multiple columns of a Polars DataFrame, but I get an error.

import polars as pl
import pandas as pd

Data:

sample_df = pl.DataFrame({'sub-category': ['tv','mobile','tv','wm','micro','wm'],
              'category': ['electronics','mobile','electronics','electronics','kitchen','electronics']})

Failed Attempts:

#1
sample_df.apply(value_counts())

#2
sample_df.apply(lambda x: x.value_counts())

#3
sample_df.apply(lambda x: x.to_series().value_counts())

#4
sample_df.select(pl.col(['sub-category','category'])).apply(lambda x: x.value_counts())

#5
sample_df.select(pl.col(['sub-category','category'])).apply(lambda x: x.to_series().value_counts())

But if I convert it to a pandas DataFrame, it works:

sample_df.to_pandas().apply(lambda x: x.value_counts())

Solution

  • You could reshape with .melt(), which stacks every column into (variable, value) rows. sample_df.melt() gives:

    shape: (12, 2)
    ┌──────────────┬─────────────┐
    │ variable     ┆ value       │
    │ ---          ┆ ---         │
    │ str          ┆ str         │
    ╞══════════════╪═════════════╡
    │ sub-category ┆ tv          │
    │ sub-category ┆ mobile      │
    │ sub-category ┆ tv          │
    │ sub-category ┆ wm          │
    │ sub-category ┆ micro       │
    │ …            ┆ …           │
    │ category     ┆ mobile      │
    │ category     ┆ electronics │
    │ category     ┆ electronics │
    │ category     ┆ kitchen     │
    │ category     ┆ electronics │
    └──────────────┴─────────────┘
    

    The value counts are then the length of each (variable, value) group:

    sample_df.melt().group_by(pl.all()).len()
    
    shape: (7, 3)
    ┌──────────────┬─────────────┬─────┐
    │ variable     ┆ value       ┆ len │
    │ ---          ┆ ---         ┆ --- │
    │ str          ┆ str         ┆ u32 │
    ╞══════════════╪═════════════╪═════╡
    │ category     ┆ kitchen     ┆ 1   │
    │ sub-category ┆ tv          ┆ 2   │
    │ sub-category ┆ mobile      ┆ 1   │
    │ category     ┆ mobile      ┆ 1   │
    │ sub-category ┆ wm          ┆ 2   │
    │ sub-category ┆ micro       ┆ 1   │
    │ category     ┆ electronics ┆ 4   │
    └──────────────┴─────────────┴─────┘
    

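    .pivot() can be used to reshape the counts into one column per original column, if required. (In Polars 1.0 and later, .melt() is named .unpivot() and the columns parameter of .pivot() is named on.)

    (
        sample_df.melt()
        .pivot(
            index="value",
            columns="variable",
            values="value",
            aggregate_function=pl.len(),
        )
    )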
    
    shape: (6, 3)
    ┌─────────────┬──────────────┬──────────┐
    │ value       ┆ sub-category ┆ category │
    │ ---         ┆ ---          ┆ ---      │
    │ str         ┆ u32          ┆ u32      │
    ╞═════════════╪══════════════╪══════════╡
    │ tv          ┆ 2            ┆ null     │
    │ mobile      ┆ 1            ┆ 1        │
    │ wm          ┆ 2            ┆ null     │
    │ micro       ┆ 1            ┆ null     │
    │ electronics ┆ null         ┆ 4        │
    │ kitchen     ┆ null         ┆ 1        │
    └─────────────┴──────────────┴──────────┘