How count occurrences across all columns in a polars dataframe?

I have an imported .csv of number values- I want to sort the dataframe so I end up a list showing how many times each value occurs across the whole dataframe.

E.g.

I have found how to count the values of a specified column, but I can't find a way to do the same thing for the entire dataframe- I could count the values of each column and then combine them afterward but it is a bit clunky and I was looking for a more elegant solution.

This is an example of what I was trying:

sort_dataframe = df.select(pl.col("1", "2", "3", "4", "5", "6", "7").value_counts())

Which results in:

polars.exceptions.ComputeError: Series length 16 doesn't match the DataFrame height of 26

Solution

TLDR. You can use value_counts after unpivoting the dataframe into a long format.

df.unpivot().get_column("value").value_counts()

Explanation

Let us consider the following example dataframe.

import polars as pl

df = pl.DataFrame({
    "col_1": [1, 2, 3],
    "col_2": [2, 3, 7],
    "col_3": [1, 1, 9],
})

shape: (3, 3)
┌───────┬───────┬───────┐
│ col_1 ┆ col_2 ┆ col_3 │
│ ---   ┆ ---   ┆ ---   │
│ i64   ┆ i64   ┆ i64   │
╞═══════╪═══════╪═══════╡
│ 1     ┆ 2     ┆ 1     │
│ 2     ┆ 3     ┆ 1     │
│ 3     ┆ 7     ┆ 9     │
└───────┴───────┴───────┘

First, we can unpivot the dataframe using pl.DataFrame.unpivot to obtain a single column containing all values.

df.unpivot()

shape: (9, 2)
┌──────────┬───────┐
│ variable ┆ value │
│ ---      ┆ ---   │
│ str      ┆ i64   │
╞══════════╪═══════╡
│ col_1    ┆ 1     │
│ col_1    ┆ 2     │
│ col_1    ┆ 3     │
│ col_2    ┆ 2     │
│ col_2    ┆ 3     │
│ col_2    ┆ 7     │
│ col_3    ┆ 1     │
│ col_3    ┆ 1     │
│ col_3    ┆ 9     │
└──────────┴───────┘

Finally, we can get the value column as a pl.Series and use pl.Series.value_counts to count the number of occurrences of each value.

counts = df.melt().get_column("value").value_counts()

shape: (5, 2)
┌───────┬───────┐
│ value ┆ count │
│ ---   ┆ ---   │
│ i64   ┆ u32   │
╞═══════╪═══════╡
│ 7     ┆ 1     │
│ 3     ┆ 2     │
│ 9     ┆ 1     │
│ 2     ┆ 2     │
│ 1     ┆ 3     │
└───────┴───────┘

This can also be simply converted to a python dictionary.

dict(counts.iter_rows())

{3: 2, 7: 1, 1: 3, 2: 2, 9: 1}

How count occurrences *across all columns* in a polars dataframe?

Explanation

How count occurrences across all columns in a polars dataframe?