Does polars have a function to encode a string column as integers (1, 2, 3), like pandas.factorize?
I couldn't find one in the polars documentation.
Perhaps you're looking for a dense rank or the categorical type.
import polars as pl
df = pl.DataFrame({"column": ["foo", "bar", "baz", "foo", "foo"]})
# Dense rank numbers the distinct values 1, 2, 3, ... in sorted order
df.with_columns(rank = pl.col("column").rank("dense"))
shape: (5, 2)
┌────────┬──────┐
│ column ┆ rank │
│ ---    ┆ ---  │
│ str    ┆ u32  │
╞════════╪══════╡
│ foo    ┆ 3    │
│ bar    ┆ 1    │
│ baz    ┆ 2    │
│ foo    ┆ 3    │
│ foo    ┆ 3    │
└────────┴──────┘
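As far as I know, "first seen" order (the order pandas.factorize uses) is a little more involved: take the row index of each value's first occurrence, then dense-rank those indices.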
(df.with_row_index("index")
   .with_columns(rank = pl.col("index").first().over("column").rank("dense"))
)
shape: (5, 3)
┌───────┬────────┬──────┐
│ index ┆ column ┆ rank │
│ ---   ┆ ---    ┆ ---  │
│ u32   ┆ str    ┆ u32  │
╞═══════╪════════╪══════╡
│ 0     ┆ foo    ┆ 1    │
│ 1     ┆ bar    ┆ 2    │
│ 2     ┆ baz    ┆ 3    │
│ 3     ┆ foo    ┆ 1    │
│ 4     ┆ foo    ┆ 1    │
└───────┴────────┴──────┘