Python Polars find the length of a string in a dataframe


I am trying to count the number of letters in each string of a Polars column. I could probably use an apply method and take len(Name), but I was wondering whether there is a Polars-specific method.

import polars as pl

df = pl.DataFrame({
    "start_date": ["2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05"],
    "Name": ["John", "Joe", "James", "Jörg"]
})

In Pandas I can use .str.len()

>>> df.to_pandas()["Name"].str.len()
0    4
1    3
2    5
3    4
Name: Name, dtype: int64

But that does not exist in Polars:

df.with_columns(pl.col("Name").str.len())
# AttributeError: 'ExprStringNameSpace' object has no attribute 'len'

Solution

  • You can use .str.len_bytes() for the length in bytes and .str.len_chars() for the number of characters:

    df.with_columns(
        pl.col("Name").str.len_bytes().alias("bytes"),
        pl.col("Name").str.len_chars().alias("chars")
    )
    
    shape: (4, 4)
    ┌────────────┬───────┬───────┬───────┐
    │ start_date ┆ Name  ┆ bytes ┆ chars │
    │ ---        ┆ ---   ┆ ---   ┆ ---   │
    │ str        ┆ str   ┆ u32   ┆ u32   │
    ╞════════════╪═══════╪═══════╪═══════╡
    │ 2020-01-02 ┆ John  ┆ 4     ┆ 4     │
    │ 2020-01-03 ┆ Joe   ┆ 3     ┆ 3     │
    │ 2020-01-04 ┆ James ┆ 5     ┆ 5     │
    │ 2020-01-05 ┆ Jörg  ┆ 5     ┆ 4     │
    └────────────┴───────┴───────┴───────┘
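The two columns differ only for "Jörg": Polars stores strings as UTF-8, and "ö" takes two bytes in that encoding. As a rough illustration of the same distinction outside Polars, here is a minimal plain-Python sketch. It assumes the "ö" is the single precomposed code point U+00F6, which matches the counts in the output above; the variable names are just for illustration.

```python
# Same names as in the question; "ö" is written as the escape \u00f6
# so the example does not depend on how this file is encoded.
names = ["John", "Joe", "James", "J\u00f6rg"]

# Character count: len() on a Python str counts Unicode code points,
# which corresponds to what the "chars" column reports here.
char_counts = [len(name) for name in names]

# Byte count: length of the UTF-8 encoding,
# which corresponds to what the "bytes" column reports.
byte_counts = [len(name.encode("utf-8")) for name in names]

print(char_counts)  # [4, 3, 5, 4]
print(byte_counts)  # [4, 3, 5, 5]
```

If the data is known to be pure ASCII, the two counts are always equal, so either method gives the same result; for anything else, the character count is the one that matches the pandas .str.len() output from the question.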