I am trying to count the number of letters in a string in Polars.
I could probably just use an apply method and get the len(Name).
However, I was wondering whether there is a Polars-specific method.
import polars as pl
df = pl.DataFrame({
"start_date": ["2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05"],
"Name": ["John", "Joe", "James", "Jörg"]
})
In pandas, I can use .str.len():
>>> df.to_pandas()["Name"].str.len()
0 4
1 3
2 5
3 4
Name: Name, dtype: int64
But that does not exist in Polars:
df.with_columns(pl.col("Name").str.len())
# AttributeError: 'ExprStringNameSpace' object has no attribute 'len'
You can use:
.str.len_bytes(), which counts the number of bytes in the UTF-8 string
.str.len_chars(), which counts the number of characters
df.with_columns(
pl.col("Name").str.len_bytes().alias("bytes"),
pl.col("Name").str.len_chars().alias("chars")
)
shape: (4, 4)
┌────────────┬───────┬───────┬───────┐
│ start_date ┆ Name ┆ bytes ┆ chars │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ u32 ┆ u32 │
╞════════════╪═══════╪═══════╪═══════╡
│ 2020-01-02 ┆ John ┆ 4 ┆ 4 │
│ 2020-01-03 ┆ Joe ┆ 3 ┆ 3 │
│ 2020-01-04 ┆ James ┆ 5 ┆ 5 │
│ 2020-01-05 ┆ Jörg ┆ 5 ┆ 4 │
└────────────┴───────┴───────┴───────┘
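The two columns differ only for "Jörg" because "ö" takes two bytes in UTF-8. As a rough illustration in plain Python (not Polars itself, and assuming the name is stored with the precomposed character U+00F6), the same distinction can be sketched like this:

```python
# Plain-Python illustration of bytes vs. characters; no Polars required.
# Assumption: "ö" is the single precomposed code point U+00F6.
names = ["John", "Joe", "James", "J\u00f6rg"]

# len(s) counts Unicode code points; encoding to UTF-8 first counts bytes.
char_counts = [len(name) for name in names]
byte_counts = [len(name.encode("utf-8")) for name in names]

for name, n_bytes, n_chars in zip(names, byte_counts, char_counts):
    print(f"{name}: bytes={n_bytes}, chars={n_chars}")
```

These counts line up with the bytes and chars columns in the output above.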