Given the following dataframe, is there some way to select only columns starting with a given prefix? I know I could do e.g. pl.col(column) for column in df.columns if column.startswith("prefix_"), but I'm wondering if I can do it as part of a single expression.
df = pl.DataFrame(
{"prefix_a": [1, 2, 3], "prefix_b": [1, 2, 3], "some_column": [3, 2, 1]}
)
df.select(pl.all().<column_name_starts_with>("prefix_"))
Would this be possible to do lazily?
Starting from Polars 0.18.1 you can use selectors (polars.selectors.starts_with), which provide more intuitive selection of columns from DataFrame or LazyFrame objects based on their name, dtype or other properties.
>>> import polars as pl
>>> import polars.selectors as cs
>>>
>>> df = pl.DataFrame(
... {"prefix_a": [1, 2, 3], "prefix_b": [1, 2, 3], "some_column": [3, 2, 1]}
... )
>>> df
shape: (3, 3)
┌──────────┬──────────┬─────────────┐
│ prefix_a ┆ prefix_b ┆ some_column │
│ ---      ┆ ---      ┆ ---         │
│ i64      ┆ i64      ┆ i64         │
╞══════════╪══════════╪═════════════╡
│ 1        ┆ 1        ┆ 3           │
│ 2        ┆ 2        ┆ 2           │
│ 3        ┆ 3        ┆ 1           │
└──────────┴──────────┴─────────────┘
>>> # print(df.lazy().select(cs.starts_with("prefix_")).collect())  # for a LazyFrame
>>> print(df.select(cs.starts_with("prefix_")))  # for a DataFrame
shape: (3, 2)
┌──────────┬──────────┐
│ prefix_a ┆ prefix_b │
│ ---      ┆ ---      │
│ i64      ┆ i64      │
╞══════════╪══════════╡
│ 1        ┆ 1        │
│ 2        ┆ 2        │
│ 3        ┆ 3        │
└──────────┴──────────┘