
Select all columns where column name starts with string


Given the following dataframe, is there a way to select only the columns whose names start with a given prefix? I know I could use a list comprehension, e.g. [pl.col(column) for column in df.columns if column.startswith("prefix_")], but I'm wondering whether it can be done as part of a single expression.

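import polars as pl

df = pl.DataFrame(
    {"prefix_a": [1, 2, 3], "prefix_b": [1, 2, 3], "some_column": [3, 2, 1]}
)
# Pseudocode: <column_name_starts_with> stands in for the method I'm looking for
df.select(pl.all().<column_name_starts_with>("prefix_"))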

Would this be possible to do lazily?


Solution

  • Starting with Polars 0.18.1, you can use selectors (polars.selectors.starts_with), which provide a more intuitive way to select columns from DataFrame or LazyFrame objects based on their name, dtype or other properties.

    >>> import polars as pl
    >>> import polars.selectors as cs
    >>> 
    >>> df = pl.DataFrame(
    ...     {"prefix_a": [1, 2, 3], "prefix_b": [1, 2, 3], "some_column": [3, 2, 1]} 
    ... )
    >>> df
    shape: (3, 3)
    ┌──────────┬──────────┬─────────────┐
    │ prefix_a ┆ prefix_b ┆ some_column │
    │ ---      ┆ ---      ┆ ---         │
    │ i64      ┆ i64      ┆ i64         │
    ╞══════════╪══════════╪═════════════╡
    │ 1        ┆ 1        ┆ 3           │
    │ 2        ┆ 2        ┆ 2           │
    │ 3        ┆ 3        ┆ 1           │
    └──────────┴──────────┴─────────────┘
    >>> # For a LazyFrame: df.lazy().select(cs.starts_with("prefix_")).collect()
    >>> print(df.select(cs.starts_with("prefix_")))  # For a DataFrame
    shape: (3, 2)
    ┌──────────┬──────────┐
    │ prefix_a ┆ prefix_b │
    │ ---      ┆ ---      │
    │ i64      ┆ i64      │
    ╞══════════╪══════════╡
    │ 1        ┆ 1        │
    │ 2        ┆ 2        │
    │ 3        ┆ 3        │
    └──────────┴──────────┘