As you might have noticed from my other questions, I am transitioning from pandas to polars right now. I have a polars DataFrame with lists nested to different depths, like this:
┌──────────────────────────────────────┬──────────────────────────────────────┬─────────────────┬──────┐
│ col1                                 ┆ col2                                 ┆ col3            ┆ col4 │
│ ---                                  ┆ ---                                  ┆ ---             ┆ ---  │
│ list[list[str]]                      ┆ list[list[str]]                      ┆ list[str]       ┆ str  │
╞══════════════════════════════════════╪══════════════════════════════════════╪═════════════════╪══════╡
│ [["a", "a"], ["b", "b"], ["c", "c"]] ┆ [["a", "a"], ["b", "b"], ["c", "c"]] ┆ ["A", "B", "C"] ┆ 1    │
│ [["a", "a"]]                         ┆ [["a", "a"]]                         ┆ ["A"]           ┆ 2    │
│ [["b", "b"], ["c", "c"]]             ┆ [["b", "b"], ["c", "c"]]             ┆ ["B", "C"]      ┆ 3    │
└──────────────────────────────────────┴──────────────────────────────────────┴─────────────────┴──────┘
Now I want to join the lists from the inside out, using a different separator at each level ("+" for the inner lists, "-" for the outer ones), to get this:
┌─────────────┬─────────────┬───────┬──────┐
│ col1        ┆ col2        ┆ col3  ┆ col4 │
│ ---         ┆ ---         ┆ ---   ┆ ---  │
│ str         ┆ str         ┆ str   ┆ str  │
╞═════════════╪═════════════╪═══════╪══════╡
│ a+a-b+b-c+c ┆ a+a-b+b-c+c ┆ A-B-C ┆ 1    │
│ a+a         ┆ a+a         ┆ A     ┆ 2    │
│ b+b-c+c     ┆ b+b-c+c     ┆ B-C   ┆ 3    │
└─────────────┴─────────────┴───────┴──────┘
I currently do this with map_elements and a for loop, but I suspect that is highly inefficient. Is there a polars-native way to do this?
Here is my code:
import polars as pl

df = pl.DataFrame({"col1": [[["a", "a"], ["b", "b"], ["c", "c"]], [["a", "a"]], [["b", "b"], ["c", "c"]]],
                   "col2": [[["a", "a"], ["b", "b"], ["c", "c"]], [["a", "a"]], [["b", "b"], ["c", "c"]]],
                   "col3": [["A", "B", "C"], ["A"], ["B", "C"]],
                   "col4": ["1", "2", "3"]})

nested_list_cols = ["col1", "col2"]
list_cols = ["col3"]

# Step 1: join each inner list with "+" row by row in Python, leaving a list[str] column
for col in nested_list_cols:
    df = df.with_columns(pl.lit(df[col].map_elements(lambda listed: ['+'.join(element) for element in listed], return_dtype=pl.List(pl.String))).alias(col))  # is the return_dtype always pl.List(pl.String)?

# Step 2: join every remaining list[str] column with "-" into a single string
for col in list_cols + nested_list_cols:
    df = df.with_columns(pl.lit(df[col].list.join(separator='-')).alias(col))
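Stated independently of polars, the rule the two loops implement is a recursive join: the outermost list is joined with "-", and each level below it with the next separator ("+"). Below is a minimal plain-Python sketch of that rule, handy as a reference when checking a vectorised version. The helper name join_inside_out is hypothetical, and it assumes every leaf is a str and that exactly one separator is supplied per nesting level, outermost first.

```python
def join_inside_out(value, separators):
    """Hypothetical helper: recursively join a nested list of strings.

    separators[0] is used for the outermost level, separators[1] for the
    level below it, and so on. Assumes one separator per nesting level.
    """
    if isinstance(value, str):
        # Leaf reached: nothing left to join.
        return value
    # Join this level with its separator after joining every child level.
    return separators[0].join(join_inside_out(item, separators[1:]) for item in value)


print(join_inside_out([["a", "a"], ["b", "b"], ["c", "c"]], ["-", "+"]))  # a+a-b+b-c+c
print(join_inside_out(["B", "C"], ["-"]))  # B-C
```

In these terms, col1 and col2 use the separators ["-", "+"], while col3 only needs ["-"].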
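You could use list.eval() and list.join(). Inside list.eval(), pl.element().list.join("+") joins each inner list, which turns list[list[str]] into list[str]; the outer .list.join("-") then collapses that into a single string. The flat list columns only need the outer join: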
df.with_columns(
    pl.col(nested_list_cols).list.eval(pl.element().list.join("+")).list.join("-"),
    pl.col(list_cols).list.join("-"),
)
shape: (3, 4)
┌─────────────┬─────────────┬───────┬──────┐
│ col1        ┆ col2        ┆ col3  ┆ col4 │
│ ---         ┆ ---         ┆ ---   ┆ ---  │
│ str         ┆ str         ┆ str   ┆ str  │
╞═════════════╪═════════════╪═══════╪══════╡
│ a+a-b+b-c+c ┆ a+a-b+b-c+c ┆ A-B-C ┆ 1    │
│ a+a         ┆ a+a         ┆ A     ┆ 2    │
│ b+b-c+c     ┆ b+b-c+c     ┆ B-C   ┆ 3    │
└─────────────┴─────────────┴───────┴──────┘