Tags: python, python-polars

How to split a list column and add the values as new columns in a Polars DataFrame?


I have a DataFrame like the one below:

import polars as pl

df = pl.DataFrame({'combine_address': [
    ["Yes|#456 Lane|Apt#4|ABC|VA|50566", "Yes|#456 Lane|Apt#4|ABC|VA|50566", "No|#456 Lane|Apt#4|ABC|VA|50566"],
    ["No|#8495|APT#94|SWE|WA|43593", "No|#8495|APT#94|SWE|WA|43593", "Yes|#8495|APT#94|SWE|WA|43593"],
]})

Here combine_address is a list column, and each element of a list holds 6 pipe-separated (|) values. I would like to split every element of each list on the | separator and put the resulting values into new columns.

Here is the expected output:

[Image: expected output]

If a list has 3 elements, the split should produce 3 × 6 = 18 columns.

If a list has 5 elements, it should produce 5 × 6 = 30 columns, and so on.


Solution

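  • Is this what you are looking for? It collapses all addresses in the column into a single row and puts every pipe-separated value into its own column: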

    import polars as pl
    
    df = pl.DataFrame({"combine_address":[
        ["Yes|#456 Lane|Apt#4|ABC|VA|50566", "Yes|#456 Lane|Apt#4|ABC|VA|50566", "No|#456 Lane|Apt#4|ABC|VA|50566"],
        ["No|#8495|APT#94|SWE|WA|43593", "No|#8495|APT#94|SWE|WA|43593", "Yes|#8495|APT#94|SWE|WA|43593"]
    ]})
    
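    df.select(
        # Collapse the address strings from all rows into a single row
        pl.col("combine_address").reshape((1, -1))
          # Join them into one pipe-delimited string...
          .arr.join("|")
          # ...and split it back into individual values (36 here)
          .str.split("|")
          # Turn the list into a struct with fields field_0, field_1, ...
          .list.to_struct(n_field_strategy="max_width")
    ).unnest("combine_address")  # expand the struct fields into columns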
    
    shape: (1, 36)
    ┌─────────┬───────────┬─────────┬─────────┬─────┬──────────┬──────────┬──────────┬──────────┐
    │ field_0 ┆ field_1   ┆ field_2 ┆ field_3 ┆ ... ┆ field_32 ┆ field_33 ┆ field_34 ┆ field_35 │
    │ ---     ┆ ---       ┆ ---     ┆ ---     ┆     ┆ ---      ┆ ---      ┆ ---      ┆ ---      │
    │ str     ┆ str       ┆ str     ┆ str     ┆     ┆ str      ┆ str      ┆ str      ┆ str      │
    ╞═════════╪═══════════╪═════════╪═════════╪═════╪══════════╪══════════╪══════════╪══════════╡
    │ Yes     ┆ #456 Lane ┆ Apt#4   ┆ ABC     ┆ ... ┆ APT#94   ┆ SWE      ┆ WA       ┆ 43593    │
    └─────────┴───────────┴─────────┴─────────┴─────┴──────────┴──────────┴──────────┴──────────┘