I have a DataFrame like the one below:
```python
import polars as pl

pl.DataFrame({"combine_address": [
    ["Yes|#456 Lane|Apt#4|ABC|VA|50566", "Yes|#456 Lane|Apt#4|ABC|VA|50566", "No|#456 Lane|Apt#4|ABC|VA|50566"],
    ["No|#8495|APT#94|SWE|WA|43593", "No|#8495|APT#94|SWE|WA|43593", "Yes|#8495|APT#94|SWE|WA|43593"],
]})
```
Here combine_address is a list-type column whose elements each hold six pipe-separated (|) values. I would like to split every element of each list on the | separator.
In the expected output, every element is split on | and its fields are spread across separate columns:
If a list has 3 elements, it should produce 3 × 6 = 18 columns.
If a list has 5 elements, it should produce 5 × 6 = 30 columns, and so on.
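For a single row of the sample data, the transformation described above can be sketched in plain Python. This assumes, as in the example, that every element holds exactly six fields; it only illustrates where the 3 × 6 = 18 count comes from.

```python
# One row of the sample data (assumption: each element has exactly six
# "|"-separated fields, as in the example above).
row = [
    "Yes|#456 Lane|Apt#4|ABC|VA|50566",
    "Yes|#456 Lane|Apt#4|ABC|VA|50566",
    "No|#456 Lane|Apt#4|ABC|VA|50566",
]

# Split every element on "|" and concatenate the pieces in order,
# giving one flat sequence of fields for the whole row.
fields = [part for element in row for part in element.split("|")]

print(len(fields))  # 3 elements * 6 fields = 18
print(fields[:6])   # the six fields of the first element
```

Each group of six consecutive entries in fields corresponds to one original list element, so a row with 5 elements would yield 30 entries.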
Is this what you are looking for?
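```python
import polars as pl

df = pl.DataFrame({"combine_address": [
    ["Yes|#456 Lane|Apt#4|ABC|VA|50566", "Yes|#456 Lane|Apt#4|ABC|VA|50566", "No|#456 Lane|Apt#4|ABC|VA|50566"],
    ["No|#8495|APT#94|SWE|WA|43593", "No|#8495|APT#94|SWE|WA|43593", "Yes|#8495|APT#94|SWE|WA|43593"],
]})

df.select(
    # Gather the elements of all rows into a single row (hence the (1, 36) shape below)
    pl.col("combine_address").reshape((1, -1))
    # Join them into one "|"-separated string, then split it back into individual fields
    .arr.join("|")
    .str.split("|")
    # Spread the fields over struct fields named field_0, field_1, ...
    .list.to_struct(n_field_strategy="max_width")
).unnest("combine_address")
```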
```
shape: (1, 36)
┌─────────┬───────────┬─────────┬─────────┬─────┬──────────┬──────────┬──────────┬──────────┐
│ field_0 ┆ field_1   ┆ field_2 ┆ field_3 ┆ ... ┆ field_32 ┆ field_33 ┆ field_34 ┆ field_35 │
│ ---     ┆ ---       ┆ ---     ┆ ---     ┆     ┆ ---      ┆ ---      ┆ ---      ┆ ---      │
│ str     ┆ str       ┆ str     ┆ str     ┆     ┆ str      ┆ str      ┆ str      ┆ str      │
╞═════════╪═══════════╪═════════╪═════════╪═════╪══════════╪══════════╪══════════╪══════════╡
│ Yes     ┆ #456 Lane ┆ Apt#4   ┆ ABC     ┆ ... ┆ APT#94   ┆ SWE      ┆ WA       ┆ 43593    │
└─────────┴───────────┴─────────┴─────────┴─────┴──────────┴──────────┴──────────┴──────────┘
```