Search code examples
pythonpython-polars

Polars Transform String Containing Key-Values


Trying to figure out how to transform a k-v string that is inside a column where the k-v string is separated by commas, and could contain different keys. The different keys would then be transformed into their own columns, where missing values would contain nulls.

For example,

pl.DataFrame({
    "apple": [1, 2, 3],
    "data": ["a=b, b=c", "a=y, y=z", "k1=v1, k2=v2"]
})

would look like:

pl.DataFrame({
    "apple": [1, 2, 3],
    "a": ["b", "y", None],
    "b": ["c", None, None],
    "y": [None, "z", None],
    "k1": [None, None, "v1"],
    "k2": [None, None, "v2"],
    "data": ["a=b, b=c", "a=y, y=z", "k1=v1, k2=v2"]
})

once transformed. Does anyone know what is the most efficient way to do this (perhaps without pre-processing of the data, if possible?)

enter image description here


Solution

  • You could attempt to reformat it as JSON objects.

    df.with_columns(pl.format('{"{}"}', 
        pl.col("data").str.replace_many({"=": '":"', ", ": '","'})
    ))
    
    shape: (3, 3)
    ┌───────┬──────────────┬───────────────────────┐
    │ apple ┆ data         ┆ literal               │
    │ ---   ┆ ---          ┆ ---                   │
    │ i64   ┆ str          ┆ str                   │
    ╞═══════╪══════════════╪═══════════════════════╡
    │ 1     ┆ a=b, b=c     ┆ {"a":"b","b":"c"}     │
    │ 2     ┆ a=y, y=z     ┆ {"a":"y","y":"z"}     │
    │ 3     ┆ k1=v1, k2=v2 ┆ {"k1":"v1","k2":"v2"} │
    └───────┴──────────────┴───────────────────────┘
    

    Which would allow you to .str.json_decode() into a struct.

    And then .unnest() into columns.

    df.with_columns(
       pl.format('{"{}"}', 
          pl.col("data").str.replace_many({"=": '":"', ", ": '","'})
       )
       .str.json_decode()
       .alias("json")
    ).unnest("json")
    
    shape: (3, 7)
    ┌───────┬──────────────┬──────┬──────┬──────┬──────┬──────┐
    │ apple ┆ data         ┆ a    ┆ b    ┆ y    ┆ k1   ┆ k2   │
    │ ---   ┆ ---          ┆ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---  │
    │ i64   ┆ str          ┆ str  ┆ str  ┆ str  ┆ str  ┆ str  │
    ╞═══════╪══════════════╪══════╪══════╪══════╪══════╪══════╡
    │ 1     ┆ a=b, b=c     ┆ b    ┆ c    ┆ null ┆ null ┆ null │
    │ 2     ┆ a=y, y=z     ┆ y    ┆ null ┆ z    ┆ null ┆ null │
    │ 3     ┆ k1=v1, k2=v2 ┆ null ┆ null ┆ null ┆ v1   ┆ v2   │
    └───────┴──────────────┴──────┴──────┴──────┴──────┴──────┘