json, python-polars

Transform JSON to a Polars DataFrame


I have the following JSON file and would like to transform it into a Polars DataFrame. How can I use the pl.read_json function, which has a schema parameter?

    {
        "data": {
            "names": [
                "A",
                "B",
                "C",
                "D",
                "E"
            ],
            "ndarray": [
                [
                    "abc",
                    true,
                    0.374618,
                    1,
                    0.83252
                ],
                [
                    "hello",
                    false,
                    0.1265374619,
                    0,
                    0.253
                ]
            ]
        }
    }



Solution

  • I'm not sure if you can use pl.read_json with a file structured in that way.

    The issue is that each inner list of ndarray contains "mixed types" (str, bool, float, int), which is not allowed in a Polars list: all elements must share one type.

    [
        "abc",      # str
        true,       # bool
        0.374618,   # float
        1,          # int
        0.83252     # float
    ]
    

    Polars must choose a single type for the list. In this case str is chosen as the "supertype", so every value is converted to a string:

    import polars as pl
    pl.select(pl.lit("""["abc", true, 1.23]""").str.json_decode())
    
    shape: (1, 1)
    ┌─────────────────────────┐
    │ literal                 │
    │ ---                     │
    │ list[str]               │
    ╞═════════════════════════╡
    │ ["abc", "true", "1.23"] │
    └─────────────────────────┘
    

    After that, there's no way to recover the "original" type information.

    If you load the JSON outside of Polars first (e.g. with the json module), you can pass the rows and the column names to pl.DataFrame() directly.

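    import json
    import polars as pl
    
    with open("data.json") as f:
        data = json.load(f)["data"]
    
    # Each inner list is one row, so state the orientation explicitly
    df = pl.DataFrame(data["ndarray"], schema=data["names"], orient="row")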
    
    shape: (2, 5)
    ┌───────┬───────┬──────────┬─────┬─────────┐
    │ A     ┆ B     ┆ C        ┆ D   ┆ E       │
    │ ---   ┆ ---   ┆ ---      ┆ --- ┆ ---     │
    │ str   ┆ bool  ┆ f64      ┆ i64 ┆ f64     │
    ╞═══════╪═══════╪══════════╪═════╪═════════╡
    │ abc   ┆ true  ┆ 0.374618 ┆ 1   ┆ 0.83252 │
    │ hello ┆ false ┆ 0.126537 ┆ 0   ┆ 0.253   │
    └───────┴───────┴──────────┴─────┴─────────┘