javascript, node.js, nodejs-polars

Convert a JavaScript object to a polars DataFrame in Node.js and populate missing values with null


I'm trying to use the nodejs-polars library but have run into a problem converting an array of JavaScript objects to a polars DataFrame.

Consider the following data, for example:

const myData = [
    {
        "id": "a",
        "name": "fred",
        "country": "france",
        "age": 30,
        "city": "paris" // there's no "city" property elsewhere in `myData`
    },
    {
        "id": "b",
        "name": "alexandra",
        "country": "usa",
        "age": 40
    },
    {
        "id": "c",
        "name": "george",
        "country": "argentina",
        "age": 50
    }
]

If we do:

const pl = require("nodejs-polars")

const output = pl.DataFrame(myData)

We get the error:

Error: Lengths don't match: Could not create a new DataFrame from Series. The Series have different lengths

Is there no way to create a polars DataFrame from an array of objects so that missing values are automatically populated with null?


Solution

  • You are looking for readRecords:

    const df = pl.readRecords(myData, {inferSchemaLength: 10})
    

    Nested data types have limited support.


    Old answer:

    This can be achieved with pl.readJSON if you don't mind doing a small amount of pre-processing.

    const ndJSONData = myData
      .map(row => JSON.stringify(row))
      .join("\n")
    
    // -1 does a full scan; set this to a length that best fits your use case.
    const df = pl.readJSON(ndJSONData, {"inferSchemaLength": -1})
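
Another pre-processing option, in plain JavaScript, is to give every record the same set of keys before building the DataFrame, so that no column ends up shorter than the others. The following is a minimal sketch, not part of nodejs-polars: `fillMissingKeys` is a hypothetical helper name, and it is only an assumption that the normalized array can then be passed to pl.DataFrame or pl.readRecords without the length error.

```javascript
// Hypothetical helper (not part of nodejs-polars): returns new records that all
// share the same keys, using null wherever a record lacks a key.
function fillMissingKeys(records) {
  // Union of all keys, in first-seen order.
  const keys = [...new Set(records.flatMap((row) => Object.keys(row)))];
  const has = (row, key) => Object.prototype.hasOwnProperty.call(row, key);
  return records.map((row) =>
    Object.fromEntries(keys.map((key) => [key, has(row, key) ? row[key] : null]))
  );
}

// Shortened version of the data from the question.
const sample = [
  { id: "a", age: 30, city: "paris" },
  { id: "b", age: 40 }, // no "city" here
];

const normalized = fillMissingKeys(sample);
console.log(normalized);

// Assumption: with identical keys in every record, this can be handed to
// pl.DataFrame(normalized) or pl.readRecords(normalized).
```

The helper does not mutate the input. Keys appear in the order they are first seen across the records.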