dataframe, python-polars

ComputeError: could not append value when creating a Polars DataFrame from JSON data


I'm encountering a ComputeError when trying to create a Polars DataFrame from JSON data. The error message is:

    ComputeError: could not append value: 1.41431 of type: f64 to the builder; make sure that all rows have the same schema or consider increasing infer_schema_length

    it might also be that a value overflows the data-type's capacity

Here's the relevant part of my code, using a reproducible JSON example (test.json):

import json

import pandas as pd
import polars as pl
import requests

response = requests.get('https://github.com/user-attachments/files/16026717/test.json')
response.raise_for_status()  # fail early on HTTP errors
res = json.loads(response.text)

df_pd = pd.DataFrame(res)  # Creating DataFrame using Pandas (works fine)
df = pl.DataFrame(res)  # Creating DataFrame using Polars (raises the error)

I expected the pl.DataFrame(res) line to work similarly to the pd.DataFrame(res) line, but it raises the mentioned error. I've also noticed that a similar error was discussed in a closed GitHub issue for Polars, but I am still encountering this error despite using the latest version of Polars (0.20.31).

Has anyone faced a similar issue or have any insights on how to resolve this?

I checked the types of the values in res and they seem to be consistent. I also tried other things, such as rounding floating-point numbers to 5 decimal places and increasing infer_schema_length, but neither helped and I got the same error again.
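One way to check type consistency across every row, rather than by eye, is a small helper that records where each Python type first appears in each column. This is a minimal sketch that assumes res is row-oriented JSON (a list of flat dicts); the function name first_seen_types and the sample data are hypothetical, not part of Polars.

```python
from collections import defaultdict
from typing import Any


def first_seen_types(rows: list[dict[str, Any]]) -> dict[str, dict[str, int]]:
    """For every key, map each Python type name to the first row index where it appears."""
    seen: dict[str, dict[str, int]] = defaultdict(dict)
    for i, row in enumerate(rows):
        for key, value in row.items():
            # setdefault keeps only the earliest row index for each type;
            # keys missing from a row are simply not counted for that row
            seen[key].setdefault(type(value).__name__, i)
    return dict(seen)


# Hypothetical data: 150 rows with no value for "x", then a float
rows = [{"x": None}] * 150 + [{"x": 1.41431}]
print(first_seen_types(rows))  # {'x': {'NoneType': 0, 'float': 150}}
```

Running it as first_seen_types(res) would show, per column, whether a type first appears at index 100 or later, which is beyond the first 100 rows that a default infer_schema_length of 100 would look at.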


Solution

  • The problem was solved by setting infer_schema_length to None, thanks to jqurious:

    pl.DataFrame(res, infer_schema_length=None)
    

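    The error seems to be raised because infer_schema_length defaults to 100, so Polars infers each column's type from only the first 100 rows. In our data, the real type of that column cannot be detected until a row beyond the first 100, so the inferred schema does not match the later values. Setting infer_schema_length=None lets Polars scan all rows before fixing the schema (which can be slower). I suggested in the GitHub issue that the error message be changed so users can debug their code easily by setting the parameter to None.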
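The mechanism described above (inferring a column type from a limited window of rows, then failing when a later value does not fit) can be illustrated with a toy model in plain Python. This is a simplified stand-in, not Polars' actual inference code: the dtype names, the widening rule (ints plus floats become Float64) and the helpers infer_dtype, fits and build_column are all hypothetical, and the sample column (150 nulls followed by a float) is invented for illustration.

```python
from typing import Any, Optional


def infer_dtype(values: list[Any]) -> str:
    """Toy dtype inference (hypothetical, NOT Polars' real algorithm)."""
    kinds = {type(v) for v in values if v is not None}
    if not kinds:
        return "Null"  # only nulls seen: the real type is still unknown
    if kinds <= {int}:
        return "Int64"
    if kinds <= {int, float}:
        return "Float64"  # ints and floats widen to float
    return "Mixed"


def fits(value: Any, dtype: str) -> bool:
    """Can `value` be appended to a column of `dtype` in this toy model?"""
    if value is None:
        return True
    if dtype == "Int64":
        return type(value) is int
    if dtype == "Float64":
        return type(value) in (int, float)
    return dtype == "Mixed"


def build_column(values: list[Any], infer_schema_length: Optional[int]) -> str:
    """Infer a dtype from a window of rows (all rows if None), then append every value."""
    window = values if infer_schema_length is None else values[:infer_schema_length]
    dtype = infer_dtype(window)
    for v in values:
        if not fits(v, dtype):
            raise ValueError(f"could not append value: {v!r} to a {dtype} column")
    return dtype


values = [None] * 150 + [1.41431]  # hypothetical column: the float first appears at row 150
try:
    build_column(values, infer_schema_length=100)
except ValueError as exc:
    print(exc)  # could not append value: 1.41431 to a Null column
print(build_column(values, infer_schema_length=None))  # Float64
```

In this model, any window that reaches the first non-null value (here infer_schema_length=151 or more) would also succeed; None simply removes the need to know that index in advance.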