I'm encountering a ComputeError when trying to create a Polars DataFrame from JSON data. The error message is:
ComputeError: could not append value: 1.41431 of type: f64 to the builder; make sure that all rows have the same schema or consider increasing infer_schema_length it might also be that a value overflows the data-type's capacity
Here's the relevant part of my code using a reproducible json example (test.json):
import json
import pandas as pd
import polars as pl
import requests  # needed for requests.get below

response = requests.get('https://github.com/user-attachments/files/16026717/test.json')
res = json.loads(response.text)
df_pd = pd.DataFrame(res)  # Creating DataFrame using Pandas (works fine)
df = pl.DataFrame(res)  # Creating DataFrame using Polars (raises the error)
I expected the pl.DataFrame(res) line to work similarly to the pd.DataFrame(res) line, but it raises the mentioned error. I've also noticed that a similar error was discussed in a closed GitHub issue for Polars, but I am still encountering this error despite using the latest version of Polars (0.20.31).
Has anyone faced a similar issue, or does anyone have any insight into how to resolve it?
I checked the types of the values in res and they seem to be consistent. I also tried other things, like rounding floating-point numbers to 5 decimal places or increasing infer_schema_length, but they didn't work and I got the same error again.
The problem was solved by setting infer_schema_length to None, thanks to jqurious:
pl.DataFrame(res, infer_schema_length=None)
It seems the error was raised because the default value of infer_schema_length is 100, so Polars infers the schema from the first 100 rows only. In our data, however, the type of that column cannot be detected correctly until a row beyond the first 100.
I suggested in the GitHub issue that the error description be modified so that users can debug their code more easily by setting the parameter to None.