Tags: python, python-polars, ndjson

Polars scan_ndjson() - TypeError: expected str, bytes or os.PathLike object, not int


I'm trying to read a gzip-compressed JSONL file using scan_ndjson(), but I get the error

"TypeError: expected str, bytes or os.PathLike object, not int"

despite passing a string. read_ndjson() works fine, but I run into memory issues, so using a LazyFrame would be helpful.

Here's what I'm trying to do:

import gzip

import polars as pl

with gzip.open(file_path, 'rb') as file:
    # file.read() returns the decompressed contents as bytes, not a path
    df = pl.scan_ndjson(file.read(), ignore_errors=True)

Solution
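
A note on the error message itself: in binary mode, file.read() returns bytes rather than a str, and iterating over bytes yields integers. The wording of the error matches what Python's os.fspath() raises for an integer, so one plausible reading (an assumption, not confirmed against the Polars source) is that the bytes were treated as a sequence of paths. A minimal standard-library sketch that reproduces the same message, with `payload` as a hypothetical stand-in for the file contents:

```python
import os

# Stand-in for the bytes returned by file.read() in 'rb' mode.
payload = b'{"a": 1}\n'

# Iterating over bytes yields integers, not one-character strings.
first_item = next(iter(payload))

try:
    # Converting to a filesystem path rejects integers.
    os.fspath(first_item)
except TypeError as exc:
    message = str(exc)

print(type(first_item).__name__, "->", message)
```
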

  • Since scan_ndjson() only accepts file paths, I decompressed the data into a temporary file and scanned that instead.

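    import gzip
    import os
    import shutil
    import tempfile

    import polars as pl

    # Binary mode, because gzip.open(..., 'rb') yields bytes.
    temporary_file = tempfile.NamedTemporaryFile(mode='wb', delete=False)

    with gzip.open(file_path, 'rb') as file_in:
        # Stream the decompressed data to disk in chunks.
        shutil.copyfileobj(file_in, temporary_file)

    # Close first so all buffered data is flushed to disk.
    temporary_file.close()

    lf = pl.scan_ndjson(
        temporary_file.name,
        ignore_errors=True,
        low_memory=True,
        rechunk=True,
    )

    # scan_ndjson() is lazy, so run the query while the file still exists.
    df = lf.collect()

    os.unlink(temporary_file.name)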
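
The same temporary-file approach can be wrapped in a context manager so the file is removed even if the query raises. This is only a sketch using the standard library; the name `decompressed_copy` and its `suffix` parameter are hypothetical and not part of Polars:

```python
import gzip
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def decompressed_copy(gz_path, suffix=".ndjson"):
    """Yield the path of a temporary decompressed copy of gz_path.

    The temporary file is deleted when the with-block exits.
    """
    tmp = tempfile.NamedTemporaryFile(mode="wb", suffix=suffix, delete=False)
    try:
        # Leaving this inner block closes (and flushes) the temporary file.
        with tmp, gzip.open(gz_path, "rb") as src:
            shutil.copyfileobj(src, tmp)
        yield tmp.name
    finally:
        os.unlink(tmp.name)
```

Inside the with-block, the yielded path can be passed to pl.scan_ndjson(). Because the scan is lazy, the query has to be collected (or written out) before the block exits and the file is deleted.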