I have this simple code
import pandas as pd
file = pd.read_parquet('file.rot',engine='fastparquet')
file.rot is a table of data (float numbers) with 5 columns
When I run it the error that appears is this
File ~\miniconda3\Lib\site-packages\spyder_kernels\py3compat.py:356 in compat_exec
exec(code, globals, locals)
File c:\users\josé\onedrive\ambiente de trabalho\draft.py:10
file = pd.read_parquet('file.rot',sep='=',engine='fastparquet')
File ~\miniconda3\Lib\site-packages\pandas\io\parquet.py:667 in read_parquet
return impl.read(
File ~\miniconda3\Lib\site-packages\pandas\io\parquet.py:402 in read
parquet_file = self.api.ParquetFile(path, **parquet_kwargs)
File ~\miniconda3\Lib\site-packages\fastparquet\api.py:135 in __init__
self._parse_header(fn, verify)
File ~\miniconda3\Lib\site-packages\fastparquet\api.py:215 in _parse_header
f.seek(-(head_size + 8), 2)
OSError: [Errno 22] Invalid argument
I don't know what I'm doing wrong, or whether I did something wrong when installing fastparquet on miniconda.
For those interested, here is what actually happens when fastparquet tries to read a file as parquet. According to the parquet spec, the last four bytes of the file should be b"PAR1", and the four bytes before that give the size of the footer in bytes. You could pass verify=True to check for the magic bytes:
>>> fastparquet.ParquetFile('file.rot', verify=True)
ParquetException: File parse failed
This check is not the default, and pandas does not enable it. So fastparquet took whatever the four bytes before the end happened to contain as the footer size, probably some random big number, and the seek() on the file therefore fails, since the inferred location falls before the start of the file.
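If you want to see this for yourself without going through fastparquet, the same check can be sketched with the standard library alone. This is only an illustration: inspect_parquet_footer is a hypothetical helper name, not part of fastparquet or pandas, and it assumes the footer length is stored as a 4-byte little-endian unsigned integer, as the parquet format describes.

```python
import os
import struct

PARQUET_MAGIC = b"PAR1"


def inspect_parquet_footer(path):
    """Hypothetical helper: look at the last 8 bytes of a file.

    Returns (has_magic, footer_size, fits), where `fits` says whether
    seeking back footer_size + 8 bytes from the end stays inside the file.
    """
    file_size = os.path.getsize(path)
    if file_size < 8:
        # Too short to hold a footer length plus the magic bytes.
        return False, None, False

    with open(path, "rb") as f:
        f.seek(-8, os.SEEK_END)
        tail = f.read(8)

    # First 4 bytes of the tail: footer length (assumed little-endian uint32).
    footer_size = struct.unpack("<I", tail[:4])[0]
    # Last 4 bytes of the tail: should be the magic marker.
    has_magic = tail[4:] == PARQUET_MAGIC
    # The seek in the traceback is f.seek(-(footer_size + 8), 2).
    fits = footer_size + 8 <= file_size
    return has_magic, footer_size, fits
```

Calling inspect_parquet_footer('file.rot') on a file that is not parquet would most likely give has_magic as False and a footer_size that makes no sense. If file.rot is really a plain-text table of floats, a text reader such as pandas.read_csv is probably the right tool rather than read_parquet.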