I'm trying to dump a Snowflake table to Parquet. Re-reading the Parquet file from within Snowflake works, but when I read it with other tools (pandas, pyarrow, ...) I get an error about the file format.
Code to reproduce:
from snowflake.snowpark import Session
from snowflake.snowpark import functions as F
import os
from snowflake.ml.fileset import sfcfs
import pandas
connection_parameters = {} # this is setup specific
snowpark_session = Session.builder.configs(connection_parameters).create()
df = snowpark_session.createDataFrame(pandas.DataFrame({'a': [1,2,3]}))
full_name = f'{snowpark_session.get_session_stage()}/report1'
df.write.parquet(full_name, header=True, overwrite=True)
# this works
snowpark_session.read.parquet(full_name)
# this fails
fs = sfcfs.SFFileSystem(snowpark_session=snowpark_session)
file_name = fs.ls(snowpark_session.get_session_stage())[0]
pandas.read_parquet(fs.open(file_name))
The error message I get is:
ArrowInvalid: Could not open Parquet input source '': Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.
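One way to narrow this down (a hypothetical diagnostic, not from the original report): every valid Parquet file begins and ends with the 4-byte magic `b"PAR1"`. A small helper can check whether the bytes coming back from `fs.open(file_name).read()` are actually raw Parquet, or something else (e.g. compressed or truncated in transit):

```python
# A valid Parquet file starts and ends with the magic bytes b"PAR1".
PARQUET_MAGIC = b"PAR1"

def has_parquet_magic(data: bytes) -> bool:
    """Return True if the byte buffer looks like a complete Parquet file."""
    return (
        len(data) >= 8
        and data[:4] == PARQUET_MAGIC
        and data[-4:] == PARQUET_MAGIC
    )
```

If `has_parquet_magic(fs.open(file_name).read())` returns False, that would confirm the SFFileSystem stream is not serving raw Parquet bytes, which matches the "magic bytes not found in footer" error.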
The issue appears to be with the Snowflake file system object (sfcfs.SFFileSystem). There is an alternative API that does work:
pandas.read_parquet(snowpark_session.file.get_stream(file_name))