Tags: python, pandas, csv, cron, oserror

pandas read_csv not working when run via cron on macOS


I have a Python script that is run by cron. The script imports pandas, uses read_csv to load a CSV file into a DataFrame, and later saves it to another CSV file. 'apath' is the absolute path to the directory containing the file:

import pandas as pd

statedata_raw = pd.read_csv(apath + 'statedata.csv')
statedata_raw.to_csv(apath + 'state_data.csv', index=False)

The permissions on the CSV file look correct: -rwxr-xr-x

When I run the script from the command line, everything works fine. When I run it via cron, I get the following error:

Traceback (most recent call last):
  File "/users/maderman/wdtest.py", line 21, in <module>
    statedata_raw=pd.read_csv(apath+'statedata.csv')
  File "/opt/miniconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/opt/miniconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 448, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/opt/miniconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 880, in __init__
    self._make_engine(self.engine)
  File "/opt/miniconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 1114, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/opt/miniconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 1891, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 374, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 678, in pandas._libs.parsers.TextReader._setup_parser_source
OSError: Initializing from file failed

I verified that pandas itself loads and that to_csv works by replacing the read_csv call with the following code, which builds a DataFrame manually. With that change, everything worked fine both from the command line and under cron:

cat = ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b']
val = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# Map column names to values directly (avoids shadowing the built-in dict)
statedata_raw = pd.DataFrame({'cat': cat, 'val': val})

I found another post that suggested passing engine='python' to read_csv, but that made no difference.

So I know that:

  1. cron runs Python fine.
  2. The script can import pandas and run a couple of different pandas functions.
  3. The file's permission bits look fine.

The issue seems to be specifically related to the read_csv call.

Any suggestions would be appreciated.


Solution

  • The framing of this question was wrong; the problem boiled down to a permissions issue. A better question was posted and answered here: stackoverflow.com/questions/62353610
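
For anyone hitting the same traceback, one way to check whether it really is a permissions problem is to open the file with plain Python from inside the cron job, before pandas touches it. The pandas message "Initializing from file failed" does not say why the open failed, whereas the built-in open() raises a specific exception with an errno. The sketch below is only an illustration: diagnose_read is a hypothetical helper name, not part of pandas, and it assumes the failure happens when the file is opened rather than when it is parsed.

```python
import errno
import os


def diagnose_read(path):
    """Report whether the current process can actually open `path`.

    Hypothetical helper for debugging; not part of pandas.
    """
    report = {
        "path": path,
        "exists": os.path.exists(path),
        # os.access only checks the permission bits for this user;
        # the real open() below is the authoritative test.
        "readable": os.access(path, os.R_OK),
        "error": None,
    }
    try:
        with open(path, "rb") as handle:
            handle.read(1)  # force an actual read, not just an open
    except OSError as exc:
        code = errno.errorcode.get(exc.errno, "unknown")
        report["error"] = f"{type(exc).__name__}: errno={exc.errno} ({code})"
    return report
```

Calling print(diagnose_read(apath + 'statedata.csv')) at the top of the script and comparing the output from a terminal run with the output captured from the cron run should show whether the two environments differ. If "readable" is True but "error" reports a PermissionError only under cron, the restriction is coming from somewhere other than the file's mode bits.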