I am writing a large number of financial time series to individual CSV files. In one instance I have found that the to_csv method repeatedly fails, but I cannot for the life of me figure out why. During the call to to_csv, everything hangs for upwards of 10 to 15 minutes before crashing with the following error:
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\formats\csvs.py", line 172, in save
    self._save()
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\formats\csvs.py", line 274, in _save
    self._save_header()
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\formats\csvs.py", line 242, in _save_header
    writer.writerow(encoded_labels)
OSError: [Errno 22] Invalid argument
During handling of the above exception, another exception occurred:
OSError: [Errno 22] Invalid argument
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "securitiesArchives.py", line 1072, in <module>
    out_df.to_csv("PRN.csv",mode='w',encoding='UTF-8' ,compression=None)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py", line 3020, in to_csv
    formatter.save()
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\formats\csvs.py", line 187, in save
    f.close()
OSError: [Errno 22] Invalid argument
It seems to hang while writing the header row of the CSV file. I wrote the same frame to HDF, loaded it back, and with the HDF-loaded frame reproduced the same (or very nearly the same) failure:
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\formats\csvs.py", line 172, in save
    self._save()
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\formats\csvs.py", line 274, in _save
    self._save_header()
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\formats\csvs.py", line 242, in _save_header
    writer.writerow(encoded_labels)
PermissionError: [Errno 13] Permission denied
During handling of the above exception, another exception occurred:
PermissionError: [Errno 13] Permission denied
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "bad_archive.py", line 12, in <module>
    #out_df.to_csv("PRN.csv",mode='w',encoding='UTF-8' ,compression=None)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py", line 3020, in to_csv
    formatter.save()
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\formats\csvs.py", line 187, in save
    f.close()
PermissionError: [Errno 13] Permission denied
I'm not sure why the error changed from "OSError: [Errno 22] Invalid argument" to "PermissionError: [Errno 13] Permission denied" when moving from the larger code base to a small sample script. I have searched for these errors in relation to to_csv and found that earlier versions of pandas may have had similar issues, but these were supposedly resolved in later versions. My pandas installation:
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.24.2
pytest: 5.0.1
pip: 19.1.1
setuptools: 41.0.1
Cython: 0.29.12
numpy: 1.16.4
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.6.1
sphinx: 2.1.2
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: 1.2.1
tables: 3.5.2
numexpr: 2.6.9
feather: None
matplotlib: 3.1.0
openpyxl: 2.6.2
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.8
lxml.etree: 4.3.4
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.3.5
pymysql: None
psycopg2: None
jinja2: 2.10.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.8.1
gcsfs: None
I am on a win-10 64 bit machine using Anaconda Python 3.7.3 (default, Apr 24 2019, 15:29:51) [MSC v.1915 64 bit (AMD64)] :: Anaconda, Inc. on win32
I have tried writing a single row:
out_df.loc[out_df.index.values[0]].to_csv("PRN.csv",mode='w',encoding='UTF-8' ,compression=None)
which also failed, even though this is now a Series rather than a DataFrame, as the following warning confirms:
FutureWarning: The signature of `Series.to_csv` was aligned to that of `DataFrame.to_csv`, and argument 'header' will change its default value from False to True: please pass an explicit value to suppress this warning.
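As the warning itself says, passing header explicitly silences it. A minimal sketch with a made-up one-row frame and a temporary path (both assumptions for illustration); this only addresses the FutureWarning, not the PRN.csv failure:

```python
import os
import tempfile

import pandas as pd

# Made-up data standing in for out_df
df = pd.DataFrame({"Open": [1.0], "Close": [1.5]}, index=["r0"])
row = df.loc["r0"]  # selecting a single row label returns a Series
path = os.path.join(tempfile.mkdtemp(), "row.csv")

# An explicit header value is what the FutureWarning asks for; True writes
# the Series name ("r0") as the header row.
row.to_csv(path, header=True)

back = pd.read_csv(path, index_col=0)
print(back)
```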
I also tried the two-row slice below, written without index or header, and it still refuses to cooperate with to_csv:
out_df.loc[out_df.index.values[0]:out_df.index.values[1]].to_csv("PRN.csv",mode='w',encoding='UTF-8' ,compression=None,index=False,header=False)
This also failed as before. I was, however, able to write each column independently to its own CSV file without issue:
for col_name in out_df.columns:
    print('Writing '+col_name+' as CSV')
    out_df[col_name].to_csv(col_name.replace(' ','_')+"_PRN.csv",mode='w',encoding='UTF-8' ,compression=None)
    print('Done.')
Combined, the success above and the failure of the two-row write make me think this is not related to specific column values. Further, the tracebacks suggest the issue is related to writing the column headers. But I have 3000+ other DataFrames with exactly the same column labels, and they all write to CSV with to_csv without issue. At this point I am out of my depth.
The failure occurs repeatedly on this same data, whether I use the data I wrote to HDF or a fresh pull from Yahoo via yfinance. The following code reliably reproduces the issue on my system:
import pandas as pd
import yfinance as yf
good_df = yf.download(tickers='AAPL',interval='1m',period='7d')
bad_df = yf.download(tickers='PRN',interval='1m',period='7d')
print('Writing test case AAPL as CSV')
good_df.to_csv("AAPL.csv",mode='w',encoding='UTF-8' ,compression=None)
print('Writing test case PRN as CSV')
bad_df.to_csv("PRN.csv",mode='w',encoding='UTF-8' ,compression=None)
Anyone have any ideas?
PS: While re-reading, I decided to check the column labels for equality, and as far as a Boolean comparison is concerned, the labels of the 'good' DataFrame and the 'bad' DataFrame are identical:
>>>print(good_df.columns)
Index(['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume'], dtype='object')
>>>print(bad_df.columns)
Index(['Open', 'High', 'Low', 'Close', 'Adj Close','Volume'], dtype='object')
>>>print(good_df.columns == bad_df.columns)
[ True True True True True True]
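A slightly stricter version of that check, sketched with the labels typed in by hand (in the real script they would be good_df.columns and bad_df.columns): pd.Index.equals returns a single True/False instead of an element-wise array, and printing each label's repr() would expose hidden characters such as trailing spaces.

```python
import pandas as pd

# Labels typed in by hand for illustration; in the real script these would
# be good_df.columns and bad_df.columns.
good_cols = pd.Index(['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume'])
bad_cols = pd.Index(['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume'])

elementwise = good_cols == bad_cols  # NumPy array of booleans, one per label
same = good_cols.equals(bad_cols)    # single bool; False if lengths differ
print(elementwise.all(), same)       # True True

# repr() makes hidden whitespace or odd characters visible in each label
print([repr(label) for label in bad_cols])
```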
PPS: I have also tried removing all the keyword arguments from to_csv, though they should have been the default values anyway. They were a carry-over from other code, and I was cycling through different values to see if anything would work. The most basic to_csv call fails as before:
import pandas as pd
import yfinance as yf
good_df = yf.download(tickers='AAPL',interval='1m',period='7d')
bad_df = yf.download(tickers='PRN',interval='1m',period='7d')
print('Writing test case AAPL as CSV')
good_df.to_csv("AAPL.csv")
print('Writing test case PRN as CSV')
bad_df.to_csv("PRN.csv")
I can see no PRN.csv file in Explorer or via dir in the console. To test whether the file name was the problem, I used a name other than the symbol "PRN" and, lo and behold, it works.
I did not think this was the issue, as I had already tried writing to a different destination folder, both in the larger parent code and in the toy problem. Neither worked.
It would seem that Windows holds an old reference to any file named "PRN.csv" or something... how frustrating. Let's hope a simple restart fixes it.
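A likely explanation, rather than a stale file reference: Windows reserves a handful of legacy device names (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9), and Microsoft's file-naming guidance says to avoid them even when followed by an extension, because a name like "PRN.csv" can be interpreted as the printer device instead of a file. That would fit the symptoms: no file ever appears, a different folder does not help, and a restart would not be expected to change anything. Since PRN is also a ticker symbol, one workaround is to check names before writing. This is a sketch only: is_reserved_windows_name and safe_csv_name are hypothetical helpers, not part of pandas, and the reserved-name list is an assumption based on Microsoft's documentation. On Python 3.7, the standard library's PureWindowsPath.is_reserved() performs a similar check.

```python
from pathlib import PureWindowsPath

# Assumed list, following Microsoft's file-naming guidance: device names
# Windows reserves even when an extension is added ("PRN.csv" is still PRN).
RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_reserved_windows_name(filename):
    """Return True if the last path component uses a reserved device name."""
    name = PureWindowsPath(filename).name
    # Similar to pathlib's check: compare the part before the first dot,
    # case-insensitively, ignoring trailing spaces.
    stem = name.partition(".")[0].rstrip(" ")
    return stem.upper() in RESERVED_NAMES


def safe_csv_name(symbol):
    """Hypothetical helper: '<symbol>.csv', with '_' added to reserved stems."""
    name = f"{symbol}.csv"
    if is_reserved_windows_name(name):
        name = f"{symbol}_.csv"
    return name


print(safe_csv_name("AAPL"))  # AAPL.csv
print(safe_csv_name("PRN"))   # PRN_.csv
```

In the loop over tickers, the file name would then come from safe_csv_name(ticker) instead of ticker + ".csv".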
Thanks!
I literally had this same problem earlier today, but since I was working with much smaller data, the solution was easier to spot.
When a file is open in another program, you cannot write or append to it. Check for places where you might have forgotten to call close(), or whether the file is open for viewing in Microsoft Excel.
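To rule out a forgotten close() in your own code, one option is to let a with block manage the handle. A minimal sketch with a made-up two-row frame and a temporary path: to_csv accepts an open file object, and the pandas documentation recommends opening it with newline=''. Note that this cannot release a lock held by another program such as Excel.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"Open": [1.0, 2.0], "Close": [1.5, 2.5]})
path = os.path.join(tempfile.mkdtemp(), "example.csv")

# The with block closes the handle when it exits, even if to_csv raises,
# so this script never leaves the file open behind it.
with open(path, "w", newline="") as fh:
    df.to_csv(fh)

print(fh.closed)  # True
```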
Also, it's generally better to write with open('file', 'a') in case there's previous data stored there. If there isn't, it does the same as open('file', 'w') and creates a new file.
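The same append idea with pandas directly, sketched under the assumption that rows arrive in batches (the file name and data are made up): to_csv accepts mode='a', and the header is written only when the file does not exist yet, so it is not repeated mid-file. One trade-off: re-running the script appends the same rows again, so 'w' remains the safer choice when each run should produce a fresh file.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "prices.csv")
batch1 = pd.DataFrame({"Close": [1.5, 2.5]}, index=[0, 1])
batch2 = pd.DataFrame({"Close": [3.5]}, index=[2])

for batch in (batch1, batch2):
    # Write the header only for a new file; otherwise it would appear mid-file
    write_header = not os.path.exists(path)
    batch.to_csv(path, mode="a", header=write_header)

result = pd.read_csv(path, index_col=0)
print(result["Close"].tolist())  # [1.5, 2.5, 3.5]
```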