Tags: python, directory-structure, os.walk

Scan a directory tree and read .csv files into a dataframe using Python


I am trying to walk a directory tree and, for each csv encountered on the walk, open the file and read columns 0 and 15 into a data-frame (after which I'll process it and move on to the next file). I can walk the directory tree using the following:

rootdir = r'C:/Users/stacey/Documents/Alco/auditopt/'
for dirName, subdirList, fileList in os.walk(rootdir):
    print('Found directory: %s' % dirName)
    for fname in fileList:
        print('\t%s' % fname)
        df = pd.read_csv(fname, header=1, usecols=[0,15],parse_dates=[0], dayfirst=True,index_col=[0], names=['date', 'total_pnl_per_pos'])
        print(df)

but I'm getting the error message:

FileNotFoundError: File b'tradeopt.os-pnl.BBG_XASX_ARB_S-BBG_XTKS_7240_S.csv' does not exist

The files I am trying to read do exist. They are in MS Excel .csv format, and I don't know whether that is the issue. If it is, could someone let me know how to read an MS Excel .csv into a data-frame, please?

The full stack trace is as follows:

Found directory: C:/Users/stacey/Documents/Alco/auditopt/
Found directory: C:/Users/stacey/Documents/Alco/auditopt/roll_597_oe_2017-03-10
        tradeopt.os-pnl.BBG_XASX_ARB_S-BBG_XTKS_7240_S.csv
Traceback (most recent call last):

  File "<ipython-input-24-3753e367432d>", line 1, in <module>
    runfile('C:/Users/stacey/Documents/scripts/Pair_Results_Code_1.0.py', wdir='C:/Users/stacey/Documents/scripts')

  File "C:\Anaconda\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
    execfile(filename, namespace)

  File "C:\Anaconda\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/stacey/Documents/scripts/Pair_Results_Code_1.0.py", line 49, in <module>
    main()

  File "C:/Users/stacey/Documents/scripts/Pair_Results_Code_1.0.py", line 36, in main
    df = pd.read_csv(fname, header=1, usecols=[0,15],parse_dates=[0], dayfirst=True,index_col=[0], names=['date', 'total_pnl_per_pos'])

  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 646, in parser_f
    return _read(filepath_or_buffer, kwds)

  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 389, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)

  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 730, in __init__
    self._make_engine(self.engine)

  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 923, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)

  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 1390, in __init__
    self._reader = _parser.TextReader(src, **kwds)

  File "pandas\parser.pyx", line 373, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:4184)

  File "pandas\parser.pyx", line 667, in pandas.parser.TextReader._setup_parser_source (pandas\parser.c:8449)

FileNotFoundError: File b'tradeopt.os-pnl.BBG_XASX_ARB_S-BBG_XTKS_7240_S.csv' does not exist

Solution

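  • The error is about the path, not the file format: pd.read_csv needs the full path to each file. os.walk yields only bare file names in fileList, so passing fname on its own makes pandas look for the file in the current working directory, where it does not exist. Join each name with dirName, the directory currently being visited.

    Use os.path.join to make this easy.

    import os
    full_path = os.path.join(dirName, fname)
    df = pd.read_csv(full_path, ...)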
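
To show the path-joining step in context, here is a minimal sketch that walks a tree and yields the full path of every .csv file. The helper name iter_csv_paths and the case-insensitive extension check are assumptions made for illustration; they are not part of the original question or answer.

```python
import os


def iter_csv_paths(rootdir):
    """Yield the full path of every .csv file found under rootdir.

    Hypothetical helper for illustration only.
    """
    for dir_name, _subdirs, file_names in os.walk(rootdir):
        for fname in file_names:
            # os.walk yields bare file names, so join each one with the
            # directory currently being visited to get an openable path.
            if fname.lower().endswith('.csv'):
                yield os.path.join(dir_name, fname)
```

Each yielded path can then be passed to pd.read_csv in place of fname, keeping the same keyword arguments as in the question. Skipping non-csv files avoids handing unrelated files in the tree to the parser.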