How do I read every file in a folder so that I can perform some action on each one?


I can list all the files in a folder using os.listdir, but how do I then read each file and run some code on it?

encode  = 'ANSI'
file_paths = os.listdir('C:\\Users\\TMPFQB\\Documents\\MY PROJECTS\\Jan 2024')
for file_path in range(len(file_paths)):
    with open(file_path, 'r', encoding = encode) as f:
        Jan_txt = f.readlines()[4:]

With this code, I get the following error message:

ValueError                                Traceback (most recent call last)
Cell In[14], line 6
      4 file_paths = os.listdir('C:\\Users\\TMPFQB\\Documents\\MY PROJECTS\\Jan 2024')
      5 for file_path in range(len(file_paths)):
----> 6     with open(file_path, 'r', encoding = encode) as f:
      7         Jan_txt = f.readlines()[4:]

File ~\AppData\Local\anaconda3\Lib\site-packages\IPython\core\interactiveshell.py:280, in _modified_open(file, *args, **kwargs)
    277 @functools.wraps(io_open)
    278 def _modified_open(file, *args, **kwargs):
    279     if file in {0, 1, 2}:
--> 280         raise ValueError(
    281             f"IPython won't let you open fd={file} by default "
    282             "as it is likely to crash IPython. If you know what you are doing, "
    283             "you can use builtins' open."
    284         )
    286     return io_open(file, *args, **kwargs)

ValueError: IPython won't let you open fd=0 by default as it is likely to crash IPython. If you know what you are doing, you can use builtins' open.

Below is how I intend to combine the whole text into one string and then use a regex to pull out what I need. I am stuck on how to run this on every file in the folder:

whole_txt = "".join(Jan_16_txt)
matches = re.findall(pattern, whole_txt, flags = re.MULTILINE)
Jan_16 = [[m[8], m[12], m[27], m[29]] for m in matches]
df = pd.DataFrame(data = Jan_16)

Solution

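  • As @JKupzig already explained in the comments, the problem is the iteration. The loop runs over range(len(file_paths)), so file_path is an integer (0, 1, 2, ...) rather than a file name, and open(0, ...) is treated as a request to open file descriptor 0, which IPython refuses. In addition, os.listdir returns bare file names, so each name has to be joined with the directory path before it can be opened.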

    import os

    encode = 'ANSI'  # same encoding as in the question
    base_path = r'C:\Users\TMPFQB\Documents\MY PROJECTS\Jan 2024'

    # List the entries in the directory above, keeping only files (no sub-directories)
    file_names = [file_name for file_name in os.listdir(base_path) if os.path.isfile(os.path.join(base_path, file_name))]

    # This could be done directly in the list comprehension above; it is kept separate to make it easier to follow.
    # Build the full path to each file
    file_paths = [os.path.join(base_path, file_name) for file_name in file_names]

    # Iterate over the paths themselves, not over range(len(...))
    for file_path in file_paths:
        with open(file_path, 'r', encoding = encode) as f:
            ...  # your logic
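
To connect this with the regex step from the question, here is a minimal sketch of one way to run the read, join and findall sequence on every file. The function name extract_rows, its parameters, and the utf-8 default are assumptions made for illustration. The question's pattern is not shown, so pattern is left as an argument. On Windows you could pass encoding='ANSI' as in the question.

```python
import re
from pathlib import Path


def extract_rows(folder, pattern, encoding="utf-8", skip_lines=4):
    """Return {file name: list of regex matches} for every file in `folder`.

    Hypothetical helper: the name, the parameters and the defaults are
    illustrative and do not come from the original question or answer.
    """
    rows_by_file = {}
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # ignore sub-directories
        with open(path, "r", encoding=encoding) as f:
            # Drop the first `skip_lines` lines, like readlines()[4:] in the question
            lines = f.readlines()[skip_lines:]
        whole_txt = "".join(lines)
        # One list of matches per file, keyed by the file's name
        rows_by_file[path.name] = re.findall(pattern, whole_txt, flags=re.MULTILINE)
    return rows_by_file
```

Each list of matches can then be indexed and passed to pd.DataFrame(data = ...) exactly as in the question, either one file at a time or after concatenating the lists.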