The whole error:
```
C:\Users\Desktop\texts>python similarity1.py
Traceback (most recent call last):
  File "similarity1.py", line 19, in <module>
    documents = [open(f, encoding="utf-8").read() for f in text_files]
  File "similarity1.py", line 19, in <listcomp>
    documents = [open(f, encoding="utf-8").read() for f in text_files]
FileNotFoundError: [Errno 2] No such file or directory: 'apempe_chunks.txt'
```
And the code that produces it:
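```python
import os
import codecs
import string, re
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer

path = "C:\\Users\\Desktop\\texts\\dataset"
text_files = os.listdir(path)  # bare file names, e.g. 'apempe_chunks.txt'
tfidf_vectorizer = TfidfVectorizer()  # assumed: configured elsewhere in the full script
documents = [open(f, encoding="utf-8").read() for f in text_files]
sparse_matrix = tfidf_vectorizer.fit_transform(documents)
```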
The strange thing is that the program does find apempe_chunks.txt, which is inside the dataset folder.
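A likely explanation: os.listdir() returns only entry names, not full paths, so open(f) looks for 'apempe_chunks.txt' relative to the current working directory (C:\Users\Desktop\texts in the run above), not inside dataset. The name is found, but the file is then looked up in the wrong folder. The sketch below reproduces this in a temporary folder; the layout is a hypothetical stand-in for the one in the question.

```python
import os
import tempfile

# Hypothetical layout mirroring the question: <root>/dataset/apempe_chunks.txt
original_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as root:
    dataset = os.path.join(root, "dataset")
    os.mkdir(dataset)
    with open(os.path.join(dataset, "apempe_chunks.txt"), "w", encoding="utf-8") as fh:
        fh.write("some text")

    names = os.listdir(dataset)
    print(names)  # a bare file name, with no directory part

    os.chdir(root)  # like running the script from the parent folder
    try:
        try:
            # The bare name is resolved against the working directory (root),
            # which does not contain the file.
            open(names[0], encoding="utf-8").close()
            bare_name_failed = False
        except FileNotFoundError as exc:
            bare_name_failed = True
            print("bare name:", exc)

        # Joining the folder and the name gives a path that opens from anywhere.
        with open(os.path.join(dataset, names[0]), encoding="utf-8") as fh:
            joined_text = fh.read()
        print("joined path:", joined_text)
    finally:
        os.chdir(original_cwd)  # leave the temp folder so it can be deleted
```

The bare-name call fails with the same FileNotFoundError as in the traceback, while the joined path opens normally.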
I've searched SO for this, but I can't fix it.
To work around the error, I moved similarity1.py into the dataset folder, added the filter if f.endswith('.txt') to the list comprehension, and now it works fine.
So the line now reads:
```python
documents = [open(f, encoding="utf-8").read() for f in text_files if f.endswith('.txt')]
```
This way I only read the .txt files inside the dataset directory, skipping the Python script itself and any other files.
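Moving the script presumably works because it is then launched from inside dataset, which makes that folder the working directory. A sketch that avoids moving the script, assuming the goal is simply to read every .txt file in the folder: join each name to the folder path before opening it. load_documents is a hypothetical helper name, not something from the original script.

```python
import os


def load_documents(path):
    """Return the text of every .txt file directly inside `path`.

    os.listdir() yields bare names, so each one is joined to `path`;
    the result no longer depends on the current working directory.
    """
    documents = []
    for name in sorted(os.listdir(path)):  # sorted for a stable document order
        full_path = os.path.join(path, name)
        # Skip non-.txt entries (e.g. the script itself) and any folders.
        if name.endswith(".txt") and os.path.isfile(full_path):
            with open(full_path, encoding="utf-8") as fh:  # closes each file
                documents.append(fh.read())
    return documents
```

With this helper, the question's last lines would become documents = load_documents(path) followed by the same tfidf_vectorizer.fit_transform(documents) call. Sorting is a design choice: os.listdir() makes no ordering guarantee, and a fixed order keeps the rows of the TF-IDF matrix predictable. pathlib offers an equivalent, since sorted(Path(path).glob("*.txt")) yields full paths directly.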
The idea came from this thread of answers to a question similar to mine.