I am supposed to preprocess some PDFs in a folder: remove punctuation, make everything lowercase, remove stopwords, and add some extra data from another CSV as metadata. But I cannot even open them. Googling has not helped, because I do not understand the error message, and the examples from other people involved different data types.
This is my code so far:
import PyPDF2
import re

for k in range(1,312):
    # open the pdf file
    object = PyPDF2.PdfFileReader("/Users/n_n/Desktop/Digitalization/reserve" % (k))
And this is what happens:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [37], in <cell line: 4>()
2 import re
4 for k in range(1,312):
5 # open the pdf file
----> 6 object = PyPDF2.PdfFileReader("/Users/n_n/Desktop/Digitalization/reserve" % (k))
TypeError: not all arguments converted during string formatting
You have forgotten to add the format placeholder to the string:
object = PyPDF2.PdfFileReader("/Users/n_n/Desktop/Digitalization/reserve%s" % k)
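Note the "%s" at the end of the file path string. When formatting with the % operator, the "%s" is replaced by the string form of the argument you pass, which in this case is str(k). Without a placeholder, the % operator has nowhere to put k, which is why Python reports "not all arguments converted during string formatting".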
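To see the difference without touching any PDF files, here is a minimal sketch of both cases. The base path and the two helper functions (build_path, build_path_broken) are made up for illustration and are not part of the original code. It also shows an f-string, which should give the same result as the "%s" form.

```python
# Hypothetical base path, used only for illustration.
base = "/tmp/example/reserve"


def build_path(k):
    # "%s" is the placeholder that the % operator fills in with str(k).
    return (base + "%s") % k


def build_path_fstring(k):
    # Equivalent f-string form; k is converted with str() as well.
    return f"{base}{k}"


def build_path_broken(k):
    # No placeholder in the string, so % cannot place k and raises TypeError.
    return base % (k)


print([build_path(k) for k in range(1, 4)])

try:
    build_path_broken(1)
except TypeError as exc:
    print("TypeError:", exc)
```

As a side note, naming the variable object shadows the Python built-in of the same name, so a name such as reader is probably a safer choice.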