python pdf text data-analysis pdfminer

Python Script to run a command over all files in a folder


To convert a PDF to text I am using the following command:

pdf2txt.py -o text.txt example.pdf # It will convert example.pdf to text.txt

But I have more than 1000 PDF files, which I need to convert to text files before I can do the analysis.

Is there a way to use this command to iterate over the PDF files and convert all of them?


Solution

  • I would suggest using a shell script (this loop uses zsh syntax; -o takes the output file, and the input PDF comes last):

    for f (*.pdf) { pdf2txt.py -o "${f:r}.txt" "$f" }
    

    Then read all the .txt files with Python for your analysis.

    To do the conversion using only Python:

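    from subprocess import call
    import glob

    # Convert every PDF in the current directory to a .txt file of the same name.
    for pdf_file in glob.glob('*.pdf'):
        # -o takes the output file; the input PDF comes last.
        call(["pdf2txt.py", "-o", pdf_file[:-3] + "txt", pdf_file])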
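With more than 1000 files, it may help to know which conversions failed instead of silently moving on. The following is a minimal sketch of the same loop, assuming pdf2txt.py is on your PATH and accepts `-o OUTPUT INPUT` as in the question. The helper names `output_path`, `build_command` and `convert_folder` are made up for this example and are not part of pdfminer.

```python
import subprocess
from pathlib import Path


def output_path(pdf_path):
    """Return the .txt path that sits next to the given PDF."""
    return pdf_path.with_suffix(".txt")


def build_command(pdf_path, converter="pdf2txt.py"):
    """Build the argument list for converting one PDF.

    Assumes the converter takes the output file after -o and the
    input PDF as the last argument, as in the question's command.
    """
    return [converter, "-o", str(output_path(pdf_path)), str(pdf_path)]


def convert_folder(folder=".", converter="pdf2txt.py"):
    """Run the converter on every *.pdf in `folder`.

    Returns the list of PDFs whose conversion exited with a
    non-zero status, so they can be inspected or retried.
    """
    failed = []
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        result = subprocess.run(build_command(pdf_path, converter))
        if result.returncode != 0:
            failed.append(pdf_path)
    return failed
```

Calling `convert_folder("path/to/pdfs")` converts the folder and returns the files that could not be converted; an empty list means every command exited successfully. Using `Path.with_suffix` instead of slicing the last three characters also handles extensions of other lengths.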