Tags: python, linux, terminal, pymupdf

Is there an efficient way to run a program on files with similar names using Python in the terminal?


I'm trying to process PDFs with PyMuPDF by running this Python script, process_pdf.py, in the terminal.

> import sys, fitz
> fname = sys.argv[1]  # get document filename
> doc = fitz.open(fname)  # open document
> out = open(fname + ".txt", "wb")  # open text output
> for page in doc:  # iterate the document pages
>     text = page.get_text().encode("utf8")  # get plain text (is in UTF-8)
>     out.write(text)  # write text of page
> out.close()

Then I pass a PDF on the command line, for example python process_pdf.py 1.pdf. This produces 1.pdf.txt (the text version of 1.pdf). My question: can I write a simple program in the terminal that runs python process_pdf.py document_name.pdf repeatedly, the way a for-loop works? The file names are sequential numbers.

I thought about making a for-loop such as

> for i in range(1,101): 
>     python process_pdf.py i.pdf

But that isn't how Python works. P.S. Sorry if this doesn't make sense; I'm very new to coding :(


Solution

  • Well, yes. You can execute any process from Python, including python.exe (or /usr/bin/python3 on Linux), and pass it any arguments you want.

    See subprocess.Popen, os.system, etc.
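A minimal sketch of the subprocess route, assuming process_pdf.py sits in the current directory and the PDFs are named 1.pdf through 100.pdf. The helper name run_script_on_files is made up for illustration; subprocess.run and sys.executable come from the standard library.

```python
import subprocess
import sys
from pathlib import Path


def run_script_on_files(script, filenames):
    """Run `script` once per filename in a separate Python process.

    Returns the list of filenames for which the script exited with an error.
    (Hypothetical helper, not part of any library.)
    """
    failed = []
    for name in filenames:
        # sys.executable is the interpreter running this file, so the child
        # process uses the same Python and the same installed packages.
        result = subprocess.run([sys.executable, script, name])
        if result.returncode != 0:
            failed.append(name)
    return failed


if __name__ == "__main__":
    # Assumed naming scheme: 1.pdf ... 100.pdf; numbers with no file are skipped.
    pdfs = [f"{i}.pdf" for i in range(1, 101) if Path(f"{i}.pdf").exists()]
    print("failed:", run_script_on_files("process_pdf.py", pdfs))
```

Each call starts a fresh interpreter, so this is slower than looping inside one script, but it leaves process_pdf.py untouched.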

    There are also better ways specifically for running Python scripts from Python, such as the standard-library runpy module.
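For comparison, a rough sketch of the runpy approach, which executes the script inside the current process instead of starting a new one. runpy.run_path does not pass command-line arguments itself, so this sketch sets sys.argv by hand and restores it afterwards; the helper name run_script_with_args is hypothetical.

```python
import runpy
import sys
from pathlib import Path


def run_script_with_args(script, *args):
    """Execute `script` in this process as if it were run with `args`.

    Returns the script's global variables as a dict, which is what
    runpy.run_path returns. (Hypothetical helper for illustration.)
    """
    saved_argv = sys.argv
    sys.argv = [script, *args]  # what the script will see as sys.argv
    try:
        # run_name="__main__" makes an `if __name__ == "__main__":` block run.
        return runpy.run_path(script, run_name="__main__")
    finally:
        sys.argv = saved_argv  # restore our own arguments


if __name__ == "__main__":
    # Assumed naming scheme: 1.pdf ... 100.pdf; numbers with no file are skipped.
    for i in range(1, 101):
        if Path(f"{i}.pdf").exists():
            run_script_with_args("process_pdf.py", f"{i}.pdf")
```

Because everything runs in one interpreter, an uncaught exception in the script stops the whole loop unless you catch it yourself.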

    But this feels like an XY problem.

    How about simply generating the file names in the code?

    import fitz

    for i in range(1, 101):
        fname = f"{i}.pdf"  # build the document filename
        doc = fitz.open(fname)  # open document
        out = open(fname + ".txt", "wb")  # open text output
        for page in doc:  # iterate the document pages
            text = page.get_text().encode("utf8")  # get plain text (is in UTF-8)
            out.write(text)  # write text of page
        out.close()

    Also, I'm unfamiliar with "fitz", but you may need to close the "doc" file as well. Check out the "with" statement.
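Here is one way the "with" suggestion might look, as a sketch rather than a definitive version. It relies on PyMuPDF documents being usable as context managers, so both the PDF and the output file are closed automatically. The function names write_pages and convert are made up for illustration, and writing to 1.txt instead of 1.pdf.txt is an assumption about the name you actually want.

```python
from pathlib import Path


def write_pages(pages, out):
    """Write the plain text of every page to the binary file object `out`."""
    for page in pages:
        out.write(page.get_text().encode("utf8"))


def convert(pdf_path):
    """Convert one PDF to a .txt file next to it and return the new path."""
    import fitz  # PyMuPDF; imported here so the helpers above load without it

    txt_path = Path(pdf_path).with_suffix(".txt")  # assumed naming: 1.pdf -> 1.txt
    # Both context managers close their file when the block ends,
    # even if an error is raised partway through.
    with fitz.open(pdf_path) as doc, open(txt_path, "wb") as out:
        write_pages(doc, out)
    return txt_path


if __name__ == "__main__":
    # Assumed naming scheme: 1.pdf ... 100.pdf; numbers with no file are skipped.
    for i in range(1, 101):
        if Path(f"{i}.pdf").exists():
            print("wrote", convert(f"{i}.pdf"))
```

Splitting out write_pages keeps the page loop separate from the file handling, so it can be tried on any objects that have a get_text() method.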


    For the sake of completeness, here's how you could do it with your current script unchanged, writing the loop in bash instead of Python: