Read or save a PDF file uploaded to Flask


I'm uploading multiple files to Flask through a form. I receive the file objects in the Flask backend without a problem, but I want to read the PDF files to extract text from them, and I can't do that on the file objects I get from the form. Another approach I tried was saving each file to local storage and reading it back, but when I did that with file.save(path, filename), it created an empty text file named filename.pdf.

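import os

from flask import Flask, request

app = Flask(__name__)
# Assumption: the original snippet does not show how UPLOAD_FOLDER is set;
# 'uploads' is a placeholder directory, created here if it is missing.
app.config['UPLOAD_FOLDER'] = 'uploads'
os.makedirs(app.config['UPLOAD_FOLDER'], exist_ok=True)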


@app.route('/')
def index():
    return '''
        <form method='POST' action='/saveData' enctype='multipart/form-data'>
        <input type='file' name='testReport'>
        <input type='submit'>
        </form>
    '''

@app.route('/saveData', methods=['POST'])
def saveData():
    if 'testReport' in request.files:
        testReport = request.files['testReport']
        # This isn't working: an empty text file is saved with the same name, ending in .pdf
        testReport.save(os.path.join(app.config['UPLOAD_FOLDER'], testReport.filename))
        return f'<h1>File saved {testReport.filename}</h1>'
        
    else:
        return 'Not done'

How do we operate on PDF files after uploading them to Flask?


Solution

  • How do we operate on PDF files after uploading them to Flask?

    You should treat them just like normal PDF files; whether they were uploaded through a Flask application or gathered some other way is irrelevant here. Since you "want to read the PDF files to extract text from them", you should use a PDF text-extraction tool, for example pdfminer.six. As this is an external module, you need to install it first: pip install pdfminer.six
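
The advice above could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the helper name extract_pdf_text and the "%PDF-" sanity check are assumptions added for this example, while extract_text comes from pdfminer.six's pdfminer.high_level module. The helper works on the raw bytes of the upload, so the file does not have to be saved to disk first.

```python
import io


def extract_pdf_text(data: bytes) -> str:
    """Return the text of a PDF given its raw bytes, e.g. from an upload."""
    # Simple sanity check (an assumption of this sketch): PDF files normally
    # begin with the "%PDF-" marker, so empty or non-PDF uploads stop here.
    if not data.startswith(b"%PDF-"):
        raise ValueError("upload is empty or does not look like a PDF")

    # Imported inside the function so the check above needs only the
    # standard library; requires `pip install pdfminer.six`.
    from pdfminer.high_level import extract_text

    # extract_text accepts a path or a binary file-like object.
    return extract_text(io.BytesIO(data))
```

In the saveData view this might be called as text = extract_pdf_text(request.files['testReport'].read()), assuming the form is submitted with enctype='multipart/form-data' so that request.files actually contains the upload. Reading the stream consumes it, so if the file should also be saved afterwards, rewinding with testReport.stream.seek(0) before calling testReport.save(...) is likely needed.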