Tags: python, python-docx

python: find numbers in docx file and replace


I want to read a .docx file in Python and then extract the numbers from it. My attempt so far:

with open('test.docx') as t:
    text = t.readlines()
a = []
a.append([int(s) for s in text.split() if s.isdigit()])
a = [int(numeric_string) for numeric_string in a]

Thanks for any help.


Solution

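  • A .docx file is a zipped XML package, so reading it with open() will not give you the document text. You can use the python-docx library (imported as docx) to read the content of .docx files.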

    pip install python-docx
    

    Adapting some code from here and combining it with the code you posted, I got:

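    import docx  # provided by the python-docx package
    
    def getText(filename):
        # Join the text of every body paragraph with newlines.
        doc = docx.Document(filename)
        fullText = []
        for para in doc.paragraphs:
            fullText.append(para.text)
        return '\n'.join(fullText)
    
    text = getText('Doc1.docx')
    
    # Keep only whitespace-separated tokens made up entirely of digits.
    a = [int(s) for s in text.split() if s.isdigit()]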
    

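    This worked for me with a simple test file. Note that s.isdigit() only accepts whitespace-separated tokens made up entirely of digits, so values such as "42,", "3.14" or "-5" are skipped. You may need to adjust that test depending on how you want the search for numbers to work.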
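
If the numbers in your document sit next to punctuation, or you also want the "replace" part from the title, a regular expression may be a better fit than split() and isdigit(). The following is only a sketch that works on a plain string. The names NUMBER_RE, extract_numbers and replace_numbers are made up for this example, and the pattern assumes numbers look like 42, -7 or 3.14 (no thousands separators or exponents; a hyphen directly before digits is treated as a minus sign, so "10-5" yields 10 and -5).

```python
import re

# Assumed number format: optional minus sign, digits, optional decimal part.
NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def _to_number(raw):
    """Convert a matched string to float if it has a decimal point, else int."""
    return float(raw) if "." in raw else int(raw)


def extract_numbers(text):
    """Return every number found in text, in order of appearance."""
    return [_to_number(raw) for raw in NUMBER_RE.findall(text)]


def replace_numbers(text, transform):
    """Return text with each number replaced by str(transform(number))."""
    def _sub(match):
        return str(transform(_to_number(match.group(0))))
    return NUMBER_RE.sub(_sub, text)


sample = "Order 42, price 3.14 and offset -5."
print(extract_numbers(sample))                   # [42, 3.14, -5]
print(replace_numbers(sample, lambda n: n * 2))  # Order 84, price 6.28 and offset -10.
```

To use this with a document, pass it the string returned by getText() above. Writing the replaced text back into the file is a separate step: with python-docx, one option is to assign the new string to para.text for each paragraph and then call doc.save(), although that discards run-level formatting inside the paragraph.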