I want to read a .docx file in Python and then extract the numbers from it, like this:
with open('test.docx') as t:
    text = t.readlines()
a = []
a.append([int(s) for s in text.split() if s.isdigit()])
a = [int(numeric_string) for numeric_string in a]
Thanks for any help.
You can use the python-docx library (imported as docx) to read the contents of .docx files. Install it with:
pip install python-docx
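If you can't install extra packages, it may help to know why a plain open() doesn't work: a .docx file is a ZIP archive of XML parts, not a text file. Below is a minimal stdlib-only sketch, assuming the main text is stored in word/document.xml (the usual location in files saved by Word). The helper name docx_text_stdlib is hypothetical, and this is a rough alternative, not what python-docx does internally.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part of a .docx file
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text_stdlib(path_or_file):
    """Hypothetical helper: return paragraph text, one paragraph per line.

    Assumes the main part lives at word/document.xml inside the ZIP archive.
    """
    with zipfile.ZipFile(path_or_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    # iter() visits every <w:p>, so paragraphs inside table cells are included too
    for p in root.iter(W_NS + "p"):
        # A paragraph's visible text is split across one or more <w:t> runs
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return "\n".join(paragraphs)
```

The returned string can go through the same filter as in the question, e.g. [int(s) for s in docx_text_stdlib('test.docx').split() if s.isdigit()]. For anything beyond plain paragraph text, python-docx is likely the more robust choice.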
Adapting some code from here and combining it with the code you posted, I got:
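import docx

def getText(filename):
    # Open the .docx file and collect the text of each body paragraph
    doc = docx.Document(filename)
    fullText = []
    for para in doc.paragraphs:
        fullText.append(para.text)
    return '\n'.join(fullText)

text = getText('Doc1.docx')
# Keep only whitespace-separated tokens made up entirely of digits
a = [int(s) for s in text.split() if s.isdigit()]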
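This worked for me with a simple test file, although you may need to adjust parts of it depending on how you want the search for numbers to work. For example, s.isdigit() only accepts tokens made entirely of digits, so values like -5, 3.14 or 42, (with a trailing comma) are skipped. Also note that doc.paragraphs only covers body paragraphs, so text inside tables is not included.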
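Since str.isdigit() only matches tokens consisting entirely of digits, one way to also pick up negative numbers, decimals, and numbers followed by punctuation is a regular expression. This is a sketch under an assumption about what counts as a number (optional leading minus, optional decimal part); extract_numbers is a hypothetical helper name, and thousands separators and exponents are not handled.

```python
import re

# Optional minus sign, one or more digits, optional ".digits" fractional part
NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def extract_numbers(text):
    """Hypothetical helper: return every integer or decimal literal in text."""
    numbers = []
    for match in NUMBER_RE.findall(text):
        # Keep whole numbers as int and anything with a decimal point as float
        numbers.append(float(match) if "." in match else int(match))
    return numbers


print(extract_numbers("Items: 42, price -3.5 and 7."))  # [42, -3.5, 7]
```

Usage with the function above would be a = extract_numbers(getText('Doc1.docx')). One trade-off of this pattern: a hyphen directly before digits is read as a minus sign, so a date like 2020-05-01 yields 2020, -5 and -1, and 1,000 yields 1 and 0.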