I would like to extract part of a .docx document (for example, 10% of all content) with Python 3. How can I do this? Thanks.
If you have pip installed you can open your terminal and run:
pip install docx2txt
Once the docx2txt module is installed, you can import it:
import docx2txt
You can then extract the text from the document and keep only the parts you want. The contents of filename.docx are stored as a string in the variable text.
text = docx2txt.process("filename.docx")
print(text)
It is now possible to manipulate that string using some basic built-in functions. The code snippet below gets the length of the string with the len() function, prints it, and slices the string down to roughly its first 10% by creating a substring.
length = len(text)
print(length) # prints 1000 for my sample document
text = text[:length // 10] # keep the first 10% of the characters, starting at index 0
print(text) # prints the first 10% of the string
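To reuse the same idea for other percentages, the slice could be wrapped in a small helper. This is only a minimal sketch: first_fraction and its fraction parameter are names made up for this example and are not part of docx2txt, and the sample string stands in for the text a real document would give you.

```python
def first_fraction(text, fraction=0.1):
    """Return roughly the first `fraction` of `text`, counted in characters."""
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    cutoff = int(len(text) * fraction)  # number of characters to keep
    return text[:cutoff]


# Stand-in string; in practice this would be the result of docx2txt.process(...)
sample = "abcdefghij" * 10  # 100 characters
print(first_fraction(sample))             # first 10% of the sample
print(len(first_fraction(sample, 0.25)))  # length of the first 25%
```

Note that this counts characters, so the result can stop in the middle of a word.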
My full code for this example is below. I hope this is helpful!
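import docx2txt
# Extract all text from the document as a single string
text = docx2txt.process("/home/jared/test.docx")
print(text)
length = len(text)
print(length) # prints 1000 for my sample document
text = text[:length // 10] # keep the first 10% of the characters, starting at index 0
print(text) # prints the first 10% of the string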
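If cutting by characters is too crude, one possible variation is to keep whole paragraphs instead. The sketch below assumes each paragraph of the extracted text sits on its own line (possibly separated by blank lines), which you should check against your own document. The helper name first_paragraphs and the sample string are made up for illustration.

```python
def first_paragraphs(text, fraction=0.1):
    """Return the first `fraction` of the non-empty lines in `text` as a list."""
    paragraphs = [line for line in text.splitlines() if line.strip()]
    if not paragraphs:
        return []
    # Keep at least one paragraph so short documents still return something
    keep = max(1, int(len(paragraphs) * fraction))
    return paragraphs[:keep]


# Stand-in for extracted text: 20 short paragraphs separated by blank lines
sample = "\n\n".join(f"Paragraph {i}" for i in range(1, 21))
print(first_paragraphs(sample))
```

With a real file, you would pass in the string returned by docx2txt.process("filename.docx") instead of the sample.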