In Python, how can I get part of a docx document?

I would like to get part of a docx document (for example, 10% of all its content) with Python 3. How can I do this? Thanks.

Solution

A simple way to extract the text from .docx files in Python is the docx2txt module.

If you have pip installed, you can open your terminal and run:

pip install docx2txt

Once the docx2txt module is installed, you can import it:

import docx2txt

You can then extract the text of the document and keep only the parts you want. The code below stores the contents of filename.docx as a string in the variable text:

text = docx2txt.process("filename.docx")
print(text)
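If you would rather avoid the extra dependency, or want to work paragraph by paragraph instead of with one big string, the text can also be read with only the standard library, because a .docx file is a ZIP archive whose main body is stored in word/document.xml. This is a rough sketch under that assumption, not a reproduction of what docx2txt does: it ignores headers, footers, footnotes, tabs and line breaks, and docx_paragraphs is a hypothetical helper name.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part of a .docx file
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path):
    """Return the plain text of each paragraph (<w:p>) in a .docx file.

    Rough approximation: only word/document.xml is read, so headers,
    footers and footnotes are skipped, and tabs/line breaks are dropped.
    """
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # Concatenate the text of every run (<w:t>) inside this paragraph
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return paragraphs
```

For example, paras = docx_paragraphs("filename.docx") followed by paras[:len(paras) // 10] gives roughly the first 10% of the paragraphs.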

It is now possible to manipulate that string using basic built-in functions. The snippet below prints the length of text with the len() function and then slices the string so that only its first 10% is kept.

print(len(text))  # 1000 characters for my sample document

text = text[:len(text) // 10]  # first 10% of the characters (slices start at index 0)
print(text)
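Hard-coding the cutoff only works for a document of a known length, so a small helper can compute it from the actual length instead. This is a minimal sketch; first_percent is a hypothetical name, it expects an integer percentage, and because it counts characters the cut may fall in the middle of a word.

```python
def first_percent(text, percent):
    """Return roughly the first `percent` % of `text`, counted in characters.

    `percent` is an int from 0 to 100; integer arithmetic avoids float
    rounding surprises such as 0.29 * 100 == 28.999999999999996.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    cutoff = len(text) * percent // 100
    return text[:cutoff]
```

With docx2txt this would be used as print(first_percent(docx2txt.process("filename.docx"), 10)).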

My full code for this example is below. I hope this is helpful!

import docx2txt

text = docx2txt.process("/home/jared/test.docx")
print(text)

print(len(text))  # 1000 characters for my sample document

text = text[:len(text) // 10]  # first 10% of the characters (slices start at index 0)
print(text)
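If ending in the middle of a word is a problem, a variant can count words instead of characters. This sketch assumes words are separated by whitespace; first_percent_words is a hypothetical helper, and it collapses the original line breaks and spacing into single spaces.

```python
def first_percent_words(text, percent):
    """Return roughly the first `percent` % of the words in `text`.

    Words are split on any whitespace, so the result never ends mid-word,
    but paragraph breaks are replaced by single spaces.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    words = text.split()
    cutoff = len(words) * percent // 100
    return " ".join(words[:cutoff])
```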