My implementation so far:
    from docx.api import Document
    import pandas as pd
    from docx.shared import Pt

    texts = []
    sizes = []
    document = Document('new.docx')
    for p in document.paragraphs:
        for run in p.runs:
            if p.style.name.startswith("Normal") and run.font.size != Pt(11):
                texts.append(run.text)
    print(texts)
This produces output, but some of it is incorrect: I also get runs that are in the Normal style with font size 11. Is this the correct implementation, or is there another way to achieve this? TIA!
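What I learned is that styles are stored in a separate part of the .docx file by default. A formatting setting can be read from the text itself in only one case: when it differs from the setting of the style applied to the paragraph (e.g., Normal, No Spacing, Heading 1, Title). In that case, Word stores it with the text. Otherwise python-docx reports run.font.size as None, meaning the size is inherited from the style, and None != Pt(11) is True, which is why runs that are Normal and 11pt still end up in your list.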
Another StackOverflow question thread for a better understanding: link
E.g., if the font size of your "Heading 1" style is 20pt and your text is 20pt, you won't be able to extract the size from the run. But if the text is any other size, your code will return it.
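One way to work around this is to resolve the size yourself: take the run's own size if it has one, otherwise fall back to the run's character style, then to the paragraph style and its base styles. The following is only a rough sketch, not a complete implementation. The names effective_size_pt, style_chain_size, runs_not_at_size and the default_pt parameter are made up for illustration. It does not read the document-level defaults in styles.xml, so default_pt is your own assumption about what size applies when no style in the chain sets one.

```python
EMU_PER_PT = 12700  # python-docx lengths are ints in English Metric Units


def style_chain_size(style):
    """Return the first explicit font size on a style or its base styles, else None."""
    while style is not None:
        size = style.font.size
        if size is not None:
            return size
        style = style.base_style
    return None


def effective_size_pt(run, paragraph, default_pt=None):
    """Best-effort font size of a run in points.

    Checks direct run formatting first, then the run's character style,
    then the paragraph style chain. Returns default_pt (an assumption
    supplied by the caller) if none of them sets a size.
    """
    candidates = (
        run.font.size,
        style_chain_size(run.style),
        style_chain_size(paragraph.style),
    )
    for size in candidates:
        if size is not None:
            return size / EMU_PER_PT
    return default_pt


def runs_not_at_size(path, target_pt=11.0, style_prefix="Normal", default_pt=11.0):
    """Collect the text of runs in matching paragraphs whose resolved size differs."""
    from docx import Document  # python-docx; imported here so the helpers above need no install

    texts = []
    for p in Document(path).paragraphs:
        if not p.style.name.startswith(style_prefix):
            continue
        for run in p.runs:
            if effective_size_pt(run, p, default_pt) != target_pt:
                texts.append(run.text)
    return texts
```

With this, print(runs_not_at_size('new.docx')) should skip runs that merely inherit 11pt from Normal, provided default_pt matches what your document's defaults really are.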