I am using docx4j and the very useful webapp they've built for parts list: http://webapp.docx4java.org/OnlineDemo/PartsList.html
I have a sample document with five words. First four are in font size 12 and the last is in font size 8.
I would like to read all the different font sizes used in the document. So in this case: 12 and 8
I uploaded the sample document to the webapp and I think this information would be stored in document.xml, but I'm not certain, as I only see 16 but not 24 in the XML. Also, I'm not certain how to extract this information.
Questions
How can I extract font size of the word content in docx4j?
How can I extract the font color of each word and background color of the entire word document?
If the font size is not set directly on the run and a style is in use, you need to check the style hierarchy. If it is not set there either, it falls back to the document defaults.
As ECMA-376, 4th edition, Part 1 puts it in §17.7.2 (Style Hierarchy):
This process can be described as follows:
- First, the document defaults are applied to all runs and paragraphs in the document.
- Next, the table style properties are applied to each table in the document, following the conditional formatting inclusions and exclusions specified per table.
- Next, numbered item and paragraph properties are applied to each paragraph formatted with a numbering style.
- Next, paragraph and run properties are applied to each paragraph as defined by the paragraph style.
- Next, run properties are applied to each run with a specific character style applied.
- Finally, we apply direct formatting (paragraph or run properties not from styles). If this direct formatting includes numbering, that numbering + the associated paragraph properties are applied.
If the value of the rFonts element (§17.3.2.26) references a font which is not available, applications determine a suitable alternative font via a process called font substitution, which is defined in §17.8.2.
docx4j does something like this; see, for example, line 430 onwards in https://github.com/plutext/docx4j/blob/master/src/main/java/org/docx4j/model/PropertyResolver.java
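The precedence order quoted above can be sketched in plain Java. This is only an illustration of the idea, not docx4j's PropertyResolver: the class and method names are made up, table and numbering styles and basedOn style inheritance are left out, and each layer is reduced to a single nullable w:sz value in half-points. The spec applies the layers from defaults upward and lets later ones overwrite earlier ones; for a single property such as font size, checking the most specific layer first and taking the first value found gives the same result.

```java
import java.util.Optional;

/** Hypothetical helper: picks the effective font size from the formatting layers of a run. */
final class EffectiveFontSize {

    /**
     * Each argument is the w:sz value (half-points) found at that layer, or null if
     * the layer does not set a size. Layers are checked from highest precedence
     * (direct formatting) down to lowest (document defaults).
     */
    static Optional<Integer> resolve(Integer direct, Integer characterStyle,
                                     Integer paragraphStyle, Integer docDefault) {
        Integer[] layers = { direct, characterStyle, paragraphStyle, docDefault };
        for (Integer halfPoints : layers) {
            if (halfPoints != null) {
                return Optional.of(halfPoints);
            }
        }
        return Optional.empty(); // nothing set anywhere; the consuming application decides
    }

    /** w:sz is expressed in half-points, so 24 means 12pt. */
    static double toPoints(int halfPoints) {
        return halfPoints / 2.0;
    }

    public static void main(String[] args) {
        // Direct formatting (16) wins over the paragraph style (24).
        System.out.println(toPoints(resolve(16, null, 24, 20).get())); // prints 8.0
        // No direct size on the run: the paragraph style (24) applies.
        System.out.println(toPoints(resolve(null, null, 24, 20).get())); // prints 12.0
    }
}
```

The second call is the situation where a size never appears in document.xml at all: the run carries no w:sz, and the value comes from a style or from the defaults instead.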
Similar principles apply to font color.
I don't address here how to iterate through the document word by word (or rather, run by run), other than to suggest searching for docx4j's TraversalUtil.
Example of setting font size explicitly in a run:
<w:r>
  <w:rPr>
    <w:sz w:val="36"/>
  </w:rPr>
  <w:t>this is 18 points</w:t>
</w:r>
You can set that in Microsoft Word, or using docx4j. To see how to do it in docx4j, you can use the webapp to generate code from a sample docx, but the essence is:
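org.docx4j.wml.R yourRun;  // an existing run; getRPr() returns null if it has no w:rPr yet
org.docx4j.wml.HpsMeasure size = new org.docx4j.wml.HpsMeasure();
size.setVal(java.math.BigInteger.valueOf(36));  // half-points, so 36 = 18pt
yourRun.getRPr().setSz(size);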
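Going the other way, the sizes that are set explicitly can be read straight from the XML. Here is a minimal sketch that uses only the JDK's XML parser rather than docx4j, so the class and method names are made up for illustration. It assumes you already have the contents of document.xml as a string and that every w:sz value is a plain integer in half-points. It also picks up w:sz elements inside paragraph-mark properties, and it says nothing about sizes inherited from styles or defaults.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: collects the font sizes set directly via w:sz in WordprocessingML. */
final class ExplicitFontSizes {

    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the distinct explicit sizes in points, sorted ascending. */
    static Set<Double> explicitSizesInPoints(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for the *NS lookups below
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Set<Double> sizes = new TreeSet<>();
            // Matches w:sz only; w:szCs has a different local name and is skipped.
            NodeList szNodes = doc.getElementsByTagNameNS(W_NS, "sz");
            for (int i = 0; i < szNodes.getLength(); i++) {
                String val = ((Element) szNodes.item(i)).getAttributeNS(W_NS, "val");
                if (!val.isEmpty()) {
                    sizes.add(Integer.parseInt(val) / 2.0); // half-points to points
                }
            }
            return sizes;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read sizes from the XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<w:document xmlns:w=\"" + W_NS + "\"><w:body><w:p>"
                + "<w:r><w:rPr><w:sz w:val=\"36\"/></w:rPr><w:t>this is 18 points</w:t></w:r>"
                + "<w:r><w:rPr><w:sz w:val=\"16\"/></w:rPr><w:t>small</w:t></w:r>"
                + "<w:r><w:t>no explicit size</w:t></w:r>"
                + "</w:p></w:body></w:document>";
        System.out.println(explicitSizesInPoints(xml)); // prints [8.0, 18.0]
    }
}
```

The third run in the sample has no w:sz, so it contributes nothing to the result; its size would have to be resolved through the style hierarchy described earlier.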