
How to read all the font sizes used in a docx


I am using docx4j and the very useful webapp they've built for parts list: http://webapp.docx4java.org/OnlineDemo/PartsList.html

I have a sample document with five words. First four are in font size 12 and the last is in font size 8.

I would like to read all the different font sizes used in the document. So in this case: 12 and 8

I uploaded the sample document to the webapp. I think this information is stored in document.xml, but I'm not certain, as I see only 16 and not 24 in the XML. I'm also not sure how to extract it.

Questions

  • How can I extract the font size of the document's content with docx4j?

  • How can I extract the font color of each word and background color of the entire word document?


Solution

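  • If the font size is not set on the run, and a style is in use, you need to check the style hierarchy. If it is not set there, it falls back to the document defaults. Note that w:sz is measured in half-points, so the 16 you see is the 8 pt word; 24 (12 pt) does not appear in document.xml because that size is not set directly on those runs.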

    As ECMA-376, 4th edition, Part 1 puts it in §17.7.2 (Style Hierarchy):

    This process can be described as follows:

    • First, the document defaults are applied to all runs and paragraphs in the document.
    • Next, the table style properties are applied to each table in the document, following the conditional formatting inclusions and exclusions specified per table.
    • Next, numbered item and paragraph properties are applied to each paragraph formatted with a numbering style.
    • Next, paragraph and run properties are applied to each paragraph as defined by the paragraph style.
    • Next, run properties are applied to each run with a specific character style applied.
    • Finally, we apply direct formatting (paragraph or run properties not from styles). If this direct formatting includes numbering, that numbering + the associated paragraph properties are applied.

    If the value of the rFonts element (§17.3.2.26) references a font which is not available, applications determine a suitable alternative font via a process called font substitution, which is defined in §17.8.2.

    docx4j does something like this; see, for example, line 430 onward in https://github.com/plutext/docx4j/blob/master/src/main/java/org/docx4j/model/PropertyResolver.java

    Similar principles apply to font color.
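The precedence order quoted above can be sketched as a small model. This is a simplified illustration for a single property (w:sz), not docx4j's PropertyResolver: the class `FontSizeCascade`, its methods, and the sample values are hypothetical, and style inheritance within each layer is not modelled.

```java
import java.util.Arrays;
import java.util.List;

/** Simplified model of the style cascade for one property (w:sz). */
final class FontSizeCascade {

    /**
     * Layers are listed from lowest to highest precedence, in the order the
     * specification applies them: document defaults, table style, numbering,
     * paragraph style, character style, direct formatting.
     * A null entry means "this layer does not set a size".
     * Returns the effective size in half-points, or null if no layer sets it.
     */
    static Integer effectiveHalfPoints(List<Integer> layersLowToHigh) {
        Integer result = null;
        for (Integer layer : layersLowToHigh) {
            if (layer != null) {
                result = layer; // a later layer overrides an earlier one
            }
        }
        return result;
    }

    /** w:sz is expressed in half-points, so 24 means 12 pt. */
    static double toPoints(int halfPoints) {
        return halfPoints / 2.0;
    }

    public static void main(String[] args) {
        // Hypothetical values: only the paragraph style sets a size (24),
        // and one run additionally carries direct formatting (16).
        List<Integer> plainRun = Arrays.asList(null, null, null, 24, null, null);
        List<Integer> smallRun = Arrays.asList(null, null, null, 24, null, 16);

        System.out.println(toPoints(effectiveHalfPoints(plainRun))); // 12.0
        System.out.println(toPoints(effectiveHalfPoints(smallRun))); // 8.0
    }
}
```

In this model a run with no direct formatting still ends up with a size, which is why a value can be in effect without appearing on the run itself.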

    I don't address here how to iterate through the document word by word (or rather, run by run), other than to say: search for docx4j's TraversalUtil.

    Example of setting font size explicitly in a run

                    <w:r>
                        <w:rPr>
                            <w:sz w:val="36"/>
                        </w:rPr>
                        <w:t>this is 18 points</w:t>
                    </w:r>
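Reading such explicit values back out of the XML can be sketched as follows. This minimal example uses only the JDK's DOM parser rather than docx4j, so it sees direct run formatting only and does not resolve styles or defaults. The class `ExplicitRunSizes` and its method are hypothetical names, and the sample XML is made up for illustration.

```java
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Collects font sizes that are set directly on runs (w:r/w:rPr/w:sz). */
final class ExplicitRunSizes {

    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the distinct explicit run sizes in points, sorted ascending. */
    static Set<Double> explicitSizesInPoints(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to match the w: namespace
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            Set<Double> sizes = new TreeSet<>();
            NodeList szNodes = doc.getElementsByTagNameNS(W_NS, "sz");
            for (int i = 0; i < szNodes.getLength(); i++) {
                Element sz = (Element) szNodes.item(i);
                Node rPr = sz.getParentNode();
                // Skip w:sz elements that are not direct run formatting,
                // such as the paragraph-mark properties in w:pPr/w:rPr.
                if (!isWml(rPr, "rPr") || !isWml(rPr.getParentNode(), "r")) {
                    continue;
                }
                String val = sz.getAttributeNS(W_NS, "val");
                if (!val.isEmpty()) {
                    sizes.add(Integer.parseInt(val) / 2.0); // half-points to points
                }
            }
            return sizes;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read font sizes", e);
        }
    }

    private static boolean isWml(Node node, String localName) {
        return node != null
                && W_NS.equals(node.getNamespaceURI())
                && localName.equals(node.getLocalName());
    }

    public static void main(String[] args) {
        String xml =
                "<w:p xmlns:w=\"" + W_NS + "\">"
              + "<w:r><w:t>four words without direct formatting </w:t></w:r>"
              + "<w:r><w:rPr><w:sz w:val=\"16\"/></w:rPr><w:t>small</w:t></w:r>"
              + "</w:p>";
        System.out.println(explicitSizesInPoints(xml)); // [8.0]
    }
}
```

Only the 8 pt size is reported here, which mirrors the situation in the question: the size of the other runs has to be resolved through the style hierarchy.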
    

    You can set the size on a run in Microsoft Word or with docx4j. To see how to do it in docx4j, you can use the webapp to generate code from a sample docx, but the essence is:

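        // yourRun is an org.docx4j.wml.R whose RPr is already present (getRPr() is not null)
        org.docx4j.wml.HpsMeasure size = new org.docx4j.wml.HpsMeasure();
        size.setVal(java.math.BigInteger.valueOf(36)); // half-points, so 36 = 18 pt
        yourRun.getRPr().setSz(size);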