java ms-word apache-poi docx mathml

Reading equations and formulas from Word (docx)


We have a Word (docx) file that contains equations. POI's XWPFWordExtractor.getText() doesn't return the equations.

My questions are:

  1. How are these equations represented in the file?
  2. How do I read them? (I eventually want to display them in an HTML page, perhaps as MathML.)

Thanks!


Solution

  • An equation in a docx file is represented using OMML (Office Math Markup Language), as m:oMathPara/m:oMath elements:

      <m:oMathPara xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math">
        <m:oMath>
    

    I don't know about POI, but in docx4j, elements in that namespace are represented using JAXB-generated objects in org.docx4j.math.

    I'd tackle your second question by marshalling the m:oMathPara/m:oMath and then transforming it via omml2mathml.xsl. See also Murray Sargent's blog (for example here and here).
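
For illustration, here is a minimal sketch of that read-then-transform idea using only the JDK rather than POI or docx4j. It assumes the equations live in the main document part (word/document.xml) and that you have obtained a copy of the OMML-to-MathML stylesheet yourself. The class name OmmlExtractor and its method names are made up for this example and are not part of any library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of POI or docx4j.
final class OmmlExtractor {
    static final String MATH_NS =
            "http://schemas.openxmlformats.org/officeDocument/2006/math";

    /** Opens the docx as a zip archive and reads equations from word/document.xml. */
    static List<String> extractFromDocx(Path docx) throws IOException {
        try (ZipFile zip = new ZipFile(docx.toFile())) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("No word/document.xml in " + docx);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return extractEquations(in);
            }
        }
    }

    /** Returns every m:oMath element in the XML stream, serialized as a string. */
    static List<String> extractEquations(InputStream documentXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required to match elements by namespace URI
            Document doc = dbf.newDocumentBuilder().parse(documentXml);
            NodeList nodes = doc.getElementsByTagNameNS(MATH_NS, "oMath");

            // A Transformer created without a stylesheet copies its input unchanged.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            List<String> equations = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                StringWriter out = new StringWriter();
                serializer.transform(new DOMSource(nodes.item(i)), new StreamResult(out));
                equations.add(out.toString());
            }
            return equations;
        } catch (ParserConfigurationException | SAXException
                | IOException | TransformerException e) {
            throw new IllegalStateException("Could not read equations", e);
        }
    }

    /** Applies a stylesheet supplied by the caller (assumed to map OMML to MathML). */
    static String toMathMl(String omml, Path stylesheet) {
        try {
            Transformer xslt = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(stylesheet.toFile()));
            StringWriter out = new StringWriter();
            xslt.transform(new StreamSource(new StringReader(omml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }
}
```

Something like OmmlExtractor.extractFromDocx(Path.of("report.docx")) would then give one string per equation, and each string could be passed to toMathMl together with the path to your copy of the stylesheet. The file name report.docx is only a placeholder. Equations stored in other parts of the package, such as headers or footnotes, are not covered by this sketch.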