
Parse .doc and .docx to get all text using golang?


How can I parse word documents ".doc", ".docx" to get all the text using golang?


Solution

  • You can get some inspiration from these projects:

    https://github.com/nguyenthenguyen/docx
    https://github.com/opencontrol/doc-template

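    Basically, a DOCX file is a Zip archive with XML files in it. The main body text is in word/document.xml. Note that this only covers .docx: the legacy .doc format is a binary format, not a Zip archive, so this approach does not apply to it.

    What both projects do is strip out all the XML tags, leaving only the text intact. You should see if that approach suits you too.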