java apache-poi docx4j poi-hssf

How to modify the metadata of a .DOC document


I'm looking to modify certain tags (like comments, keywords, etc.) of a .DOC file. I've been able to do this for DOCX using docx4j, but I haven't been able to find anything that lets me change the tags for the .DOC format.

Is there a way to programmatically change the content of certain tags in a .DOC file?


Solution

  • Apache POI will quite happily let you read and edit the metadata of supported documents. For the older OLE2 formats (.doc, .xls, etc.), you'll want to use HPSF, likely via POIDocument. For the OOXML formats (.docx, .xlsx, etc.), use POIXMLDocument and POIXMLProperties.

    To modify the OLE2 properties, you can either follow the detailed instructions and code in the HPSF documentation or, on newer versions of POI, shortcut quite a bit of that with HPSFPropertiesOnlyDocument, e.g.:

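    import java.io.File;
    import java.io.FileOutputStream;
    import org.apache.poi.hpsf.HPSFPropertiesOnlyDocument;
    import org.apache.poi.hpsf.SummaryInformation;
    import org.apache.poi.poifs.filesystem.NPOIFSFileSystem; // POIFSFileSystem on POI 4.0+

    NPOIFSFileSystem fs = new NPOIFSFileSystem(new File("test.doc"));
    HPSFPropertiesOnlyDocument doc = new HPSFPropertiesOnlyDocument(fs);

    // The property sets may be missing: create them, then fetch the new one
    SummaryInformation si = doc.getSummaryInformation();
    if (si == null) {
        doc.createInformationProperties();
        si = doc.getSummaryInformation();
    }

    si.setAuthor("StackOverflow");
    si.setTitle("Properties Demo!");

    // Writes the changed properties and copies every other stream unchanged
    try (FileOutputStream out = new FileOutputStream("changed.doc")) {
        doc.write(out);
    }
    fs.close();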
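For the fields the question actually asks about, here is a minimal sketch assuming POI 4.0 or later, where `NPOIFSFileSystem` was folded into `POIFSFileSystem`. The class name `DocMetadataEditor`, the `setTags` helper and the example values are hypothetical. Keywords and Comments live in `SummaryInformation`, while fields such as Category live in `DocumentSummaryInformation`, so the sketch creates both sets if either is missing.

```java
import java.io.File;
import java.io.IOException;

import org.apache.poi.hpsf.DocumentSummaryInformation;
import org.apache.poi.hpsf.HPSFPropertiesOnlyDocument;
import org.apache.poi.hpsf.SummaryInformation;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

// Hypothetical helper class; assumes POI 4.0+ (POIFSFileSystem rather than NPOIFSFileSystem)
public class DocMetadataEditor {

    /**
     * Copies an OLE2 file (e.g. a .doc) to {@code out}, replacing the Keywords and
     * Comments summary properties and the Category document-summary property.
     * HPSFPropertiesOnlyDocument.write(File) copies all other streams unchanged.
     */
    public static void setTags(File in, File out, String keywords, String comments, String category)
            throws IOException {
        // Opened read-only, because the result goes to a different file
        try (POIFSFileSystem fs = new POIFSFileSystem(in, true)) {
            HPSFPropertiesOnlyDocument doc = new HPSFPropertiesOnlyDocument(fs);

            // Files that never had property streams return null here
            if (doc.getSummaryInformation() == null || doc.getDocumentSummaryInformation() == null) {
                doc.createInformationProperties();
            }

            SummaryInformation si = doc.getSummaryInformation();
            si.setKeywords(keywords);
            si.setComments(comments);

            DocumentSummaryInformation dsi = doc.getDocumentSummaryInformation();
            dsi.setCategory(category);

            doc.write(out);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java DocMetadataEditor input.doc output.doc
        setTags(new File(args[0]), new File(args[1]), "poi, hpsf", "Edited with Apache POI", "Demo");
    }
}
```

Writing to a separate file leaves the original untouched. To edit in place instead, opening the file system read-write (`new POIFSFileSystem(file, false)`) and calling the no-argument `doc.write()` should work, since that overload only rewrites the property streams of the file it was opened from.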