pdf itext pdfbox

PDF metadata removal using Java


How can I remove metadata from a PDF using Java?

Will iText do this, or do any other frameworks have the ability? I didn't find any examples or classes that remove metadata using iText. Has anybody done this before, or does anyone have any ideas?

Please share your views.

Thanks in advance.


Solution

  • First you need to differentiate, since there are two types of metadata in a PDF:

    1. XMP metadata
    2. DID (document information dictionary, the old way)

    You remove the first like this:

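    // stamper is an existing PdfStamper opened on the source PDF
    PdfReader reader = stamper.getReader();
    // drop the catalog's reference to the XMP stream, then discard the orphaned stream
    reader.getCatalog().remove(PdfName.METADATA);
    reader.removeUnusedObjects();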
    

    You remove the second as SANN3 has mentioned:

    // a null value tells the stamper to drop that key from the info dictionary
    HashMap<String, String> info = reader.getInfo();
    info.put("Title", null);
    info.put("Author", null);
    info.put("Subject", null);
    info.put("Keywords", null);
    info.put("Creator", null);
    info.put("Producer", null);
    info.put("CreationDate", null);
    info.put("ModDate", null);
    info.put("Trapped", null);
    stamper.setMoreInfo(info);
    

    If you then open the PDF in a text editor and search it, you will find neither the /Info dictionary nor the XMP metadata.
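
The text-editor check can also be scripted. Below is a minimal sketch that uses only the Java standard library (no iText) and scans the raw bytes of a file for the `/Info` and `/Metadata` name tokens. The class name `PdfMetadataCheck` and its method names are made up for illustration. It is a naive substring search, so treat it as a quick sanity check rather than proof: it assumes the relevant dictionaries are stored uncompressed, and it would also match longer names that merely start with those tokens.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of iText: mimics "search the PDF with a text editor".
class PdfMetadataCheck {

    // Returns true if the raw bytes contain the given PDF name token, e.g. "/Info".
    static boolean containsToken(byte[] data, String token) {
        // ISO-8859-1 maps every byte to exactly one char, so binary content is not mangled.
        String text = new String(data, StandardCharsets.ISO_8859_1);
        return text.contains(token);
    }

    // The trailer points at the document information dictionary through an /Info entry.
    static boolean hasInfoDictionary(byte[] pdf) {
        return containsToken(pdf, "/Info");
    }

    // The catalog points at the XMP stream through a /Metadata entry.
    static boolean hasXmpMetadata(byte[] pdf) {
        return containsToken(pdf, "/Metadata");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfMetadataCheck <file.pdf>");
            return;
        }
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("/Info present:     " + hasInfoDictionary(pdf));
        System.out.println("/Metadata present: " + hasXmpMetadata(pdf));
    }
}
```

Run it against the stamped output file. If either line still reports `true`, the corresponding removal step probably did not take effect, or the token appears somewhere else in the file.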