Tags: java, detection, password-protection, apache-tika

Check whether a document is password protected


I am using Apache Tika to read and write documents, so that I can handle both PDF and Microsoft Office documents.

I want to check whether a document is password protected before proceeding. Is there an explicit method to do this?


Solution

  • No, there is no way to check in advance, because Tika won't know the file is password protected until it gets a fair way through processing it.

    If you know the password for a file, pass it to the parser with logic something like:

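    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.parser.PasswordProvider;

    // Register the known password; parser, stream, handler and metadata are assumed to exist already.
    ParseContext context = new ParseContext();
    context.set(PasswordProvider.class, new PasswordProvider() {
        @Override
        public String getPassword(Metadata metadata) {
            return "password";
        }
    });
    parser.parse(stream, handler, metadata, context);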
    

    Otherwise, if you don't know the password and you're going to prompt for it, provide a PasswordProvider implementation that takes the Metadata object and prompts for, or looks up, the password based on that.

    The other option is to try to parse, catch EncryptedDocumentException, and then re-parse with a PasswordProvider if one is available. Generally an EncryptedDocumentException gets thrown pretty early, so the overhead isn't too high.
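
The parse, catch, and re-parse flow described above can be sketched roughly as follows. This is only an illustration of the control flow, not Tika code: so that it compiles without Tika on the classpath, `StandInEncryptedException`, `ParseAttempt`, `Outcome`, `parseWithRetry` and `fakeProtectedParse` are all hypothetical stand-ins. In a real project the exception would be Tika's `EncryptedDocumentException`, and each attempt would call `parser.parse(...)` on a fresh stream, with the second attempt passing a `ParseContext` that carries a `PasswordProvider`.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Sketch of "try to parse, catch the encryption error, retry with a password".
// Every type here is a hypothetical stand-in, not part of Tika.
final class EncryptedRetrySketch {

    /** Stand-in for Tika's EncryptedDocumentException. */
    static final class StandInEncryptedException extends Exception {
        StandInEncryptedException(String message) { super(message); }
    }

    /** One parse attempt; password is null on the first, password-free try. */
    interface ParseAttempt<T> {
        T parse(String password) throws StandInEncryptedException;
    }

    /** Whether the document turned out to be protected, plus the result if parsing succeeded. */
    static final class Outcome<T> {
        final boolean passwordProtected;
        final Optional<T> result;
        Outcome(boolean passwordProtected, Optional<T> result) {
            this.passwordProtected = passwordProtected;
            this.result = result;
        }
    }

    static <T> Outcome<T> parseWithRetry(ParseAttempt<T> attempt,
                                         Supplier<Optional<String>> passwordLookup) {
        try {
            // First pass without a password: unprotected documents finish here.
            return new Outcome<>(false, Optional.ofNullable(attempt.parse(null)));
        } catch (StandInEncryptedException firstFailure) {
            // Only now do we know the document is protected; ask for a password.
            Optional<String> password = passwordLookup.get();
            if (!password.isPresent()) {
                return new Outcome<>(true, Optional.empty());
            }
            try {
                // Second pass with the password supplied.
                return new Outcome<>(true, Optional.ofNullable(attempt.parse(password.get())));
            } catch (StandInEncryptedException wrongPassword) {
                return new Outcome<>(true, Optional.empty());
            }
        }
    }

    /** Hypothetical fake parser that accepts only the password "secret". */
    static String fakeProtectedParse(String password) throws StandInEncryptedException {
        if (!"secret".equals(password)) {
            throw new StandInEncryptedException("password required");
        }
        return "document text";
    }

    public static void main(String[] args) {
        Outcome<String> outcome = parseWithRetry(
                EncryptedRetrySketch::fakeProtectedParse, () -> Optional.of("secret"));
        System.out.println(outcome.passwordProtected + " " + outcome.result.orElse("<none>"));
    }
}
```

As a side effect, the `passwordProtected` flag gives the yes/no answer the question asks for, but only after the first parse attempt has been made, which matches the point that there is no check in advance.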