Search code examples
pythonpdfgoogle-docspypdf

Use Python to determine if PDF was generated by Google Docs


I'd like to use Python to tell if a PDF was created by Google Docs. Is there any sort of metadata I can gather with PyPDF2 to determine this?


Solution

  • When doing pdf.getDocumentInfo() on a Document created by Google Docs, it returns {'/Producer': u'Skia/PDF m83'}. I tested this on a few Google docs, and it seems to check out. It makes sense - Skia is a Google project, so must be what they use to generate documents on their backend.

    So you can simply do:

    import PyPDF2
    GOOGLE_DOCS_PDF_METADATA = {'/Producer': u'Skia/PDF m83'}
    
    def file_is_google_doc(pdf_file_path) 
        pdf = PyPDF2.PdfFileReader(pdf_file_path)
        return pdf.getDocumentInfo() == GOOGLE_DOCS_PDF_METADATA