python datetime pypdf

pdf.getDocumentInfo date format


I am using PyPDF2's function for extracting document info. The results look something like this, but I am unable to interpret the creation date format. What do the last few digits represent?

pdf.documentInfo
[Output]: {'/Creator': 'Rave (http://www.nevrona.com/rave)',
           '/Producer': 'Nevrona Designs',
           '/CreationDate': 'D:20060301072826' }

At one point I also saw this:

'/CreationDate': "D:20170920114835+02'00'"

How can I read it, or convert it into a normal, human-readable datetime format?


Solution

  • You can clean and parse it like this:

    from datetime import datetime
    
    CreationDate = "D:20170920114835+02'00'"
    
    dt = datetime.strptime(CreationDate.replace("'", ""), "D:%Y%m%d%H%M%S%z")
    
    # UTC offset is set correctly:
    print(dt)
    # 2017-09-20 11:48:35+02:00
    print(repr(dt))
    # datetime.datetime(2017, 9, 20, 11, 48, 35, tzinfo=datetime.timezone(datetime.timedelta(seconds=7200)))
    

    ...which I think is more straightforward than what the answer to this related question shows.
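
As for what the trailing digits mean: as far as I understand the PDF date format, the string is D:YYYYMMDDHHmmSS followed by an optional UTC offset, so +02'00' is an offset of 2 hours and 0 minutes. The first value in the question ('D:20060301072826') has no offset, so a format string that requires %z would raise a ValueError on it. Here is a minimal sketch that handles both values from the question. The name parse_pdf_date is my own and is not part of PyPDF2, and it assumes the date always carries the full seconds precision.

```python
from datetime import datetime


def parse_pdf_date(value):
    """Parse a PDF date string such as "D:20170920114835+02'00'".

    Hypothetical helper, not part of PyPDF2. Returns a timezone-aware
    datetime when a UTC offset is present, otherwise a naive one.
    """
    text = value.strip()
    # Drop the optional "D:" prefix.
    if text.startswith("D:"):
        text = text[2:]
    # Remove the apostrophes around the offset minutes: +02'00' -> +0200
    text = text.replace("'", "")
    # Try the format with a UTC offset first, then the one without.
    for fmt in ("%Y%m%d%H%M%S%z", "%Y%m%d%H%M%S"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized PDF date: {value!r}")


print(parse_pdf_date("D:20060301072826"))
# 2006-03-01 07:28:26
print(parse_pdf_date("D:20170920114835+02'00'"))
# 2017-09-20 11:48:35+02:00
```

Dates that are truncated (for example, only a year and month) are not covered by this sketch and would need additional format strings.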