Convert CreationTime of PDF to a readable format in Python


I'm working with a PDF in Python and reading the file's metadata with PDFMiner. I extract the info like this:

from pdfminer.pdfparser import PDFParser, PDFDocument    
fp = open('diveintopython.pdf', 'rb')
parser = PDFParser(fp)
doc = PDFDocument()
parser.set_document(doc)
doc.set_parser(parser)
doc.initialize()

print(doc.info[0]['CreationDate'])
# Prints: D:20130501200439+01'00'

How can I convert D:20130501200439+01'00' into a readable format in Python?


Solution

  • Yes, "+01'00'" is the timezone information: in the PDF date format it is the offset of local time from UTC (here, one hour ahead). Ignoring that offset, you can create a datetime object as follows:

    >>> from datetime import datetime
    ...
    >>> # Drop the leading "D:" and the trailing 7-character offset "+01'00'"
    >>> datestring = doc.info[0]['CreationDate'][2:-7]
    >>> dt = datetime.strptime(datestring, "%Y%m%d%H%M%S")
    >>> dt
    datetime.datetime(2013, 5, 1, 20, 4, 39)
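
If you also want to keep the UTC offset instead of slicing it away, something along these lines may work. It is only a sketch for Python 3: `parse_pdf_date` and the `_PDF_DATE` pattern are hypothetical names (not part of PDFMiner), and it assumes the string follows the usual PDF layout `D:YYYYMMDDHHmmSS` followed by an optional `Z` or `+HH'mm'` / `-HH'mm'`, with everything after the year optional.

```python
import re
from datetime import datetime, timedelta, timezone

# Pattern for PDF date strings such as "D:20130501200439+01'00'".
# Assumption: every field after the four-digit year may be missing.
_PDF_DATE = re.compile(
    r"^(?:D:)?"
    r"(?P<year>\d{4})"
    r"(?P<month>\d{2})?(?P<day>\d{2})?"
    r"(?P<hour>\d{2})?(?P<minute>\d{2})?(?P<second>\d{2})?"
    r"(?:(?P<sign>[+\-Z])(?:(?P<off_h>\d{2})'?(?P<off_m>\d{2})?'?)?)?$"
)


def parse_pdf_date(value):
    """Hypothetical helper: turn a PDF date string into a datetime."""
    if isinstance(value, bytes):
        value = value.decode("ascii")
    match = _PDF_DATE.match(value.strip())
    if match is None:
        raise ValueError("not a PDF date string: %r" % (value,))
    parts = match.groupdict()
    # Missing month/day default to 1, missing time fields default to 0.
    dt = datetime(
        int(parts["year"]),
        int(parts["month"] or 1),
        int(parts["day"] or 1),
        int(parts["hour"] or 0),
        int(parts["minute"] or 0),
        int(parts["second"] or 0),
    )
    sign = parts["sign"]
    if sign is None:
        return dt  # no offset in the string: return a naive datetime
    if sign == "Z":
        return dt.replace(tzinfo=timezone.utc)
    offset = timedelta(
        hours=int(parts["off_h"] or 0),
        minutes=int(parts["off_m"] or 0),
    )
    if sign == "-":
        offset = -offset
    return dt.replace(tzinfo=timezone(offset))


created = parse_pdf_date("D:20130501200439+01'00'")
print(created.isoformat())  # 2013-05-01T20:04:39+01:00
```

The result is a timezone-aware datetime, so `created.astimezone(timezone.utc)` converts it to UTC and `created.strftime(...)` formats it however you like. Strings without an offset come back as naive datetimes rather than being assumed to be UTC.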