I'm working with PDFs in Python and reading a file's metadata with PDFMiner. I extract the info like this:
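# Older PDFMiner API (newer releases moved PDFDocument to pdfminer.pdfdocument)
from pdfminer.pdfparser import PDFParser, PDFDocument
fp = open('diveintopython.pdf', 'rb')
parser = PDFParser(fp)
doc = PDFDocument()
parser.set_document(doc)
doc.set_parser(parser)
doc.initialize()
# doc.info is a list of info dictionaries; [0] is the first one
print(doc.info[0]['CreationDate'])
# This prints the value "D:20130501200439+01'00'"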
How can I convert D:20130501200439+01'00' into a readable format in Python?
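The "+01'00'" suffix is the timezone information: PDF dates follow the layout D:YYYYMMDDHHmmSSOHH'mm', where OHH'mm' is the offset from UTC (here +01:00). Ignoring that offset, you can create a naive datetime object as follows: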
>>> from time import mktime, strptime
>>> from datetime import datetime
...
>>> datestring = doc.info[0]['CreationDate'][2:-7]  # strip "D:" and "+01'00'"
>>> ts = strptime(datestring, "%Y%m%d%H%M%S")
>>> dt = datetime.fromtimestamp(mktime(ts))
>>> dt
datetime.datetime(2013, 5, 1, 20, 4, 39)
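If you want to keep the timezone instead of slicing it off, here is a minimal sketch (Python 3 only, since it relies on datetime.timezone) that parses the whole string with a regular expression built from the D:YYYYMMDDHHmmSSOHH'mm' layout. The names parse_pdf_date and PDF_DATE_RE are my own, not part of PDFMiner. The bytes handling is an assumption that your PDFMiner version may hand back bytes rather than str, and treating every field after the year as optional follows the PDF date format's rule that trailing fields may be left out.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical helper regex for PDF dates: D:YYYYMMDDHHmmSSOHH'mm'.
# Only the year is required; O is '+', '-' or 'Z'. The "D:" prefix is
# tolerated if a producer omitted it.
PDF_DATE_RE = re.compile(
    r"(?:D:)?(?P<year>\d{4})(?P<month>\d{2})?(?P<day>\d{2})?"
    r"(?P<hour>\d{2})?(?P<minute>\d{2})?(?P<second>\d{2})?"
    r"(?:(?P<sign>[+\-Z])(?:(?P<tzh>\d{2})'?(?:(?P<tzm>\d{2})'?)?)?)?"
)


def parse_pdf_date(value):
    """Convert a PDF date string to a datetime.

    Returns a timezone-aware datetime when the string carries an offset
    (or 'Z'), otherwise a naive datetime.
    """
    if isinstance(value, bytes):  # assumption: some versions return bytes
        value = value.decode("ascii")
    match = PDF_DATE_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError("not a PDF date string: %r" % (value,))
    g = match.groupdict()
    # Missing fields default to the start of the period (Jan 1, 00:00:00).
    dt = datetime(
        int(g["year"]),
        int(g["month"] or 1),
        int(g["day"] or 1),
        int(g["hour"] or 0),
        int(g["minute"] or 0),
        int(g["second"] or 0),
    )
    if g["sign"] is None:
        return dt  # no timezone given: leave it naive
    if g["sign"] == "Z":
        return dt.replace(tzinfo=timezone.utc)
    offset = timedelta(hours=int(g["tzh"] or 0), minutes=int(g["tzm"] or 0))
    if g["sign"] == "-":
        offset = -offset
    return dt.replace(tzinfo=timezone(offset))
```

With the value from the question, parse_pdf_date("D:20130501200439+01'00'").isoformat() gives "2013-05-01T20:04:39+01:00", and you can call strftime on the result for any other readable layout. Keeping the offset makes it possible to compare dates from PDFs produced in different timezones correctly, which the naive version above cannot do.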