Extracting text from PostScript and/or creating overlays using Python

I am trying to automatically extract an address from a PostScript document that has been intercepted by RedMon and piped to a Python program. I can already capture the PostScript output (and write it to a file), but I am stuck on the extraction part.

Is there a good, reliable way of doing this in Python, or do I need to run the PostScript file through ps2ascii and hope for the best?

If there are tools in other languages that could do this, I would be happy to evaluate them.

Solution

Since I commented on ps2ascii having a large footprint, here is an "80%" solution that uses Python to extract the strings that appear literally in a PostScript file.


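import fileinput
for line in fileinput.input():
    # Mask escaped parentheses so they are not treated as string delimiters.
    masked = line.rstrip('\n').replace('\\(', 'EscapeLP').replace('\\)', 'EscapeRP')
    for p in masked.split('(')[1:]:
        # Keep the text up to the first closing parenthesis, then unmask.
        print(p.split(')')[0].replace('EscapeLP', '(').replace('EscapeRP', ')'))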
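
The script above treats every "(" as the start of a string and works one line at a time, so it misses nested parentheses, octal escapes such as \101, and strings that span lines. As a rough sketch of how to cover more of the remaining cases, the function below walks the text character by character. The name extract_ps_strings is my own invention, not part of any library, and the sketch assumes the file is plain text: it ignores hex strings (<...>) and embedded binary data, and it maps octal escapes straight to code points, which may not match the font's encoding.

```python
import re

# Single-character escapes defined for PostScript literal strings.
_SIMPLE_ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f",
                   "\\": "\\", "(": "(", ")": ")"}


def extract_ps_strings(source):
    """Return the literal (parenthesised) strings found in PostScript text."""
    strings = []
    buf = []      # characters of the string currently being read
    depth = 0     # nesting level of unescaped parentheses; 0 means outside
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if depth == 0:
            if ch == "%":
                # Comment outside a string: skip to the end of the line.
                while i < n and source[i] not in "\r\n":
                    i += 1
                continue
            if ch == "(":
                depth = 1
                buf = []
            i += 1
            continue
        # From here on we are inside a string.
        if ch == "\\" and i + 1 < n:
            nxt = source[i + 1]
            octal = re.match(r"[0-7]{1,3}", source[i + 1:i + 4])
            if octal:
                # Octal escape with one to three digits, e.g. \101 for "A".
                buf.append(chr(int(octal.group(), 8) & 0xFF))
                i += 1 + len(octal.group())
            elif nxt in "\r\n":
                # Backslash followed by a newline is a line continuation.
                i += 2
                if nxt == "\r" and i < n and source[i] == "\n":
                    i += 1
            else:
                # Known escape, or an unknown one where the backslash is dropped.
                buf.append(_SIMPLE_ESCAPES.get(nxt, nxt))
                i += 2
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                strings.append("".join(buf))
                i += 1
                continue
        buf.append(ch)
        i += 1
    return strings
```

To use it on the captured file, read the whole file into one string (for example with open(path, encoding="latin-1").read()) and pass that to extract_ps_strings.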

Note that finely formatted (kerned) PostScript will often have its strings split up into small pieces, sometimes individual characters. ps2ascii does a nice job of piecing them back together for you, while my simple script obviously will not.
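
If pulling in ps2ascii is not an option, one crude workaround for the kerning problem is to glue together all the string fragments that appear on the same line of the file. This is only a heuristic sketch: it assumes the generating driver writes the pieces of one run of text on a single line, which is common but by no means guaranteed, and it cannot tell a kerning offset from a real word gap, so spaces that were produced by positioning alone are lost. The function name join_fragments_per_line is hypothetical.

```python
import re

# A literal string without nested parentheses; backslash escapes are allowed.
_STRING = re.compile(r"\(((?:\\.|[^\\()])*)\)")
# Escaped parentheses and backslashes, to be turned back into plain characters.
_ESCAPED = re.compile(r"\\([()\\])")


def join_fragments_per_line(source):
    """Concatenate the string fragments found on each line into one string."""
    joined = []
    for line in source.splitlines():
        fragments = [_ESCAPED.sub(r"\1", m) for m in _STRING.findall(line)]
        if fragments:
            joined.append("".join(fragments))
    return joined


sample = (
    "(12) show -0.2 0 rmoveto (3 Main) show 0.1 0 rmoveto ( St) show\n"
    "(Spring) show (field) show\n"
    "/F1 12 selectfont\n"
)
print(join_fragments_per_line(sample))
```

Lines without any string, such as the selectfont line in the sample, are skipped entirely.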