I am trying to automatically extract an address from a PostScript document that has been intercepted by RedMon and piped to a Python program. I can already capture the PostScript output (and write it to a file), but I am stuck on the extraction part.
Is there a good, reliable way of doing this in Python, or do I need to run the PostScript file through ps2ascii and hope for the best?
If there are tools in other languages that could do this, I would be happy to evaluate them.
Since I commented on ps2ascii having a large footprint, here is an "80%" solution, in Python, for extracting strings that appear literally in a PostScript file:
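import fileinput

for line in fileinput.input():
    # Mask escaped parentheses so they are not mistaken for string delimiters.
    masked = line.replace('\\(', 'EscapeLP').replace('\\)', 'EscapeRP')
    for p in masked.split('(')[1:]:
        # Keep the text up to the closing parenthesis, then restore the escapes.
        print(p[:p.find(')')].replace('EscapeLP', '(').replace('EscapeRP', ')'))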
Note that finely formatted (kerned) PostScript will often have strings split up into small pieces (even individual characters). ps2ascii does a nice job of piecing them together for you, while my simple script obviously will not.
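If the fragments of a kerned string happen to sit on the same line of the file, a rough way to glue them back together is to collect every literal string on that line and concatenate them. The sketch below is only that, a sketch: the function names `ps_strings` and `joined_text` and the sample line are made up for illustration, and it assumes that fragments on one line belong together. It does handle balanced nested parentheses, which PostScript allows unescaped inside a string, but it does not decode escapes such as `\n` or octal codes, and it cannot recover spaces that the document produced by moving the cursor instead of printing a space character.

```python
def ps_strings(line):
    """Return the literal (...) strings found in one line of PostScript."""
    out, buf, depth, i = [], [], 0, 0
    while i < len(line):
        c = line[i]
        if c == '\\' and depth > 0 and i + 1 < len(line):
            # Inside a string: keep the character after the backslash as-is.
            # This covers \( \) and \\ only; \n, octal codes etc. are not decoded.
            buf.append(line[i + 1])
            i += 2
            continue
        if c == '(':
            depth += 1
            if depth > 1:
                buf.append(c)      # nested parenthesis is part of the string
        elif c == ')' and depth > 0:
            depth -= 1
            if depth == 0:
                out.append(''.join(buf))
                buf = []
            else:
                buf.append(c)      # closes a nested parenthesis
        elif depth > 0:
            buf.append(c)
        i += 1
    return out


def joined_text(line):
    """Concatenate all string fragments on a line (heuristic, see above)."""
    return ''.join(ps_strings(line))


# Made-up sample line: one address split into three kerned pieces.
sample = '(1) show 2 0 rmoveto (23 Ma) show 1 0 rmoveto (in St) show'
print(joined_text(sample))
```

For anything beyond this (strings split across lines, text positioned by custom procedures, encoded fonts), ps2ascii is still the more dependable route.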