python, postscript

Extracting text from postscript and/or creating overlays using python


I am trying to automatically extract an address from a PostScript document that has been intercepted by RedMon and piped to a Python program. I can already capture the PostScript output (and write it to a file), but I am stuck on the extraction part.

Is there a good, reliable way of doing this in Python, or do I need to run the PostScript file through ps2ascii and hope for the best?

If there are tools in other languages that could do this I would be happy to evaluate them.


Solution

  • Since I commented on ps2ascii having a large footprint, here is an "80%" solution in Python for extracting strings that appear literally in a PostScript file:

    
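    import fileinput

    for line in fileinput.input():
        # Hide escaped parentheses so they are not taken as string delimiters
        line = line.replace('\\(', 'EscapeLP').replace('\\)', 'EscapeRP')
        # Every '(' opens a string literal; keep the text up to the next ')'
        for p in line.split('(')[1:]:
            s = p.partition(')')[0]
            print(s.replace('EscapeLP', '(').replace('EscapeRP', ')'))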
    

    Note that finely formatted (kerned) PostScript often splits strings into small pieces (even individual characters). ps2ascii does a nice job of piecing them together; this simple script does not.
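A slightly more thorough sketch, not part of the original answer: instead of splitting each line on "(", it scans the source character by character. That lets it cope with balanced (unescaped) nested parentheses, the standard backslash escapes, octal `\ddd` codes, backslash-newline continuations and `%` comments, following the literal-string syntax described in the PostScript Language Reference Manual. Like the script above, it does not reassemble kerned fragments, decode hex strings `<...>`, or map custom font encodings; `ps_strings` is a hypothetical helper name.

```python
# Single-character escapes defined for PostScript literal strings.
ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f",
           "\\": "\\", "(": "(", ")": ")"}
OCTAL = "01234567"


def ps_strings(src):
    """Yield the decoded contents of each (...) literal string in src.

    Sketch only: hex strings, kerned fragments and font encodings are ignored.
    """
    i, n = 0, len(src)
    while i < n:
        c = src[i]
        if c == "%":
            # Comment outside a string: skip to the end of the line.
            while i < n and src[i] not in "\r\n":
                i += 1
        elif c == "(":
            i += 1
            depth, buf = 1, []
            while i < n:
                c = src[i]
                if c == "\\" and i + 1 < n:
                    nxt = src[i + 1]
                    if nxt in ESCAPES:
                        buf.append(ESCAPES[nxt])
                        i += 2
                    elif nxt in OCTAL:
                        # \ddd: one to three octal digits; overflow is ignored.
                        j = i + 1
                        while j < n and j < i + 4 and src[j] in OCTAL:
                            j += 1
                        buf.append(chr(int(src[i + 1:j], 8) & 0xFF))
                        i = j
                    elif nxt in "\r\n":
                        # Backslash-newline is a line continuation.
                        i += 2
                        if nxt == "\r" and i < n and src[i] == "\n":
                            i += 1
                    else:
                        # Unknown escape: drop the backslash, keep the character.
                        i += 1
                    continue
                if c == "(":
                    depth += 1  # balanced parentheses need no escaping
                elif c == ")":
                    depth -= 1
                    if depth == 0:
                        break
                buf.append(c)
                i += 1
            i += 1  # step past the closing ')'
            yield "".join(buf)
        else:
            i += 1
```

To use it in place of the loop above, read the whole file at once (for example `open(path, encoding="latin-1").read()`, since latin-1 decodes every byte without errors) and print each item yielded by `ps_strings(...)`. Reading the whole file rather than line by line matters here because a string literal may span several lines.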