
Removing CPCA Bits from a PostScript file


I am currently writing code to remove the Canon CPCA bits from a saved PostScript "print to file" file so that the resulting file is just a pure PostScript file.

I have written the code to remove all bits from the front of the file before the "%!PS-Adobe-3.0" header line, and also code to remove all of the trailing bits after the "%%EOF" line. But in certain larger files, I am seeing some binary code in the middle of the file that I believe I will need to seek out and destroy.

Here's an example of what I am talking about... notice the bit before the header and after the footer (two screenshots in the original post).

Rumor has it that there is a spec document for the CPCA protocol, but I can't find it, even in Canon's developer portal. Can anyone provide any details on the spec so that I can remove ALL of the CPCA data that the spec says may be included?

Thanks in advance for any help.


Solution

  • So, looking at the file there's a bunch of stuff before the %!PS which I assume (as you note in your question) is part of the Canon CPCA stuff.

    Then there's the usual comment structure for a DSC-conforming PostScript program. Interestingly, this is then followed by a Canon-specific ProcSet. It seems the Canon driver is not using the usual Windows PostScript-generating DLL 'PSCRIPT5.DLL' but is instead using a Canon-specific CNS30M.DLL Version 2.40.

    This is followed by a huge amount of document setup, then a couple of fairly normal device-specific setpagedevice calls:

    %%BeginFeature: 
    %%+ *PageSize Letter
    <</DeferredMediaSelection false
    /PageSize [612 792] /ImagingBBox null /Policies << /PageSize 2 >>>> setpagedevice
    %%EndFeature 
    } stopped cleartomark
    [{
    %%BeginFeature: 
    %%+ *InputSlot Auto
    <</InputAttributes <</Priority []>> >> setpagedevice
    %%EndFeature 
    

    We then finally move on to the page contents. The first thing the program does is create a CIDFont and load some glyph descriptions into it. I suspect this is the binary you are concerned about. It's legitimate PostScript and it's not part of Canon CPCA.

    The program then draws 4 glyphs from that (subset) font and ejects the page.

    Following that we again have the usual DSC boilerplate, and the %%EOF, which is (again as you noted) followed by some random binary stuff.

    Given the description of the Canon CPCA specification, I doubt you will ever find any of it within a PostScript program. I believe it should always wrap around the program, so if you delete everything before %!PS and after %%EOF you should be fine. Note that some workflows concatenate PostScript programs, which is a bad idea but usually works, so you may need to watch out for more than one %!PS header or %%EOF in a single file.

    I tried removing the binary before and after the PostScript program and ran the result; it produced a page reading 'Test'.
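
The trimming described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a CPCA parser: it assumes the wrapper sits strictly before the first `%!PS` and after the last `%%EOF`, and the function names `strip_wrapper` and `count_ps_headers` are made up for this example. The file is handled as bytes so that binary data inside the program (such as the CIDFont glyph data) passes through untouched.

```python
def strip_wrapper(data: bytes) -> bytes:
    """Return only the bytes from the first %!PS through the last %%EOF.

    Assumption: any printer-specific wrapper lies entirely outside that span.
    """
    start = data.find(b"%!PS")
    if start == -1:
        raise ValueError("no %!PS header found")

    # Use the LAST %%EOF so that concatenated or embedded programs
    # (each with their own %%EOF) are not cut off early.
    eof = data.rfind(b"%%EOF")
    if eof < start:  # also true when rfind returns -1
        raise ValueError("no %%EOF found after the %!PS header")

    end = eof + len(b"%%EOF")
    # Drop whatever followed %%EOF and terminate the file with one newline.
    return data[start:end] + b"\n"


def count_ps_headers(data: bytes) -> int:
    """Rough heuristic: more than one %!PS may mean concatenated programs.

    Embedded EPS files also carry a %!PS header, so treat a count above 1
    as a hint to inspect the file, not as proof.
    """
    return data.count(b"%!PS")


if __name__ == "__main__":
    # Tiny synthetic sample standing in for a real "print to file" capture.
    raw = (b"\x00\x01junk%!PS-Adobe-3.0\n%%Pages: 1\nshowpage\n"
           b"%%EOF\r\n\xff\xfetrailer")
    print(strip_wrapper(raw).decode("latin-1"))
    print("headers found:", count_ps_headers(raw))
```

Searching for the last `%%EOF` rather than the first is a deliberate choice: it keeps everything if several programs were concatenated, at the cost of also keeping anything that happens to sit between them.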