I am currently writing code to remove the Canon CPCA bits from a saved PostScript "print to file" file so that the resulting file is just a pure PostScript file.
I have written the code to remove all bits from the front of the file before the "%!PS-Adobe-3.0" header line, and also code to remove all of the trailing bits after the "%%EOF" line. But in certain larger files, I am seeing some binary code in the middle of the file that I believe I will need to seek out and destroy.
Here's an example of what I am talking about... notice the bit before the header and after the footer:
Rumor has it that there is a spec document for the CPCA protocol, but I can't find it, even in Canon's developer portal. Can anyone provide any details on the spec so that I can remove ALL of the CPCA data that the spec says may be included?
Thanks in advance for any help.
So, looking at the file there's a bunch of stuff before the %!PS which I assume (as you note in your question) is part of the Canon CPCA stuff.
Then there's the usual comment structure for a DSC-conforming PostScript program. Interestingly, this is then followed by a Canon-specific ProcSet. It seems the Canon driver is not using the usual Windows PostScript-generating DLL 'PSCRIPT5.DLL' but is instead using a Canon-specific CNS30M.DLL Version 2.40.
This is followed by a huge amount of document setup, then a couple of fairly normal device-specific setpagedevice calls:
%%BeginFeature:
%%+ *PageSize Letter
<</DeferredMediaSelection false
/PageSize [612 792] /ImagingBBox null /Policies << /PageSize 2 >>>> setpagedevice
%%EndFeature
} stopped cleartomark
[{
%%BeginFeature:
%%+ *InputSlot Auto
<</InputAttributes <</Priority []>> >> setpagedevice
%%EndFeature
We then finally move on to the page contents. The first thing the program does is create a CIDFont and load some glyph descriptions into it. I suspect this is the binary you are concerned about. It's legitimate PostScript and it's not part of Canon CPCA.
The program then draws 4 glyphs from that (subset) font and ejects the page.
Following that we again have the usual DSC boilerplate, and the %%EOF, which is (again as you noted) followed by some random binary stuff.
Given the description of the Canon CPCA specification, I doubt you will ever find any of it within a PostScript program. I believe it should always wrap around the program, so if you delete everything before %!PS and after %%EOF you should be fine. Note that some workflows concatenate PostScript programs, which is a bad idea but usually works, so you may need to watch out for more than one %!PS ... %%EOF pair in a single file.
I tried removing the binary before and after the PostScript program and ran the result. It produced a page reading 'Test'.
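For what it's worth, the "delete everything before %!PS and after %%EOF" approach can be sketched in a few lines of Python. This is only a rough illustration, not a CPCA parser: the function name strip_wrapper is made up here, and it assumes the wrapper never contains the byte sequences %!PS or %%EOF itself. It keeps everything up to the last %%EOF, so binary font data in the middle and any concatenated programs are left untouched.

```python
def strip_wrapper(data: bytes) -> bytes:
    """Return the bytes from the first '%!PS' through the last '%%EOF'.

    Hypothetical helper: assumes the non-PostScript wrapper only appears
    before the header and after the final trailer comment.
    """
    start = data.find(b"%!PS")
    if start == -1:
        raise ValueError("no %!PS header found")

    # Use the LAST %%EOF so that embedded or concatenated programs,
    # and any binary data between header and trailer, are preserved.
    eof = data.rfind(b"%%EOF")
    if eof == -1 or eof < start:
        raise ValueError("no %%EOF trailer found after the header")

    end = eof + len(b"%%EOF")
    # Terminate the trailer comment with a newline.
    return data[start:end] + b"\n"


if __name__ == "__main__":
    import sys

    # Usage: python strip_wrapper.py input.prn output.ps
    with open(sys.argv[1], "rb") as src:
        cleaned = strip_wrapper(src.read())
    with open(sys.argv[2], "wb") as dst:
        dst.write(cleaned)
```

The file has to be read and written in binary mode, otherwise the glyph data inside the program is likely to get mangled.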