pdf, ghostscript, mupdf

Remove all vector paths from PDF


I'm looking for a way to remove all path objects from a PDF file.

I suspect that this can probably be done with JavaScript in Adobe Acrobat, but I would really appreciate a tip on how to do it with Ghostscript or the MuPDF tools.

Anyhow, any working solution is acceptable as a correct answer.


Solution

  • To do this with Ghostscript you would have to modify the pdfwrite device. In fact you would probably have to do something similar for any PDF interpreter.

    What do you consider a 'path' object? A shfill, for example? How about text? How about text using a type 3 font (which constructs paths)?

    What about clip paths?

    If you really want to pursue this I can tell you where to modify pdfwrite, provided you don't mind recompiling Ghostscript.

    It's probably a dumb question, but why do you want to do this? Is it possible there might be another solution to your problem? If all you want to do is remove filled paths (or indeed stroked paths), one solution would be to run the file through ps2write to get PostScript, prepend code to redefine 'fill' and 'stroke' as no-ops, and then run the file back through pdfwrite to get a PDF.

    [Added after reading comments]

    PDF doesn't have a 'path' object, unlike XObject, which is a type of object. Paths are created by a series of operations such as 'newpath', 'moveto', 'curveto' and 'lineto'. Once you have built a path you then operate on it with 'fill' or 'stroke'. Note that PDF doesn't have a 'text' object type either.
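
    To illustrate, here is a minimal PostScript sketch of a path being built and then painted. The coordinates and colours are arbitrary, chosen only for the example. Nothing in it is an 'object' you could pick out and delete; there is only a sequence of operators acting on the current path.

    ```postscript
    %!PS
    % Build a triangular path with the path construction operators.
    newpath
    100 100 moveto      % start a subpath
    300 100 lineto
    200 250 lineto
    closepath           % so far the path exists only in the graphics state
    gsave
      0 0.6 0 setrgbcolor
      fill              % paints the interior; fill also clears the current path
    grestore            % brings back the saved path so it can be used again
    0 setgray
    stroke              % paints the outline, then clears the current path
    showpage
    ```

    In a PDF content stream the same idea is written with the operators m, l, h, f and S rather than the spelled-out PostScript names.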

    This is why your approach doesn't work: you can't remove 'path objects' because there aren't any; the paths are created in the content stream. You can use a Form XObject to do something similar, but then the path construction is in the Form content stream; it still isn't a separate object.

    The same is true of PostScript; these are NOT any kind of object-oriented language. You cannot 'detect vector object of type path' in either language because there are no objects. In practice anything which isn't an image is a vector object, and is constructed from a path (and with clipping, even some images might be considered as paths).

    The piece of PostScript you have highlighted adds a rectangle to a path (paths need not be contiguous in either PDF or PostScript) and then fills it. Note that, as is usual practice in PostScript, these are not directly using the PostScript operators, but are executing procedures which use the operators. The procedures are defined in the program prologue.
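
    As an illustration of that practice, a prologue might look something like the sketch below. The procedure names /R and /f are hypothetical, made up for this example; the names a real pswrite or ps2write prologue uses will differ.

    ```postscript
    %!PS
    % Hypothetical prologue: short procedure names wrapping the real operators.
    /f { fill } bind def
    % x y width height R : append a rectangle to the current path
    /R {
      4 2 roll moveto        % stack: x y w h -> w h, current point is (x, y)
      1 index 0 rlineto      % right by w
      0 exch rlineto         % up by h
      neg 0 rlineto          % left by w
      closepath
    } bind def

    % Page body: calls the procedures, never the operators directly.
    72 72 144 36 R f
    showpage
    ```

    A line such as "72 72 144 36 R f" in the page body is therefore just two procedure calls: one that adds a rectangle to the current path and one that fills it.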

    By the way, it looks like you used the pswrite device here (can't be sure with such a small sample). If this is the case you really want to start with ps2write instead. Otherwise you are going to end up with an awful lot of things degenerating to tiny filled rectangles (pswrite does this with many image types).

    I didn't suggest that you try to 'decrypt' the ps2write output (it isn't encrypted, it's compressed).

    What I suggested was to create a PostScript file, redefine the 'stroke' and/or 'fill' operators so that they do nothing, and then run the resulting PostScript program back through Ghostscript using the pdfwrite device. This will produce a PDF file where all stroked and/or filled objects are ignored.

    [final addition]

    I picked up your sample file and examined it.

    I presume the bug you are seeing arises because the PDF file uses a /Separation colour with an ICCBased alternate and no device space tint transform (surely the problem cannot be a failure to fill a rectangle). In that case the current version of ps2write may solve your problem. It does not (currently; this is due to change) preserve /Separation colours and instead emits them as a device colour, by default RGB. So simply converting the files to PostScript and back to PDF may completely resolve your problem.

    If you knew what the problem was, it would have been quicker to tell us; I could have given you that information and the work-around in the first place.

    Using ps2write I then created a PostScript version of the file (notice that the Separation colours are now RGB) and prefixed the PostScript program with two lines:

    /fill {newpath} bind def
    /stroke {newpath} bind def
    

    Note that you must use an editor which preserves binary. Then running that PostScript program back through Ghostscript using the pdfwrite device I obtain a PDF file where the green 'decoration' which I think you are having a problem with is gone.
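
    The self-contained sketch below shows why the prefix has to come before everything else in the file. The /f procedure and the page content are stand-ins invented for the example, and the eofill line is an assumption on my part that even-odd fills should be dropped too; it is not one of the two lines used above.

    ```postscript
    %!PS
    % Prefix: replace the painting operators before anything else is defined.
    % Each replacement still calls newpath, because the real operators clear
    % the current path and later code may rely on that.
    /fill   {newpath} bind def
    /stroke {newpath} bind def
    /eofill {newpath} bind def   % assumption: drop even-odd fills as well

    % Stand-in for the program prologue (the name /f is hypothetical).
    % bind only substitutes names that currently refer to operators. Here
    % fill already refers to the procedure above, so /f gets the no-op.
    /f { fill } bind def

    % Stand-in for the page content.
    0 1 0 setrgbcolor
    72 72 moveto 144 0 rlineto 0 36 rlineto -144 0 rlineto closepath
    f          % paints nothing
    showpage
    ```

    If the redefinitions came after the prologue instead, bind would already have replaced the name fill inside /f with the real operator, and the rectangle would still be painted. For the two conversions themselves, commands along the lines of "gs -sDEVICE=ps2write -o out.ps in.pdf" and "gs -sDEVICE=pdfwrite -o out.pdf out.ps" should do; the file names are placeholders.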

    So, there's a solution to your question, and a possibly better way to solve your problem as well.