Tags: linux, pdf, imagemagick, ghostscript, cmyk

Converting PDF to CMYK (with identify recognizing CMYK)


I am having a lot of trouble getting ImageMagick's identify to, well, identify a PDF as CMYK.

Essentially, let's say I'm building this file, test.tex, with pdflatex:

\documentclass[a4paper,12pt]{article}

%% https://tex.stackexchange.com/questions/13071
\pdfcompresslevel=0

%% http://compgroups.net/comp.text.tex/Making-a-cmyk-PDF
%% ln -s /usr/share/color/icc/sRGB.icm .
% \immediate\pdfobj stream attr{/N 4} file{sRGB.icm}
% \pdfcatalog{%
% /OutputIntents [ <<
% /Type /OutputIntent
% /S/GTS_PDFA1
% /DestOutputProfile \the\pdflastobj\space 0 R
% /OutputConditionIdentifier (sRGB IEC61966-2.1)
% /Info(sRGB IEC61966-2.1)
% >> ]
% }

%% http://latex-my.blogspot.com/2010/02/cmyk-output-for-commercial-printing.html
%% https://tex.stackexchange.com/questions/9961
\usepackage[cmyk]{xcolor}

\begin{document}
Some text here...
\end{document}

If I then try to identify the resulting test.pdf file, it is reported as RGB, no matter which options I try (at least those from the links in the source), and yet the colors in it appear to be saved as CMYK. For the source above:

$ grep -ia 'cmyk\|rgb\| k' test.pdf 
0 0 0 1 k 0 0 0 1 K
0 0 0 1 k 0 0 0 1 K
0 0 0 1 k 0 0 0 1 K
0 0 0 1 k 0 0 0 1 K
FontDirectory/CMR12 known{/CMR12 findfont dup/UniqueID known{dup
/PTEX.Fullbanner (This is pdfTeX, Version 3.1415926-1.40.11-2.2 (TeX Live 2010) kpathsea version 6.0.0)

$ identify -verbose 'test.pdf[0]'
...
  Type: Palette
  Endianess: Undefined
  Colorspace: RGB
  Depth: 16/8-bit
  Channel depth:
    red: 8-bit
    green: 8-bit
    blue: 8-bit
  Channel statistics:
    Red:
...
    Green:
...
    Blue:
...
  Histogram:
         5: (12593,11565,11822) #31312D2D2E2E rgb(49,45,46)
         4: (16448,15420,15677) #40403C3C3D3D rgb(64,60,61)
         9: (20303,19275,19532) #4F4F4B4B4C4C rgb(79,75,76)
        25: (23901,23130,23387) #5D5D5A5A5B5B rgb(93,90,91)
...

Much the same happens if I also uncomment the \immediate\pdfobj stream ... part. And if there is only one color (black) in the document, I don't see where identify gets a histogram of RGB values from (although, arguably, all of them are close to gray)?!
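The grep check above can be made slightly more systematic by counting color-setting operators per color model. The following is only a rough sketch, assuming the content streams are uncompressed (as with \pdfcompresslevel=0) and that operands are plain decimal numbers; `count_color_ops` is a hypothetical helper name, and matches inside binary data (fonts, images) could produce false positives.

```shell
#!/bin/sh
# Sketch: count PDF color-setting operators in an UNCOMPRESSED PDF.
#   k / K   take 4 operands (CMYK fill / stroke)
#   rg / RG take 3 operands (RGB fill / stroke)
#   g / G   take 1 operand  (gray fill / stroke)
# count_color_ops is a hypothetical name used for illustration only.
count_color_ops() {
    pdf=$1
    num='[0-9.]+[[:space:]]+'
    cmyk=$(LC_ALL=C grep -aoE "($num){4}[kK]([[:space:]]|\$)" "$pdf" | wc -l | tr -d ' ')
    rgb=$(LC_ALL=C grep -aoE "($num){3}(rg|RG)([[:space:]]|\$)" "$pdf" | wc -l | tr -d ' ')
    gray=$(LC_ALL=C grep -aoE "${num}[gG]([[:space:]]|\$)" "$pdf" | wc -l | tr -d ' ')
    echo "cmyk=$cmyk rgb=$rgb gray=$gray"
}

# Usage (not executed here): count_color_ops test.pdf
```

A non-zero cmyk count alongside rgb=0 would suggest that the page content itself uses only CMYK operators, independently of what identify reports.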

 

Never mind that; I then thought I'd better try to use ghostscript to convert test.pdf into a new PDF that identify would recognize as CMYK, but no luck there either:

$ gs -dNOPAUSE -dBATCH -dSAFER -sDEVICE=pdfwrite  -sOutputFile=test-gs.pdf -dUseCIEColor -sProcessColorModel=DeviceRGB -dProcessColorModel=/DeviceCMYK -sColorConversionStrategy=/CMYK test.pdf 

GPL Ghostscript 9.01 (2011-02-07)
Copyright (C) 2010 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 1.
Page 1


$ identify -verbose 'test-gs.pdf[0]'
...
  Type: Grayscale
  Base type: Grayscale
  Endianess: Undefined
  Colorspace: RGB
  Depth: 16/8-bit
...

So the only change that identify perceives is Type: Grayscale (previously Type: Palette); otherwise it still sees an RGB colorspace!

Note also that identify is capable of correctly reporting a CMYK PDF: see "CMYK poster example: fitting pdf page size to (bitmap) image size?" (#17843, TeX - LaTeX Stack Exchange) for a command-line example of generating such a PDF file using convert and gs. In fact, we can execute:

convert test.pdf -depth 8 -colorspace cmyk -alpha Off test-c.pdf

... and this will result in a PDF that identify reports as CMYK; however, the PDF will also be rasterized (by default at 72 dpi).

EDIT: I have just discovered that if I create an .odp presentation in OpenOffice and export it to PDF, that PDF will be RGB by default. However, the following command (from "ghostscript Examples" on Production Monkeys):

# Color PDF to CMYK:
gs -dSAFER -dBATCH -dNOPAUSE -dNOCACHE -sDEVICE=pdfwrite \
-sColorConversionStrategy=CMYK -dProcessColorModel=/DeviceCMYK \
-sOutputFile=output.pdf input.pdf

... actually will produce a CMYK PDF, reported as such by identify (although the black will be rich, not plain: it appears on all four channels). However, this command works only when the slide has an added image (apparently, the image is what triggers the color conversion?!). Funnily, I cannot get the same effect from a pdflatex PDF.

 

So I guess my question can be asked two ways:

  • Are there any command-line conversion methods in Linux that will convert an RGB PDF into a CMYK PDF while preserving vectors, such that the result is recognized as CMYK by identify (which would consequently build a correct histogram of CMYK colors)?
  • Are there any other command-line Linux tools similar to identify that would correctly recognize the use of CMYK colors even in the original test.pdf from pdflatex (and possibly build a color histogram, based on an arbitrarily chosen PDF page, as identify is supposed to)?
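
As a very rough illustration of what the second question is after, the sketch below tallies the distinct CMYK operand tuples found in an uncompressed PDF. It is not equivalent to identify's histogram: it counts operator occurrences in the file, not painted pixels, and it works on the whole file rather than a chosen page. `cmyk_histogram` is a hypothetical helper name, and the approach assumes uncompressed content streams (\pdfcompresslevel=0).

```shell
#!/bin/sh
# Sketch: tally distinct "c m y k" operand tuples that precede a k or K
# operator in an UNCOMPRESSED PDF. Output lines are "<count> <c> <m> <y> <k>",
# most frequent first. cmyk_histogram is a hypothetical name.
cmyk_histogram() {
    LC_ALL=C grep -aoE '([0-9.]+[[:space:]]+){4}[kK]([[:space:]]|$)' "$1" |
        awk '{ n[$1 " " $2 " " $3 " " $4]++ }
             END { for (c in n) print n[c], c }' |
        sort -rn
}

# Usage (not executed here): cmyk_histogram test.pdf
```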

Thanks in advance for any answers,
Cheers!

 



Solution

  • sdaau, the command you used for trying to convert your PDF to CMYK was not correct. Try this one instead:

     gs \
       -o test-cmyk.pdf \
       -sDEVICE=pdfwrite \
       -sProcessColorModel=DeviceCMYK \
       -sColorConversionStrategy=CMYK \
       -sColorConversionStrategyForImages=CMYK \
        test.pdf 
    

    Update

    If color conversion does not work as desired and if you see a message like "Unable to convert color space to Gray, reverting strategy to LeaveColorUnchanged" then...

    1. your Ghostscript probably is a newer release from the 9.x version series, and
    2. your source PDF likely uses an embedded ICC color profile

    In this case add -dOverrideICC to the command line and see if it changes the result as desired.


    Update 2

    To avoid JPEG artifacts appearing in the images (where there were none before), add:

    -dEncodeColorImages=false
    

    into the command line.

    (This is true for almost all GS PDF->PDF processing, not just for this case, because GS by default creates a completely new file with newly constructed objects and a new file structure when asked to produce PDF output. It doesn't simply re-use the previous objects, as a more "dumb" PDF processor like pdftk does (pdftk has other advantages though, don't misunderstand my statement!). GS applies JPEG compression by default; look at the current Ps2pdf documentation and search for "ColorImageFilter" to learn more details...)
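
The base command and the two updates can be combined into a small wrapper. This is a minimal sketch, not a tested recipe: `pdf_to_cmyk`, `OVERRIDE_ICC`, and `GS_DRY_RUN` are hypothetical names introduced here for illustration, while the gs options are exactly the ones quoted in the answer.

```shell
#!/bin/sh
# Sketch: wrap the Ghostscript CMYK conversion from the answer.
# Hypothetical knobs (not part of Ghostscript):
#   OVERRIDE_ICC=1  append -dOverrideICC (Update 1, embedded ICC profiles)
#   GS_DRY_RUN=1    print the command instead of running it
pdf_to_cmyk() {
    src=$1
    dst=$2
    set -- gs -o "$dst" \
        -sDEVICE=pdfwrite \
        -sProcessColorModel=DeviceCMYK \
        -sColorConversionStrategy=CMYK \
        -sColorConversionStrategyForImages=CMYK \
        -dEncodeColorImages=false
    if [ -n "${OVERRIDE_ICC:-}" ]; then
        set -- "$@" -dOverrideICC
    fi
    set -- "$@" "$src"
    if [ -n "${GS_DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage (not executed here): pdf_to_cmyk test.pdf test-cmyk.pdf
```

Running it once with GS_DRY_RUN=1 shows the exact command line before any file is written; whether the output is then reported as CMYK by identify still has to be checked on the result.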