Calculate CMYK and spot color coverage of a PDF with Python

I can't find any free or open-source library that calculates CMYK and spot color coverage for a PDF. I would be grateful if someone could point me in the right direction: how can I access the color channels and calculate the percentage of each color used (C, M, Y, K and spot colors, each reported separately) with Python?

Note: extracting C, M, Y and K is not the problem, because I can easily get those from an image. The problem is that when I add spot colors, they are converted to CMYK again.

That's why I'm looking for a way to do this on the PDF itself.

Thanks

Solution

I hope this will be useful for those who have a similar problem in the future.

Dependencies: Ghostscript, Pillow

How does it work? Ghostscript separates the colors (C, M, Y, K and spots) and saves each one as a .tiff file. Pillow then calculates the percentage of color used in each file. Each file saved by Ghostscript is in grayscale mode and has a single channel with values from 0 to 255.
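The mapping from output files to color names can be sketched as follows. Judging by the prefix and parentheses that the code in this answer strips from each file name, the separations appear to be written as c(Cyan).tiff, c(Magenta).tiff and so on, next to a composite c.tiff. That pattern is an assumption here, and separation_name is a hypothetical helper, not part of Ghostscript or Pillow.

```python
import os


def separation_name(filename, prefix="c"):
    """Return the colorant name from a file like 'c(Cyan).tiff', or None.

    Assumes the naming pattern '<prefix>(<colorant>).<ext>'; anything
    else (for example the composite 'c.tiff') yields None.
    """
    stem, _ext = os.path.splitext(filename)
    if stem.startswith(prefix + "(") and stem.endswith(")"):
        return stem[len(prefix) + 1:-1]
    return None


for name in ("c(Cyan).tiff", "c(PANTONE 185 C).tiff", "c.tiff"):
    print(name, "->", separation_name(name))
```

Using os.path.splitext rather than splitting on the first dot keeps spot color names that contain a dot intact.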

Note: before running the code, make sure Ghostscript is installed.
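One way to check for Ghostscript from Python is to look for its executable on the PATH. This is a minimal sketch: find_ghostscript is a hypothetical helper, and the candidate names (gs on Linux/macOS, gswin64c and gswin32c on Windows) are assumptions about a typical installation.

```python
import shutil


def find_ghostscript(candidates=("gs", "gswin64c", "gswin32c")):
    """Return the path of the first Ghostscript executable found, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


gs_path = find_ghostscript()
if gs_path is None:
    print("Ghostscript was not found on PATH; install it first.")
else:
    print(f"Using Ghostscript at {gs_path}")
```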

import fnmatch
import os
import subprocess

from PIL import Image


def pdf_color_splitter():
    # Directory where the separated color images are written.
    path = 'image_inputs/'
    os.makedirs(path, exist_ok=True)

    # Run Ghostscript's tiffsep device to split the PDF into separate .tiff files.
    subprocess.run(
        ['gs', '-sDEVICE=tiffsep', '-o', f'{path}c.tiff', 'cmyk_calculate/2021.pdf'],
        check=True,
    )

    # Collect all .tiff files.
    files = fnmatch.filter(os.listdir(path), '*.tiff')

    # Calculate the coverage of each color separately.
    splited_colors = []
    for f in files:
        with Image.open(path + f) as image:
            width, height = image.size
            count = width * height

            val = 0  # accumulated ink percentage over all pixels
            for i in range(width):
                for j in range(height):  # start at row 0 so no row is skipped
                    pixVal = image.getpixel((i, j))
                    # Skip white (no ink) pixels and multi-channel (tuple) pixels.
                    if pixVal != 255 and type(pixVal) != tuple:
                        val+= 100 - (pixVal//2.55) # see the explanation below this code

        resp = {'name': f.split('.')[0].replace('c(', '').replace(')', ''), 'coverage': val / count}
        splited_colors.append(resp)
        os.remove(path + f)  # remove the .tiff file when done

    return splited_colors
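Calling getpixel for every pixel can be slow on high-resolution pages. As a possible alternative, the same kind of coverage figure can be computed from a histogram of gray values. The sketch below works on a plain list of 256 counts, which is what Pillow's Image.histogram() returns for a single-channel 8-bit image. coverage_from_histogram is a hypothetical name, and it uses exact division instead of the floor division in the code above, so the results can differ slightly.

```python
def coverage_from_histogram(hist):
    """Return the average ink coverage (0..100) from 256 gray-level counts.

    Assumes gray 0 means full ink and gray 255 means no ink.
    """
    if len(hist) != 256:
        raise ValueError("expected 256 gray levels (single-channel 8-bit image)")
    total = sum(hist)
    if total == 0:
        return 0.0
    # Each pixel contributes (255 - gray) / 255 of full ink.
    ink = sum(count * (255 - gray) for gray, count in enumerate(hist))
    return 100.0 * ink / (255 * total)


# Synthetic page: half the pixels carry full ink (0), half are blank (255).
hist = [0] * 256
hist[0] = 500
hist[255] = 500
print(coverage_from_histogram(hist))  # 50.0
```

With Pillow, the list would come from calling histogram() on the opened separation image.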

Look at this line:

val+= 100 - (pixVal//2.55)

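So what is this for?

We want a number in the range 0 to 100 because we are working in CMYK terms, so the gray value (0 to 255) is divided by 2.55. The result is subtracted from 100 because the file is a grayscale image in which white (255) means no ink and black (0) means full ink, so the subtraction gives the correct color density.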
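A few sample values make the mapping concrete. This is only an illustration of the expression from the answer; ink_percent is a hypothetical wrapper around it.

```python
def ink_percent(gray):
    """Map an 8-bit gray value to an ink percentage, as in the answer's formula."""
    return 100 - (gray // 2.55)


for gray in (0, 64, 128, 191):
    print(gray, ink_percent(gray))
# 0 100.0
# 64 75.0
# 128 50.0
# 191 26.0
```

Darker pixels give higher percentages. Pure white (255) never reaches this expression in the answer's code, because the loop skips it explicitly.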