Discrete Cosine Transform (DCT) Coefficient Distribution


I have two images:

Original Image

[image: original image]

Binarized Image

[image: binarized image]

I have applied the Discrete Cosine Transform (DCT) to the two images by dividing each 256x256 image into 8x8 blocks. Afterwards, I want to compare their DCT coefficient distributions.

import matplotlib.pyplot as plt
import numpy as np

from numpy import r_
from PIL import Image
from scipy.fftpack import dct


if __name__ == '__main__':
    image_counter = 1

    # Open the noisy image.
    noise_image_path = 'noise_images/' + str(image_counter) + '.png'
    noise_image = Image.open(noise_image_path)

    # Open the binarized (ground-truth) image.
    ground_truth_image_path = 'ground_truth_noise_patches/' + str(image_counter) + '.png'
    ground_truth_image = Image.open(ground_truth_image_path)

    # Convert the images to NumPy arrays.
    noise_image = np.array(noise_image)
    ground_truth_image = np.array(ground_truth_image)

    # Allocate `noise_dct_data` and `ground_truth_dct_data`, which will hold
    # the DCT coefficients of the two images.
    noise_image_size = noise_image.shape
    noise_dct_data = np.zeros(noise_image_size)
    ground_truth_image_size = ground_truth_image.shape
    ground_truth_dct_data = np.zeros(ground_truth_image_size)

    for i in r_[:noise_image_size[0]:8]:
        for j in r_[:noise_image_size[1]:8]:
            # Apply the DCT to each 8x8 block of the noisy image.
            noise_dct_data[i:(i+8),j:(j+8)] = dct(noise_image[i:(i+8),j:(j+8)])
            # Apply the DCT to each 8x8 block of the binarized image.
            ground_truth_dct_data[i:(i+8),j:(j+8)] = dct(ground_truth_image[i:(i+8),j:(j+8)])

The above code gets the DCT of the two images. I want to create their DCT Coefficient Distribution just like the image below:

[image: example DCT coefficient distribution plot]

The thing is, I don't know how to plot it. Below is what I did:

    #Convert 2D array to 1D array        
    noise_dct_data = noise_dct_data.ravel()   
    ground_truth_dct_data = ground_truth_dct_data.ravel()       

    #I just used a Histogram!
    n, bins, patches = plt.hist(ground_truth_dct_data, 2000, facecolor='blue', alpha=0.5)
    plt.show()

    n, bins, patches = plt.hist(noise_dct_data, 2000, facecolor='blue', alpha=0.5)
    plt.show()

    image_counter = image_counter + 1

My questions are:

  1. What do the X- and Y-axes in the figure represent?
  2. Are the values stored in noise_dct_data and ground_truth_dct_data the DCT coefficients?
  3. Does the Y-axis represent the frequency of its corresponding DCT coefficients?
  4. Is a histogram appropriate to represent the DCT coefficient distribution?
  5. The DCT coefficients are normally classified into three sub-bands based on their frequencies, namely low, middle and high frequency-bands. What is the threshold value we can use to classify a DCT Coefficient in low, middle or high frequency band? In other words, how can we classify the DCT coefficient frequency bands radially? Below is an example of the radial classification of the DCT coefficient frequency bands.

[image: radial classification of the DCT coefficient frequency bands]

The idea is based on the paper: Noise Characterization in Ancient Document Images Based on DCT Coefficient Distribution


Solution

  • The plot example you shared looks, to me, like a kernel density plot. A density plot is "a variation of a Histogram that uses kernel smoothing to plot values, allowing for smoother distributions by smoothing out the noise." (See https://datavizcatalogue.com/methods/density_plot.html)

    The seaborn library, which is built on top of matplotlib, has a kdeplot function, and it can handle two sets of data. Here's a toy example:

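    import matplotlib.pyplot as plt
    import numpy as np
    import seaborn
    from scipy.fftpack import dct

    # Two toy samples of DCT coefficients computed from random data.
    sample1 = dct(np.random.rand(100))
    sample2 = dct(np.random.rand(30))

    # Draw both density estimates on the same axes.
    seaborn.kdeplot(sample1, color="r")
    seaborn.kdeplot(sample2, color="b")
    plt.show()
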

    [image: kdeplot example]

    Note that rerunning this code will produce a slightly different image, as I'm using randomly generated data.

    To answer your numbered questions directly:

    1. What do the X- and Y-axes in the figure represent?

    In a kdeplot, the X-axis represents the values of the observations (here, the DCT coefficient values), and the Y-axis represents the estimated probability density at those values. Unlike a histogram, it applies a smoothing method to try to estimate a "true" distribution of data behind the noisy observed data.
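
    To make the two axes concrete, here is a minimal sketch of a Gaussian kernel density estimate written by hand with NumPy. The function name `gaussian_kde_1d`, the random stand-in data, and the bandwidth rule (Scott's rule of thumb) are illustrative assumptions, and seaborn's own defaults may differ in detail.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Estimate a probability density on `grid` from 1-D `samples`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    if bandwidth is None:
        # Scott's rule of thumb: an illustrative default, not a tuned value.
        bandwidth = samples.std(ddof=1) * samples.size ** (-1 / 5)
    # One Gaussian bump per sample, averaged at every grid point.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)

# Hypothetical stand-in for a flattened array of DCT coefficients.
rng = np.random.default_rng(0)
coefficients = rng.normal(loc=0.0, scale=20.0, size=500)

grid = np.linspace(-150, 150, 601)             # X-axis: coefficient values
density = gaussian_kde_1d(coefficients, grid)  # Y-axis: estimated density

# The area under a density curve should be close to 1.
area = np.sum(density) * (grid[1] - grid[0])
print(round(area, 3))
```

    Plotting `density` against `grid` gives the kind of smooth curve a kdeplot draws. The Y values are densities, so they are not counts and can exceed 1 when the data are tightly concentrated.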

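    2. Are the values stored in noise_dct_data and ground_truth_dct_data the DCT coefficients?

    Based on the way you've set up your code, yes, those variables store the result of the DCT transformations you do. One caveat: scipy.fftpack.dct is a one-dimensional transform applied along the last axis, so calling dct(block) on an 8x8 block transforms each row only. A 2-D block DCT needs the transform applied along both axes.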
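
    If a 2-D DCT of each 8x8 block is what you want, the 1-D transform can be applied along both axes. The following is a minimal sketch, assuming a 2-D grayscale image whose dimensions are multiples of the block size. The helper names `dct2` and `blockwise_dct2` and the random stand-in image are hypothetical, and `norm='ortho'` is a choice I made to keep the transform orthonormal.

```python
import numpy as np
from scipy.fftpack import dct

def dct2(block):
    """2-D DCT-II of a block: 1-D DCT along axis 0, then along axis 1."""
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

def blockwise_dct2(image, block_size=8):
    """Apply dct2 to each non-overlapping block of a 2-D grayscale image."""
    image = np.asarray(image, dtype=float)
    height, width = image.shape
    # Assumption: both dimensions are multiples of block_size (true for 256x256).
    if height % block_size or width % block_size:
        raise ValueError("image dimensions must be multiples of block_size")
    out = np.empty_like(image)
    for i in range(0, height, block_size):
        for j in range(0, width, block_size):
            out[i:i + block_size, j:j + block_size] = dct2(
                image[i:i + block_size, j:j + block_size])
    return out

# Hypothetical stand-in for a 256x256 grayscale image.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(256, 256))
coefficients = blockwise_dct2(image)
```

    With this layout, position (0, 0) of every block holds the DC term and the remaining 63 positions hold the AC coefficients.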

    3. Does the Y-axis represent the frequency of its corresponding DCT coefficients?

    Yes, in the sense of how often coefficient values in that range occur (not spatial frequency), but with smoothing. It is analogous to a histogram but not exactly the same.

    4. Is the histogram appropriate to represent the DCT coefficient distribution?

    It depends on the number of observations but, if you have enough data, a histogram should give you very similar results.
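
    When comparing two images with histograms, it helps to use the same bin edges for both and to normalize the counts to densities, so the curves are directly comparable. Here is a minimal sketch. The function names, the bin count, and the Laplace-distributed stand-in arrays are assumptions for illustration, not your actual data.

```python
import numpy as np

def shared_bin_histograms(a, b, bins=200):
    """Density-normalized histograms of two samples on common bin edges."""
    a = np.ravel(a)
    b = np.ravel(b)
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    edges = np.linspace(lo, hi, bins + 1)
    hist_a, _ = np.histogram(a, bins=edges, density=True)
    hist_b, _ = np.histogram(b, bins=edges, density=True)
    return edges, hist_a, hist_b

def plot_histograms(edges, hist_a, hist_b, labels=("noisy", "ground truth")):
    """Overlay the two histograms as step curves (requires matplotlib)."""
    import matplotlib.pyplot as plt
    centers = 0.5 * (edges[:-1] + edges[1:])
    plt.step(centers, hist_a, where="mid", color="r", label=labels[0])
    plt.step(centers, hist_b, where="mid", color="b", label=labels[1])
    plt.xlabel("DCT coefficient value")
    plt.ylabel("Density")
    plt.legend()
    plt.show()

# Hypothetical stand-ins for the two flattened coefficient arrays.
rng = np.random.default_rng(0)
noise_dct_data = rng.laplace(scale=30.0, size=65536)
ground_truth_dct_data = rng.laplace(scale=10.0, size=65536)

edges, hist_noise, hist_truth = shared_bin_histograms(
    noise_dct_data, ground_truth_dct_data)
# plot_histograms(edges, hist_noise, hist_truth)  # uncomment to display
```

    Drawing both curves on one set of axes, rather than in two separate figures, makes differences in spread between the two distributions much easier to see.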

    5. The DCT coefficients are normally classified into three sub-bands based on their frequencies, namely low, middle and high frequency-bands. What is the threshold value we can use to classify a DCT Coefficient in low, middle or high frequency band? In other words, how can we classify the DCT coefficient frequency bands radially?

    I think this question is possibly too complicated to answer satisfactorily on Stack Overflow, but my advice here is to try to figure out how the authors of the article did this task. The cited article, "Blind Image Quality Assessment: A Natural Scene Statistics Approach in the DCT Domain", appears to be talking about a Radial Basis Function (RBF), but this looks like a way of training a supervised model on the frequency data to predict the overall quality of the scan.

    Regarding data partitions, they state, "In order to capture directional information from the local image patches, the DCT block is partitioned directionally. ... The upper, middle, and lower partitions correspond to the low-frequency, mid-frequency, and high-frequency DCT subbands, respectively."

    I take this to mean that, in at least one of their scenarios, the partitions are determined by a Subband DCT. (See https://ieeexplore.ieee.org/document/499836) There appears to be a great deal of literature on these types of approaches.
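
    As a starting point for experimenting, here is a sketch of one simple way to split an 8x8 block of 2-D DCT coefficients into three bands. It groups positions (u, v) by their diagonal index u + v, which follows the anti-diagonals that the JPEG zig-zag scan traverses. This is my own assumption and not necessarily what either paper does: the function names and the thresholds `low_max=2` and `mid_max=7` are hypothetical and should be replaced by whatever the paper specifies.

```python
import numpy as np

def band_masks(block_size=8, low_max=2, mid_max=7):
    """Boolean masks for the low/mid/high bands of one DCT block.

    Position (u, v) is assigned by its diagonal index u + v. The DC term
    at (0, 0) is excluded. The thresholds are illustrative assumptions.
    """
    u, v = np.indices((block_size, block_size))
    order = u + v
    dc = order == 0
    low = (order <= low_max) & ~dc
    mid = (order > low_max) & (order <= mid_max)
    high = order > mid_max
    return low, mid, high

def band_coefficients(dct_data, mask, block_size=8):
    """Collect the coefficients selected by `mask` from every block."""
    h, w = dct_data.shape
    blocks = dct_data.reshape(h // block_size, block_size,
                              w // block_size, block_size)
    blocks = blocks.transpose(0, 2, 1, 3)  # (block_row, block_col, u, v)
    return blocks[:, :, mask].ravel()

low, mid, high = band_masks()

# Hypothetical stand-in for a 16x16 array of blockwise 2-D DCT coefficients.
dct_data = np.arange(256, dtype=float).reshape(16, 16)
low_values = band_coefficients(dct_data, low)
```

    Each band's values can then be plotted as its own distribution. A radial variant would replace `u + v` with `np.sqrt(u**2 + v**2)` and choose thresholds on that distance instead.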