Tags: python, plot, histogram, python-ggplot

Why does the histogram I calculate myself look different than the built-in one?


I have a DataFrame that contains the pixels of grey-scale images. It has two columns: n, which denotes which image the pixel belongs to, and pixel, which denotes how dark the pixel is. When I plot the pixels with

plt.figure()
ggplot(aes(x='pixel'), data=pixelDF) + \
    geom_histogram(binwidth=8) + \
    xlab('pixels') + \
    ylab('') + \
    ggtitle('Histogram of pixels') + \
    scale_y_log() + \
    facet_grid(y='n')

I get one plot (image not shown).

But when I first transform the data with

def my_historgram(to_histogram):
    histogram = np.histogram(to_histogram, bins=32, range=(0, 255), weights=None, density=False)
    return (histogram)

def get_pixel(df, i):
    return (df.loc[df['n'] == i]['pixel'])

def hist_calc(hist):
    return(np.log(hist) / sum(np.log(hist)))

imageNr = pixelDF['n'].drop_duplicates().tolist()
hist, bin_edges = my_historgram(get_pixel(pixelDF, imageNr[0]))
histograms = pd.DataFrame({
    'binNr': range(len(hist)),
    'binValue_' + str(imageNr[0]): pd.Series(hist_calc(hist))}).set_index('binNr')
for i in imageNr[1:]:
    hist, bin_edges = my_historgram(get_pixel(pixelDF, i))
    histogram = pd.DataFrame({
        'binNr': range(len(hist)),
        'binValue_' + str(i): pd.Series(hist_calc(hist))}).set_index('binNr')
    histograms = histograms.join(histogram)
histograms = histograms.reset_index()

### Print new type of Histogram

plt.figure()
plotDF = pd.melt(histograms, id_vars=['binNr'], var_name='imageNr', value_name='binValue')
ggplot(aes(x='factor(binNr)', weight='binValue'), data=plotDF) + \
    geom_bar() + \
    xlab('binNr') + \
    ylab('') + \
    ggtitle('Histograms of pixels') + \
    facet_grid(y='imageNr')

I get a quite different picture (image not shown).

Why is that? What am I doing wrong in the processing for the second picture?


Solution

  • Thanks to jeremycg, who commented: "it looks like your version has treated the binNr as a categorical variable, and needs to be sorted".

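    The solution is simply to remove factor() from the last ggplot call, i.e. use aes(x='binNr', weight='binValue') so that binNr is treated as a number rather than a category.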
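
    To illustrate why a categorical binNr can scramble the bars, here is a minimal sketch. It assumes the category labels end up ordered as strings, which is one plausible cause and not confirmed for this ggplot version; the variable names are hypothetical.

```python
# Hypothetical illustration: 32 bin numbers, as produced by bins=32.
bin_numbers = list(range(32))

# Numeric ordering: what a continuous x axis would use.
numeric_order = sorted(bin_numbers)

# Lexicographic ordering: what results if the labels are compared as strings
# (an assumption about how the categorical axis was ordered here).
string_order = sorted(str(b) for b in bin_numbers)

print(numeric_order[:5])  # [0, 1, 2, 3, 4]
print(string_order[:5])   # ['0', '1', '10', '11', '12']
```

    Under that assumption, bin 2 would be drawn after bins 10 to 19, so the bars no longer follow pixel intensity from left to right.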