Return the frequency in a bin of a 2D histogram in Julia


Suppose I have some 2D data points. Using the Plots package in Julia, a 2D histogram of them is easy to plot. My task is to define a function that maps a data point to the frequency of the bin to which that point belongs. Are there any functions that serve this task well?

For example, consider a 2D histogram like the following: [image: hist]

I would like to define a function such that, when I input an arbitrary data point within the domain of this histogram, it outputs the frequency of the corresponding bin. In the image above, when I input (0.1, 0.1), the function should output, say, 375 (I suppose the brightest cell there represents a frequency of 375). Are there any convenient functions in Julia for this?

Edit:

using Plots
gr()
histogram2d(randn(10000), randn(10000), nbins=20)

This creates a histogram from 10000 2D data points drawn from a standard normal distribution. Is there any function in Julia that takes a 2D point and returns the frequency of the bin to which the point belongs? I could write one myself by creating arrays of bins and counting the number of elements in the bin of a given data point, but that would be tedious.


Solution

  • I'm not 100% sure whether this is what StatsPlots does internally, but one approach is to use StatsBase's Histogram, which works in N dimensions:

    using StatsBase, StatsPlots
    
    # Example data 
    data = (randn(10_000), randn(10_000))
    
    # Plot StatsPlots 2D histogram
    histogram2d(data)
    
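    # Fit a histogram with StatsBase (bins are left-closed by default: [a, b))
    h = fit(Histogram, data)
    # searchsortedlast gives the index of the last edge <= the value,
    # which is the index of the bin containing that value
    x = searchsortedlast(h.edges[1], 0.1)
    y = searchsortedlast(h.edges[2], 0.1)
    h.weights[x, y]  # count in that bin (varies with the random data)
    
    # Or as a function
    function get_freq(h, xval, yval)
        x = searchsortedlast(h.edges[1], xval)
        y = searchsortedlast(h.edges[2], yval)
        h.weights[x, y]  # throws a BoundsError if the point lies outside the histogram
    end
    
    get_freq(h, 1.4, 0.6)  # count in the bin containing (1.4, 0.6)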
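
To check the edge-to-bin mapping by hand, here is a minimal dependency-free sketch of the same lookup. It is not what StatsBase or StatsPlots do internally. The names count_bins and bin_count, the hand-picked edges, and the choice to return 0 for points outside the binned range are all assumptions made for illustration. Bins are treated as left-closed, [edges[i], edges[i+1]).

```julia
# Hypothetical helper: count points into left-closed bins along each axis.
function count_bins(xs, ys, xedges, yedges)
    counts = zeros(Int, length(xedges) - 1, length(yedges) - 1)
    for (x, y) in zip(xs, ys)
        # Index of the last edge <= value, i.e. the bin's left edge.
        i = searchsortedlast(xedges, x)
        j = searchsortedlast(yedges, y)
        # Skip points that fall outside the binned range.
        if 1 <= i <= size(counts, 1) && 1 <= j <= size(counts, 2)
            counts[i, j] += 1
        end
    end
    return counts
end

# Hypothetical helper: count of the bin containing (x, y), or 0 if out of range.
function bin_count(counts, xedges, yedges, x, y)
    i = searchsortedlast(xedges, x)
    j = searchsortedlast(yedges, y)
    (1 <= i <= size(counts, 1) && 1 <= j <= size(counts, 2)) || return 0
    return counts[i, j]
end

# Small deterministic data set so the counts can be verified by hand.
xs = [0.1, 0.2, 0.7, 1.4, -0.3]
ys = [0.1, 0.3, 0.2, 0.6, -0.1]
xedges = -1.0:0.5:2.0
yedges = -1.0:0.5:2.0

counts = count_bins(xs, ys, xedges, yedges)
bin_count(counts, xedges, yedges, 0.1, 0.1)  # 2: (0.1, 0.1) and (0.2, 0.3) share a bin
bin_count(counts, xedges, yedges, 5.0, 5.0)  # 0: outside the binned range
```

Returning 0 for out-of-range points is only one option; throwing an error or returning nothing may suit other use cases better.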