
Algorithm to pixelate positions and fluxes


What would you do if you had n particles on a plane (with positions (x_n, y_n)), each with a certain flux flux_n, and you had to pixelate these particles, i.e. go from (x, y) space to (pixel_i, pixel_j) space and sum up the flux of the m particles that fall into every single pixel? Any suggestions? Thank you!


Solution

  • There are several ways to solve this problem.

    Assumptions: your positions are stored in two NumPy arrays of shape (N,), i.e. the positions x_n (or y_n) for n in [0, N); let's call them x and y. The flux is stored in a NumPy array of the same shape, fluxes.

    1 - BRUTE-FORCE (COMPUTE-INTENSIVE) CASE Create something that looks like a grid:

    import numpy as np

    #get the minimum and maximum positions
    mins = int(x.min()), int(y.min())
    maxs = int(x.max()), int(y.max())
    #you can also subtract and add 1 or more units
    #to make the grid larger than the x, y extremes,
    #something like mins -= epsilon and maxs += epsilon

    #create the grid of pixel edges
    #(+ 2 so that the maxima fall inside the last bin)
    xx = np.arange(mins[0], maxs[0] + 2)
    yy = np.arange(mins[1], maxs[1] + 2)
    

    Now you can perform a double for loop, taking, each time, two consecutive elements of xx and yy. To do this, you can simply take:

    x1 = xx[:-1] #excluding the last element
    x2 = xx[1:]  #excluding the first element

    #the same for y:
    y1 = yy[:-1] #excluding the last element
    y2 = yy[1:]  #excluding the first element

    #one pixel per pair of consecutive edges
    fluxes_grid = np.zeros((x1.shape[0], y1.shape[0]))
    for i, (x1_i, x2_i) in enumerate(zip(x1, x2)):
        for j, (y1_j, y2_j) in enumerate(zip(y1, y2)):
            idx = np.where((x >= x1_i) & (x < x2_i) & (y >= y1_j) & (y < y2_j))[0]
            fluxes_grid[i, j] = np.sum(fluxes[idx])
    

    At the end of this loop you have a grid whose elements are pixels representing the sum of fluxes.
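    As a side note, NumPy can do this binning in a single call: np.histogram2d accepts per-point weights, so passing the fluxes as weights yields the flux-per-pixel grid directly, without the explicit double loop. A minimal sketch with hypothetical random data (the positions, fluxes, and pixel size here are made up for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, 1000)      # hypothetical particle positions
    y = rng.uniform(0, 10, 1000)
    fluxes = rng.uniform(0, 1, 1000)  # hypothetical fluxes

    # one-unit-wide pixels spanning the data (same convention as above)
    x_edges = np.arange(int(x.min()), int(x.max()) + 2)
    y_edges = np.arange(int(y.min()), int(y.max()) + 2)

    # weighted 2D histogram: each pixel accumulates the flux of the
    # particles that fall inside it
    fluxes_grid, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges],
                                       weights=fluxes)

    Since every particle lands in exactly one pixel, the grid total equals the total flux, which is a handy sanity check.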

    2 - USING A QUANTIZATION ALGORITHM LIKE K-NN

    What happens if you have so many points that the loop takes hours? A faster solution is to use a quantization method, like K-Nearest Neighbours (KNN), on a rigid grid. There are many ways to run a KNN (including already-implemented versions, e.g. sklearn's KNN), but it is very efficient if you can take advantage of a GPU. For example, here is my TensorFlow (v2.1) implementation. First define a square grid:

    _min, _max = min(mins), max(maxs)
    xx = np.arange(_min, _max)
    yy = np.arange(_min, _max)
    

    You can then build the grid matrix, grid, and your position matrix, X. Note that stacking xx and yy directly would only give the diagonal of the grid; to get every grid point, use a meshgrid:

    gx, gy = np.meshgrid(xx, yy)
    grid = np.column_stack([gx.ravel(), gy.ravel()])
    X = np.column_stack([x, y])

    then you have to define a matrix euclidean pairwise-distance function:

    import tensorflow as tf

    @tf.function
    def pairwise_dist(A, B):
        # squared norms of each row in A and B
        na = tf.reduce_sum(tf.square(A), 1)
        nb = tf.reduce_sum(tf.square(B), 1)
        # na as a row and nb as a column vector
        na = tf.reshape(na, [-1, 1])
        nb = tf.reshape(nb, [1, -1])
        # return the pairwise euclidean distance matrix
        D = tf.sqrt(tf.maximum(na - 2*tf.matmul(A, B, False, True) + nb, 0.0))
        return D
    

    Thus:

    #compute the pairwise distances:
    D = pairwise_dist(grid, X)
    D = D.numpy() #get a numpy matrix from a tf tensor
    #D has shape (M, N), where M is the number of grid points
    #and N the number of positions.
    #now rank the distances and keep the best K (e.g. 10):
    ranks = np.argsort(D, axis=1)[:, :10]
    #for each point in the grid you now have its ten nearest particles.
    

    Now you have to take the fluxes corresponding to these 10 positions and sum them.
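    That gather-and-sum step is just fancy indexing followed by a row-wise sum. A toy sketch with made-up shapes (M = 4 grid points, N = 6 particles, and K = 2 neighbours instead of 10, for brevity; note that with this scheme one particle can contribute to several grid points):

    import numpy as np

    fluxes = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    # ranks has shape (M, K): indices of the K nearest particles per grid point
    ranks = np.array([[0, 1], [2, 3], [4, 5], [0, 5]])

    # fancy indexing gathers each neighbour's flux, then sum per grid point
    nearest_fluxes = fluxes[ranks]             # shape (M, K)
    pixel_fluxes = nearest_fluxes.sum(axis=1)  # shape (M,)
    # pixel_fluxes -> [3., 7., 11., 7.]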

    I have avoided going into more detail on this second method: I don't know the size of your catalogue, whether you have a GPU, or whether you want this kind of optimization. If you are interested, I can expand this explanation.