Search code examples
pythonmatplotlibstatisticsastronomy

Clustering in a 3D star-field using Count-in-Cells in Python


First time poster, couldn't find anything that totally solved my issue.

I'm working on a simulation for galactic colonisation for my masters project. A thing I'm trying to do is look a the voids of uncolonised stars left over after the simulation ends and to see if there is clustering behaviour past statistical fluctuations. As it is a monte-carlo numerical problem, a correlation function is no really appropriate so I am using the counts-in-cells method usually employed to look at galaxy clusters.

So I am working in cartesians

data = np.genfromtxt('counts.csv') # positions of uncolonsed stars
x = data[:,0]
y = data[:,1]
z = data[:,2]

What I want to do is use boxes of varying sizes to count the number of stars inside the box and compare to what the mean should be and do statistics on the results.

The direction I'm going in is looking at some sort of 3D histogram such as the bubble plot seen here. I tried this out and it doesn't seem to be binning all my data and I'm not sure why, i.e, the 'floor' of the cube has 'bubbles' but much of the 'roof' has nothing:

3D Bubble Histogram

This is clearly wrong when you look at the plotted raw star field:

Plotted star field

It looks like the bins at higher z values aren't holding any data. This is probably a pretty straightforward problemm but I am in need of some fresh eyes and minds that are better at python than I am.

Can anyone think as to how this is can be fixed? Also I'd like to find a way to count the number of points per box, i.e per bin.

I'm sorry if I'm being a bit dim but I appreciate any help any of you fine fellows can offer me.

Thanks chums!


Solution

  • In the comments you had some alternatives to solve your problem and its difficult to say what is wrong with your code without seeing it. In any case this kind of problem is typically solved by counting data inside a regular grid (which is, nevertheless, a general approach to do an histogram).

    The advantage in building your own grid is that you know immediately where every "sector" is, where it starts and where it ends. As so I would suggest the following approach as an alternative if you want to try it.

    import numpy as np
    from mpl_toolkits.mplot3d import Axes3D
    import matplotlib.pyplot as plt
    
    # Generating some random data.
    data = np.random.randint(0, 100, (1000,3))
    x, y, z = data[:, 0], data[:, 1], data[:, 2]
    
    # Generating raw view
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    ax.scatter(x, y, z, marker='+', s=25, c='r')
    plt.show()
    
    # Generating some grid with origin, cell size, and number of cells 10 10 10
    numx, numy, numz = 5, 5, 5
    origx, origy, origz = 0, 0, 0
    sizex, sizey, sizez = 20, 20, 20
    grid = np.vstack(np.meshgrid(range(numx), range(numy), range(numz))).reshape(3, -1).T
    gx, gy, gz = grid[:, 0]*sizex + origx, grid[:, 1]*sizey + origy, grid[:, 2]*sizez + origz
    
    # Calculating the number of stars in each cell:
    ix = ((x - origx)/sizex).astype(int)
    iy = ((y - origy)/sizey).astype(int)
    iz = ((z - origz)/sizez).astype(int)
    s = np.zeros((numx, numy, numz))
    for i in range(ix.shape[0]):
        s[ix[i], iy[i], iz[i]] = s[ix[i], iy[i], iz[i]] + 1
    s = s.flatten()
    mask = s > 0
    
    # Plotting the result
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    ax.scatter(gx[mask], gy[mask], gz[mask], marker='o', s=s[mask]*100, c='b', edgecolor ="r")
    plt.show()
    

    The result for randomized data is this:

    bubble histogram in matplotlib