Tags: python-3.x, loops, python-itertools, astropy, fits

Find the closest distance between every galaxy in the data and create pairs based on closest distance between them


My task is to pair up galaxies that are closest together from a large list of galaxies. I have the RA, DEC and Z of each, and a formula to work out the distance between each one from the data given. However, I can't work out an efficient method of iterating over the whole list to find the distance between EACH galaxy and EVERY other galaxy in the list, with the intention of then matching each galaxy with its nearest neighbour.

The data has been imported in the following way:

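    from astropy.io import fits

    hdulist = fits.open("documents/RADECMASSmatch.fits")
    data = hdulist[1].data  # assumes the table lives in the first extension HDU
    CATAID = data['CATAID_1']
    Xpos_DEIMOS_1 = data['Xpos_DEIMOS_1']
    z = data['Z_1']
    RA = data['RA']
    DEC = data['DEC']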

I have tried something like:

    n = len(RA)  # number of galaxies
    radiff = []
    for i in range(n):
        for j in range(i + 1, n):
            # absolute RA difference for each unordered pair (i, j)
            radiff.append(abs(RA[i] - RA[j]))

to initially work out difference in RA and DEC between every galaxy, which does actually work but I feel like there must be a better way.

A friend suggested something along the lines of:

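    import numpy as np

    # one row per galaxy: (RA, DEC, Z); column names as in the import above
    galaxy_coords = np.column_stack((data['RA'], data['DEC'], data['Z_1']))
    n = len(galaxy_coords)
    separation_matrix = np.zeros((n, n))

    done = []
    for i, coords1 in enumerate(galaxy_coords):
        for j, coords2 in enumerate(galaxy_coords):
            if (j, i) in done:
                # the distance is symmetric, so reuse the value already computed
                separation_matrix[i, j] = separation_matrix[j, i]
                continue
            # your_formula is a placeholder for the distance formula
            separation = your_formula(coords1, coords2)
            separation_matrix[i, j] = separation
            done.append((i, j))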

But I don't really understand this so can't readily apply it. I've tried but it yields nothing useful.

Any help with this would be much appreciated, thanks


Solution

  • Your friend's code seems to be generating a 2D array of the distances between each pair, and taking advantage of the symmetry (distance(x,y) = distance(y,x)). It would be slightly better if it used itertools.combinations to generate each pair exactly once, and assigned your_formula(coords1, coords2) to both separation_matrix[i,j] and separation_matrix[j,i] within the same iteration, rather than having separate iterations for i,j and j,i.

    Even better would probably be this package that uses a tree-based algorithm: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.KDTree.html . It seems to be focused on rectilinear coordinates, but that should be addressable in linear time: converting each galaxy's RA, DEC and a radial distance derived from Z into Cartesian x, y, z is a single pass over the list.
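
Both suggestions can be sketched roughly as follows. This is only an illustration under some assumptions, not a drop-in solution: the helper names (to_cartesian, separation_matrix_bruteforce, nearest_neighbours) are made up for this example, the data is synthetic, and r stands for a radial distance you would derive from Z with whatever formula you already have. It also assumes that plain Euclidean distance between the Cartesian points is the separation you want, and that no two galaxies share identical coordinates.

```python
from itertools import combinations

import numpy as np
from scipy.spatial import KDTree


def to_cartesian(ra_deg, dec_deg, r):
    """Convert RA/DEC in degrees plus a radial distance r to rows of (x, y, z)."""
    ra = np.radians(ra_deg)
    dec = np.radians(dec_deg)
    x = r * np.cos(dec) * np.cos(ra)
    y = r * np.cos(dec) * np.sin(ra)
    z = r * np.sin(dec)
    return np.column_stack((x, y, z))


def separation_matrix_bruteforce(xyz):
    """Fill the symmetric distance matrix, visiting each unordered pair once."""
    n = len(xyz)
    sep = np.zeros((n, n))
    for i, j in combinations(range(n), 2):
        d = np.linalg.norm(xyz[i] - xyz[j])
        sep[i, j] = d
        sep[j, i] = d  # same value for (j, i), no second iteration needed
    return sep


def nearest_neighbours(xyz):
    """Return, for every point, the index of and distance to its nearest other point."""
    tree = KDTree(xyz)
    # k=2 because the closest hit for each point is the point itself (distance 0)
    dist, idx = tree.query(xyz, k=2)
    return idx[:, 1], dist[:, 1]


# Synthetic stand-in for the catalogue (hypothetical values, not real data)
rng = np.random.default_rng(0)
ra = rng.uniform(0.0, 360.0, 200)
dec = rng.uniform(-90.0, 90.0, 200)
r = rng.uniform(100.0, 1000.0, 200)

xyz = to_cartesian(ra, dec, r)
nn_index, nn_dist = nearest_neighbours(xyz)

# A galaxy and its nearest neighbour form a mutual pair only if each is the other's nearest
mutual = nn_index[nn_index] == np.arange(len(xyz))
```

Note that "nearest neighbour" is not symmetric: galaxy A's nearest neighbour may be B while B's is C, so the mutual mask above is one possible way to keep only pairs where both galaxies pick each other. The brute-force matrix needs memory and time proportional to the square of the number of galaxies, so for a large list the tree query is likely the more practical of the two.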