python-3.x recursion multiprocessing nested-loops

Neat way to write several nested for loops and if statements in Python

I am trying to read "n" catalogs (data files), read 7 columns from each catalog, and then check whether n*(n-1) "if" statements are true using some of those 7 columns. If the condition is true, I do some math; otherwise I do nothing.

So for example, if I am comparing two catalogs, then I have 2 "if" statements to test and if I have 3 catalogs then I have 6 "if" statements to check.

Each catalog has roughly 10,000 rows and around 40 columns but their lengths are in general different from each other.

Currently, I have working code for 3 catalogs that loops over the three catalogs with nested for loops and applies my 6 conditions.

Here is an example of my code:

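import numpy as np
from astropy.io import ascii, fits
from astropy.table import Table

path="xx" #Location of all input files.
# file2 and file3 (file names) are not defined in this snippet.
cat1 = ascii.read(path + file3, guess=False)
data2 = fits.getdata(path+file2, 1)
cat2 = Table(data2)
cat3 = Table.read(path + 'xyz.tbl', format='ipac')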




for i in range(len(cat1)):
    (ra1, dec1, flux1, flux1error, maj1, minor1, ang1) = (
        cat1['RA_Degrees'][i], cat1['DEC_Degrees'][i],
        cat1['fitted_total_flux'][i], cat1['fitted_total_flux_error'][i],
        cat1['BMajor_Degrees'][i], cat1['BMinor_Degrees'][i],
        cat1['position_angle_deg'][i])
    ang1 = ang1 * np.pi / 180  # degrees to radians



    for j in range(len(cat2)):
        (ra2, dec2, total_cat2, total_error_cat2, maj2, min2, pa2) = (
            cat2['ra'][j], cat2['dec'][j],
            cat2['total'][j], cat2['total_err'][j],
            cat2['BMajor'][j], cat2['Bminor'][j], cat2['Position Angle'][j])


        for k in range(len(cat3)):
            # Use cat3-specific names so the cat2 flux values are not overwritten.
            (ra3, dec3, total_cat3, total_error_cat3, maj3, min3, pa3) = (
                cat3['ra'][k], cat3['dec'][k], cat3['flux'][k], cat3['ferr'][k],
                cat3['bmaj'][k], cat3['bmin'][k], cat3['pa'][k])

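            # All six pairwise RA/Dec separations must be below the summed major axes.
            # The values are scalars here, so the np.all wrappers are not needed.
            if (np.abs(ra2 - ra1) < maj1 + maj2 and
                    np.abs(dec2 - dec1) < maj1 + maj2 and
                    np.abs(ra3 - ra2) < maj2 + maj3 and
                    np.abs(dec3 - dec2) < maj2 + maj3 and
                    np.abs(ra3 - ra1) < maj1 + maj3 and
                    np.abs(dec3 - dec1) < maj1 + maj3):
                pass  # do some math here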

I have two problems related to this:

1. I would like to generalize this to any number of catalogs. Currently, I have to edit the code whenever I have 2, 3, or 4 catalogs, which is annoying.
2. A 2-catalog match takes up to 33 minutes to execute, but the 3-catalog match has now been running for 2 days. Is there any way to speed this up?

For the first problem, I looked up recursive functions in the link given below. My question is whether I can use recursion here, since the number of conditions to check also depends on "n" and the column names are generally not homogeneous across catalogs. For example, one catalog may call Right Ascension 'RA', while another may call it 'ra' or 'Right Ascension'.

Basics of recursion in Python

For the second problem, I was trying to use multi-processing following the documentation.

https://docs.python.org/2/library/multiprocessing.html

I wanted to know whether it is better to stick with nested for loops if I want to use multiprocessing, or to switch to a recursive function. Any advice would be appreciated.

Solution

Look up the itertools module. It gives you some of the basic tools to iterate through lists of columns, with the list length specified as a parameter. Yes, recursion helps solve the combinatorics, but this module will handle the recursion overhead for you.

The particular concept you want for this application is the combination of the 7 columns, taken n at a time. For the sake of illustration, consider 7 columns taken 3 at a time: that's a total of 35 combinations, (7*6*5) / (3*2*1).
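The count can be checked directly. This is a small sketch that uses the column indices 0 to 6 as stand-ins for real column names; `math.comb` needs Python 3.8 or later.

```python
from itertools import combinations
from math import comb

# Indices 0..6 stand in for the 7 columns (placeholders, not real names).
combos = list(combinations(range(7), 3))

print(len(combos))   # number of 3-column combinations
print(comb(7, 3))    # same count from the closed-form formula
print(combos[0])     # first combination in lexicographic order
print(combos[-1])    # last combination in lexicographic order
```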

What you'll get is an iterator that yields each of the 35 combinations, in lexicographic order, one at a time. You can loop over it as if it were a list. For each combination, iterate through pairs of columns:

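from itertools import combinations

columns = ['c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7']  # placeholder column names
n = 3                                                  # columns per combination
combo_gen = combinations(columns, n)

for col_list in combo_gen:
    for right in range(1, n):
        r_col = col_list[right]
        for left in range(right):
            l_col = col_list[left]
            # Compare l_col and r_col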

That's a basic outline of the process. Can you take it from here?
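As one possible way to take it further, here is a minimal sketch that applies the same pairwise idea to catalogs rather than columns. Everything named here is hypothetical: `normalize`, `overlaps`, `match_catalogs`, and the `maps` dictionaries are illustrative, and plain lists of dicts stand in for the astropy tables. A per-catalog mapping handles the inconsistent column names, and `itertools.combinations` generates the pairwise tests, so nothing has to be edited when the number of catalogs changes. It still visits every tuple of rows through `itertools.product`, so it does not address the run time.

```python
from itertools import combinations, product

def normalize(row, column_map):
    """Rename one catalog row to the common keys 'ra', 'dec', 'maj'."""
    return {common: row[actual] for common, actual in column_map.items()}

def overlaps(a, b):
    """The question's test for one pair: RA and Dec both within maj_a + maj_b."""
    limit = a['maj'] + b['maj']
    return abs(a['ra'] - b['ra']) < limit and abs(a['dec'] - b['dec']) < limit

def match_catalogs(catalogs, column_maps):
    """Yield one tuple of rows (one row per catalog) when every pair overlaps."""
    normalized = [[normalize(row, cmap) for row in cat]
                  for cat, cmap in zip(catalogs, column_maps)]
    for rows in product(*normalized):
        if all(overlaps(a, b) for a, b in combinations(rows, 2)):
            yield rows

# Toy data standing in for the three catalogs (values are made up).
cat1 = [{'RA_Degrees': 10.0, 'DEC_Degrees': 5.0, 'BMajor_Degrees': 0.1},
        {'RA_Degrees': 50.0, 'DEC_Degrees': -3.0, 'BMajor_Degrees': 0.1}]
cat2 = [{'ra': 10.05, 'dec': 5.05, 'BMajor': 0.1}]
cat3 = [{'ra': 9.95, 'dec': 4.95, 'bmaj': 0.1},
        {'ra': 80.0, 'dec': 4.95, 'bmaj': 0.1}]

# One mapping per catalog: common key -> that catalog's column name.
maps = [{'ra': 'RA_Degrees', 'dec': 'DEC_Degrees', 'maj': 'BMajor_Degrees'},
        {'ra': 'ra', 'dec': 'dec', 'maj': 'BMajor'},
        {'ra': 'ra', 'dec': 'dec', 'maj': 'bmaj'}]

matches = list(match_catalogs([cat1, cat2, cat3], maps))
print(len(matches))
```

Adding a fourth catalog only means appending one more table and one more mapping to the two lists; the pairwise tests are generated from whatever is passed in.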