python-3.x recursion multiprocessing nested-loops

Neat way to write several nested for loops and if statements in Python


I am trying to read n catalogs (data files), read 7 columns from each catalog, and then check whether n*(n-1) "if" conditions are true, using some of the 7 columns read earlier. If the conditions are true, I do some math; otherwise I do nothing.

For example, if I am comparing two catalogs, I have 2 "if" conditions to test, and if I have 3 catalogs, I have 6 "if" conditions to check.
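
To make the count concrete: each pair of catalogs contributes two tests (one on RA, one on Dec), so n catalogs give 2 * C(n, 2) = n*(n-1) tests. A small sketch that counts them; the function name count_conditions is made up for illustration:

```python
from itertools import combinations


def count_conditions(n_catalogs, tests_per_pair=2):
    """Count the 'if' tests: one RA test and one Dec test per pair of catalogs."""
    pairs = list(combinations(range(n_catalogs), 2))
    return tests_per_pair * len(pairs)


for n in (2, 3, 4):
    print(n, "catalogs ->", count_conditions(n), "conditions")
```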

Each catalog has roughly 10,000 rows and around 40 columns, but the catalogs generally differ in length.

Currently, I have working code for 3 catalogs, where I loop over the three catalogs with nested for loops and apply my 6 conditions.

Here is an example of my code:

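import numpy as np
from astropy.io import ascii, fits
from astropy.table import Table

path="xx" #Location of all input files.
# file2 and file3 are input file names defined elsewhere.
cat1 = ascii.read(path + file3, guess=False)
data2 = fits.getdata(path+file2, 1)
cat2 = Table(data2)
cat3 = Table.read(path + 'xyz.tbl', format='ipac')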




for i in range(len(cat1)):
    (ra1,dec1,flux1,flux1error,maj1,minor1,ang1)= (cat1['RA_Degrees'][i],
        cat1['DEC_Degrees'][i],cat1['fitted_total_flux'][i],
        cat1['fitted_total_flux_error'][i],cat1['BMajor_Degrees'][i],
        cat1['BMinor_Degrees'][i],cat1['position_angle_deg'][i])
    ang1=ang1*np.pi/180  # degrees to radians



    for j in range(len(cat2)):
        (ra2,dec2,total_cat2,total_error_cat2,maj2,min2,pa2)= (cat2['ra'][j],cat2['dec'][j],
            cat2['total'][j],cat2['total_err'][j],
            cat2['BMajor'][j],cat2['Bminor'][j],cat2['Position Angle'][j])


        for k in range(len(cat3)):
            # Use cat3-specific names so the cat2 flux values are not overwritten.
            (ra3,dec3,total_cat3,total_error_cat3,maj3,min3,pa3)=(cat3['ra'][k],
                cat3['dec'][k],cat3['flux'][k],cat3['ferr'][k],cat3['bmaj'][k],
                cat3['bmin'][k],cat3['pa'][k])

            if np.all(
                np.all(np.abs(ra2-ra1) < maj1 + maj2) and
                np.all(np.abs(dec2-dec1) < maj1 + maj2) and
                np.all(np.abs(ra3-ra2) < maj2 + maj3) and
                np.all(np.abs(dec3-dec2) < maj2 + maj3) and
                np.all(np.abs(ra3-ra1) < maj1 + maj3) and
                np.all(np.abs(dec3-dec1) < maj1 + maj3)
            ):
                pass  # placeholder: the math for matched sources goes here

I have two problems related to this:

  1. I would like to generalize this to any number of catalogs. Currently, I have to edit the code depending on whether I have 2, 3, or 4 catalogs, which is annoying.
  2. A 2-catalog match takes up to 33 minutes to execute, but the 3-catalog match has now been running for 2 days. Is there any way to speed this up?

For the first problem, I looked up recursive functions in the link given below, but my question is whether I can use recursion here, since the number of conditions to check also depends on n and the column names are generally not homogeneous across catalogs. For example, one catalog may call right ascension 'RA', while another may call it 'ra' or 'Right Ascension'.

Basics of recursion in Python
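
One possible way around the differing column names, sketched here as an assumption rather than an established recipe, is to keep one small mapping per catalog from a common name to that catalog's actual column name, and convert every catalog to the common names before comparing. The column names below are the ones used in the code above; COLUMN_MAPS, standardize, and the toy catalog are hypothetical names introduced only for illustration. A plain dict of lists stands in for a catalog table here, since both are indexed by column name.

```python
import numpy as np

# Hypothetical mapping: common name -> actual column name, one dict per catalog.
COLUMN_MAPS = [
    {"ra": "RA_Degrees", "dec": "DEC_Degrees", "maj": "BMajor_Degrees"},
    {"ra": "ra", "dec": "dec", "maj": "BMajor"},
    {"ra": "ra", "dec": "dec", "maj": "bmaj"},
]


def standardize(catalog, column_map):
    """Return a dict of NumPy arrays keyed by the common column names."""
    return {common: np.asarray(catalog[actual])
            for common, actual in column_map.items()}


# Toy stand-in for the first catalog (made-up values, two rows).
toy_cat1 = {
    "RA_Degrees": [10.0, 50.0],
    "DEC_Degrees": [5.0, -3.0],
    "BMajor_Degrees": [0.1, 0.2],
}

std1 = standardize(toy_cat1, COLUMN_MAPS[0])
print(sorted(std1))   # common names only
print(std1["ra"])     # RA values as a NumPy array
```

After this step every catalog exposes the same keys, so code that compares catalogs no longer needs to know the original column names.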

For the second problem, I was trying to use multiprocessing, following the documentation:

https://docs.python.org/2/library/multiprocessing.html

I would like to know whether it is better to stick with nested for loops if I want to use multiprocessing, or to switch to a recursive function. Any advice would be appreciated.


Solution

  • Look up the itertools module. It gives you the basic tools to iterate over selections of columns, with the selection size specified as a parameter. Yes, recursion would solve the combinatorics, but itertools handles that part for you.

    The particular concept you want for this application is the combination of the 7 columns, taken n at a time. For the sake of illustration, let's consider 7 columns, taken 3 at a time: that's a total of 35 combinations, (7*6*5) / (3*2*1) = 35.

    What you'll get from itertools.combinations is an iterator that yields each of the 35 combinations, in lexicographic order, one at a time. You can loop over it as if it were a list. For each combination, iterate through pairs of columns:

    # combo_gen is the iterator from itertools.combinations; n is the combination size.
    for col_list in combo_gen:
        for right in range(1, n):
            r_col = col_list[right]
            for left in range(right):
                l_col = col_list[left]
                # Compare l_col and r_col
    

    That's a basic outline of the process. Can you take it from here?
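
One way to take the outline further, offered as a sketch rather than the answerer's own code: apply the same pairwise idea to the catalogs themselves. Here itertools.product stands in for the nested for loops over rows, and itertools.combinations generates every pair of sources to test, so the same code works for any number of catalogs. The names overlaps and match_all, the dict-per-source layout, and the toy values are assumptions made for illustration; the test itself is the RA/Dec condition from the question.

```python
from itertools import combinations, product


def overlaps(a, b):
    """The question's test for one pair of sources (dicts with ra, dec, maj)."""
    limit = a["maj"] + b["maj"]
    return abs(a["ra"] - b["ra"]) < limit and abs(a["dec"] - b["dec"]) < limit


def match_all(catalogs):
    """Yield a tuple of row indices for each set of sources that all overlap.

    catalogs is a list of catalogs; each catalog is a list of source dicts.
    """
    index_ranges = [range(len(cat)) for cat in catalogs]
    # product(...) replaces the hand-written nested loops, one level per catalog.
    for rows in product(*index_ranges):
        sources = [cat[i] for cat, i in zip(catalogs, rows)]
        # combinations(..., 2) replaces the hand-written list of pair conditions.
        if all(overlaps(a, b) for a, b in combinations(sources, 2)):
            yield rows


# Toy catalogs with made-up values.
cat_a = [{"ra": 10.0, "dec": 5.0, "maj": 0.1}, {"ra": 50.0, "dec": -3.0, "maj": 0.1}]
cat_b = [{"ra": 10.05, "dec": 5.05, "maj": 0.1}]
cat_c = [{"ra": 80.0, "dec": 0.0, "maj": 0.1}, {"ra": 9.95, "dec": 4.95, "maj": 0.1}]

matches = list(match_all([cat_a, cat_b, cat_c]))
print(matches)
```

Note that this only removes the need to edit the code for each n. It still visits the same number of row combinations as the nested loops, so by itself it does not make the match faster.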