I am trying to read "n" catalogs (data files), read 7 columns from each catalog, and then check whether n*(n-1) "if" conditions, built from some of those 7 columns, are true. If the conditions are true, I do some math; otherwise I do nothing.
For example, if I am comparing two catalogs, I have 2 "if" conditions to test, and if I have 3 catalogs, I have 6 "if" conditions to check.
Each catalog has roughly 10,000 rows and around 40 columns, but the catalogs generally differ in length.
Currently, I have working code for 3 catalogs, where I loop over the three catalogs with nested for loops and apply my 6 conditions.
Here is an example of my code:
import numpy as np
from astropy.io import ascii, fits
from astropy.table import Table

path = "xx"  # Location of all input files.
# file2 and file3 are file names defined elsewhere.
cat1 = ascii.read(path + file3, guess=False)
data2 = fits.getdata(path + file2, 1)
cat2 = Table(data2)
cat3 = Table.read(path + 'xyz.tbl', format='ipac')

for i in range(len(cat1)):
    (ra1, dec1, flux1, flux1error, maj1, minor1, ang1) = (
        cat1['RA_Degrees'][i], cat1['DEC_Degrees'][i],
        cat1['fitted_total_flux'][i], cat1['fitted_total_flux_error'][i],
        cat1['BMajor_Degrees'][i], cat1['BMinor_Degrees'][i],
        cat1['position_angle_deg'][i])
    ang1 = ang1 * np.pi / 180  # degrees to radians
    for j in range(len(cat2)):
        (ra2, dec2, total_cat2, total_error_cat2, maj2, min2, pa2) = (
            cat2['ra'][j], cat2['dec'][j],
            cat2['total'][j], cat2['total_err'][j],
            cat2['BMajor'][j], cat2['Bminor'][j], cat2['Position Angle'][j])
        for k in range(len(cat3)):
            (ra3, dec3, total_cat3, total_error_cat3, maj3, min3, pa3) = (
                cat3['ra'][k], cat3['dec'][k], cat3['flux'][k],
                cat3['ferr'][k], cat3['bmaj'][k],
                cat3['bmin'][k], cat3['pa'][k])
            # Each value is a single row entry, so plain comparisons are enough.
            if (np.abs(ra2 - ra1) < maj1 + maj2 and
                    np.abs(dec2 - dec1) < maj1 + maj2 and
                    np.abs(ra3 - ra2) < maj2 + maj3 and
                    np.abs(dec3 - dec2) < maj2 + maj3 and
                    np.abs(ra3 - ra1) < maj1 + maj3 and
                    np.abs(dec3 - dec1) < maj1 + maj3):
                pass  # do some math here
I have two problems related to this:
For the first problem, I looked up recursive functions in the link given below, but my question is whether I can use recursion here, since the number of conditions to be checked also depends on "n" and the column names are generally not homogeneous across catalogs. For example, one catalog may call Right Ascension 'RA', while another may call it 'ra' or 'Right Ascension'.
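One possible way to handle both points without recursion, as a minimal sketch: map each catalog's column names onto a common set, then test every pair of catalogs with `itertools.combinations`. The names `COLUMN_MAPS`, `standardize`, `rows_overlap`, and `matching_tuples` are hypothetical, the sample values are made up, and plain dictionaries stand in for catalog rows.

```python
from itertools import combinations, product

# Hypothetical per-catalog maps from common names to each catalog's own names.
COLUMN_MAPS = [
    {"ra": "RA_Degrees", "dec": "DEC_Degrees", "maj": "BMajor_Degrees"},
    {"ra": "ra", "dec": "dec", "maj": "BMajor"},
    {"ra": "ra", "dec": "dec", "maj": "bmaj"},
]

def standardize(row, column_map):
    """Return a dict keyed by the common names for one catalog row."""
    return {common: row[actual] for common, actual in column_map.items()}

def rows_overlap(a, b):
    """Pairwise test from the question: RA and Dec offsets below maj_a + maj_b."""
    limit = a["maj"] + b["maj"]
    return abs(a["ra"] - b["ra"]) < limit and abs(a["dec"] - b["dec"]) < limit

def matching_tuples(catalogs):
    """Yield one row per catalog whenever every pair of those rows overlaps."""
    for rows in product(*catalogs):  # replaces the n nested for loops
        if all(rows_overlap(a, b) for a, b in combinations(rows, 2)):
            yield rows

# Made-up sample data: each catalog is a list of row dicts.
raw = [
    [{"RA_Degrees": 10.0, "DEC_Degrees": 5.0, "BMajor_Degrees": 0.1}],
    [{"ra": 10.05, "dec": 5.05, "BMajor": 0.1},
     {"ra": 50.0, "dec": 5.0, "BMajor": 0.1}],
    [{"ra": 10.1, "dec": 5.0, "bmaj": 0.1}],
]
catalogs = [[standardize(r, m) for r in cat] for cat, m in zip(raw, COLUMN_MAPS)]
matches = list(matching_tuples(catalogs))
print(len(matches))  # 1
```

For n catalogs there are n*(n-1)/2 pairs and each pair contributes two checks (RA and Dec), which gives the n*(n-1) conditions. This only generalizes the loops; it still visits every row combination, so it is no faster than the nested version.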
For the second problem, I was trying to use multiprocessing, following the documentation:
https://docs.python.org/2/library/multiprocessing.html
I would like to know whether, for multiprocessing, it is better to stick with nested for loops or to use a recursive function. Any advice would be appreciated.
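As a rough sketch of how the multiprocessing side could look: the outer loop is split into chunks and each chunk goes to a worker process, which works the same way whether the inner combinations come from nested loops or from something else. This assumes Python 3 and rows already reduced to `(ra, dec, maj)` tuples; `overlaps`, `match_chunk`, and `run_parallel` are hypothetical names, and only two catalogs are used to keep it short.

```python
from multiprocessing import Pool

def overlaps(a, b):
    """Pairwise test from the question; each row is an (ra, dec, maj) tuple."""
    limit = a[2] + b[2]
    return abs(a[0] - b[0]) < limit and abs(a[1] - b[1]) < limit

def match_chunk(args):
    """Worker: match one chunk of cat1 rows against all of cat2."""
    chunk, cat2 = args
    return [(a, b) for a in chunk for b in cat2 if overlaps(a, b)]

def run_parallel(cat1, cat2, processes=2, chunk_size=1000):
    """Split the outer loop into chunks and give each chunk to a worker process."""
    chunks = [cat1[i:i + chunk_size] for i in range(0, len(cat1), chunk_size)]
    with Pool(processes) as pool:
        results = pool.map(match_chunk, [(chunk, cat2) for chunk in chunks])
    # Flatten the per-chunk lists into one list of matching pairs.
    return [pair for part in results for pair in part]
```

`run_parallel(cat1, cat2)` should be called from under an `if __name__ == "__main__":` guard, and the worker has to be a module-level function so that it can be pickled.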
Look up the itertools module. This will give you some of the basic tools to iterate through lists of columns, with the list length specified as a parameter. Yes, recursion helps solve the combinatorics, but this module will handle the recursion overhead for you.
The particular concept you want for this application is the combination of the 7 columns, taken n at a time. For the sake of illustration, consider 7 columns taken 3 at a time: that is a total of 35 combinations, (7*6*5) / (3*2*1) = 35.
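A quick check of that count, with placeholder column names (`c1` to `c7` are made up for illustration) and assuming Python 3.8 or later for `math.comb`:

```python
from itertools import combinations
from math import comb

columns = ["c1", "c2", "c3", "c4", "c5", "c6", "c7"]  # placeholder names

combos = list(combinations(columns, 3))
print(len(combos), comb(7, 3))  # 35 35
print(combos[0])                # ('c1', 'c2', 'c3')
print(combos[-1])               # ('c5', 'c6', 'c7')
```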
What you'll get is an iterator that yields each of the 35 combinations, in lexicographic order, one at a time. You can then loop over it as if it were a list. For each combination, iterate through pairs of columns:
for col_list in combo_gen:
    for right in range(1, n):
        r_col = col_list[right]
        for left in range(right):
            l_col = col_list[left]
            # Compare l_col and r_col
That's a basic outline of the process. Can you take it from here?
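For reference, a self-contained version of the outline might look like the following. The column names and `n = 3` are placeholders, and `pairs_seen` is a hypothetical list that just records which pairs get compared.

```python
from itertools import combinations

columns = ["a", "b", "c", "d"]  # placeholder column names
n = 3                           # columns taken n at a time

combo_gen = combinations(columns, n)
pairs_seen = []
for col_list in combo_gen:
    for right in range(1, n):
        r_col = col_list[right]
        for left in range(right):
            l_col = col_list[left]
            # Compare l_col and r_col; here the pair is only recorded.
            pairs_seen.append((l_col, r_col))

print(len(pairs_seen))  # 12: 4 combinations, 3 pairs each
print(pairs_seen[:3])   # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

The two inner loops visit the same pairs as `combinations(col_list, 2)`, possibly in a different order, so that call could replace them.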