Search code examples
pythonfitspyfits

Aggregating table rows by values in multiple columns in a FITS file


I have a FITS file with many columns. A part of a simplified example of the file looks like this:

    A      B      C
   100     1      90
   100     2      90
   100     3      90
   100     4      90
   211     40     70
   211     41     70
   211     42     70
   211     43     70
   211     44     70

If you notice here, the first four rows of column A and C are the same, but column B varies from 1 to 4. Then the next 5 values of column A and C are the same, but column B varies from 40 to 44.

What I would like to do is, write a program that creates a file like this:

   A      B     C
  100     4     90
  211     5     70

That is, column B should contain the number of values for which column A and C were the same!

I would like to know how do to this in Python. It does not necessarily need to deal with a FITS file, I can also convert it to ASCII format if there are some routines that cannot be used in FITS files.

What I have tried so far:

I bumped into the routine called Collections which has a sub-routine called Counter which counts the number of values in a list that are equal and returns them.

I tried:

import collections
counter = collections.Counter(a)
counts = counter.values()

But this only gives me those values in column A that are equal. Could someone tell me how to use this routine to compare those values with column C as well?


Solution

  • I'd go for something like this...:

    from itertools import groupby
    from operator import itemgetter
    
    with open('input') as fin, open('output', 'w') as fout:
        fout.write(next(fin, ''))
        rows = (line.split() for line in fin)
        for k, g in groupby(rows, itemgetter(0, 2)):
            fout.write('{} {} {}\n'.format(k[0], sum(1 for _ in g), k[1]))
    
    • Write header straight out...
    • Build a generator to produce rows split by whitespace
    • Use itertools.groupby to groupby first and third column
    • Count the number of occurrences in g to get the length of the group
    • Write out the lines formatted as desired...

    You can also use a collections.Counter if the groups are non-contiguous and should be counted as one, you can replace the groupby instead with the following:

    counts = Counter((row[0], row[2]) for row in rows)
    for k, v in counts.iteritems():
        fout.write('{} {} {}\n'.format(k[0], v, k[1]) # or even...
                # print >> fout, k[0], v, k[1]