I have a FITS file with many columns. A part of a simplified example of the file looks like this:
A B C
100 1 90
100 2 90
100 3 90
100 4 90
211 40 70
211 41 70
211 42 70
211 43 70
211 44 70
If you notice here, the first four rows of column A and C are the same, but column B varies from 1 to 4. Then the next 5 values of column A and C are the same, but column B varies from 40 to 44.
What I would like to do is, write a program that creates a file like this:
A B C
100 4 90
211 5 70
That is, column B should contain the number of values for which column A and C were the same!
I would like to know how do to this in Python. It does not necessarily need to deal with a FITS file, I can also convert it to ASCII format if there are some routines that cannot be used in FITS files.
What I have tried so far:
I bumped into the routine called Collections
which has a sub-routine called Counter
which counts the number of values in a list that are equal and returns them.
I tried:
import collections
counter = collections.Counter(a)
counts = counter.values()
But this only gives me those values in column A that are equal. Could someone tell me how to use this routine to compare those values with column C as well?
I'd go for something like this...:
from itertools import groupby
from operator import itemgetter
with open('input') as fin, open('output', 'w') as fout:
fout.write(next(fin, ''))
rows = (line.split() for line in fin)
for k, g in groupby(rows, itemgetter(0, 2)):
fout.write('{} {} {}\n'.format(k[0], sum(1 for _ in g), k[1]))
itertools.groupby
to groupby first and third columng
to get the length of the groupYou can also use a collections.Counter
if the groups are non-contiguous and should be counted as one, you can replace the groupby
instead with the following:
counts = Counter((row[0], row[2]) for row in rows)
for k, v in counts.iteritems():
fout.write('{} {} {}\n'.format(k[0], v, k[1]) # or even...
# print >> fout, k[0], v, k[1]