I have two ascii tables in text files containing information about stars, one of which with the headers
| ID | TIME | MAGNITUDE | ERROR |
and the other has the headers
| ID | CLASS |
and I want to add the CLASS column to the first text file. The main problem here is that the first text file has got many rows for each star (I.E. Star 3_6588 has got 20 entries in table a for different times) whereas the second text file has only one entry for each ID (as Star 3_6588 is always a Class I).
What I need to do is to add the |CLASS| column to the first table where every instance of a specific ID has the required class. The text file has over 14 million rows in it which is why I can't just do this manually.
@Terry Spotts has the right idea. However the leading and trailing |
characters in the header line will make this a slightly tricky CSV, as the delimiter is a pipe char, but sometimes with a leading space, trailing space, or both. But here's an example to generate your ID: Class
dictionary:
> cat bigfile.txt
| ID | TIME | MAGNITUDE | ERROR |
| Star 3_6588 | 10 | 2 | 1.02 |
| Star 3_6588 | 15 | 4 | 1.2 |
| Star 2_999 | 20 | 6 | 1.4 |
| Star 2_999 | 25 | 8 | 1.6 |
> cat smallfile.txt
| ID | CLASS |
| Star 3_6588 | CLASS I |
Code:
id2class = {}
with open('/tmp/smallfile.txt', 'r') as classfile:
line = classfile.readline() # Skip Header Line
for line in classfile:
line = line.rstrip('\n')[2:-2] # strip newline and the Pipe-Space / Space-Pipe and the start + end
fields = line.split(' | ') # Split on ' | '
id = fields[0]
starclass = fields[1]
id2class[id] = starclass
Now you have a dict id2class
that looks like:
{
'Star 3_6588': 'CLASS I',
'Star 2_999': 'CLASS II'
}
You can then parse the first file in a similar way, use the ID of each line to look up the Class
in the dict, and write out the full data for the line to a new file. I'll leave that part to you :)
Happy Coding!