I have a TSV file split into columns, from which I need to select specific columns and write them to a new file (basically filtering the original file). The columns are selected according to headings contained in a separate list. I've managed to find the relevant columns' indices, but for some reason I can't get them to write correctly to the new file.
with open("some_file.txt", "w") as out_file, open("another_file.txt", "r") as in_file:
    first_line = True
    for line in in_file:
        line = line.rstrip("\n")
        line = line.split("\t")
        if first_line:
            column_indices = [x for x in range(len(line)) if line[x] in some_list]
            first_line = False
If I manually insert an index (out_file.write(line[7] + "\n")), the correct column is printed, but no loop or list comprehension that I've tried has worked for all indices. The only way I've managed to write all the relevant contents is as lines following the headers, instead of as columns under each heading.
I'm quite a beginner at Python, so any help or insight is appreciated!
Python is packaged with the csv module, which contains DictReader and DictWriter classes designed for your use case. No need to re-invent the wheel:
input.tsv:
col1 col2 col3 col4 col5
1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
Python:
import csv

with open('input.tsv', 'r', newline='') as fin, open('output.tsv', 'w', newline='') as fout:
    reader = csv.DictReader(fin, delimiter='\t')
    writer = csv.DictWriter(fout, delimiter='\t', fieldnames=['col2', 'col3', 'col4'], extrasaction='ignore')
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
output.tsv:
col2 col3 col4
2 3 4
3 4 5
4 5 6
5 6 7
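If you would rather keep your original index-based loop, the missing step is probably to join the selected fields of each line into a single tab-separated row before writing it. Here is a minimal sketch of that idea. The function name filter_columns and the in-memory sample data are my own for illustration, and it assumes every row has at least as many fields as the header:

```python
import io


def filter_columns(in_file, out_file, wanted):
    """Copy only the columns whose header is in `wanted` (hypothetical helper)."""
    column_indices = None
    for line in in_file:
        fields = line.rstrip("\n").split("\t")
        if column_indices is None:
            # First line is the header: remember which positions to keep.
            column_indices = [i for i, name in enumerate(fields) if name in wanted]
        # Write the selected fields of this line as ONE row, so the values
        # stay side by side under their headings instead of on separate lines.
        out_file.write("\t".join(fields[i] for i in column_indices) + "\n")


# Demonstration with in-memory files standing in for the real ones.
src = io.StringIO("col1\tcol2\tcol3\tcol4\tcol5\n1\t2\t3\t4\t5\n2\t3\t4\t5\t6\n")
dst = io.StringIO()
filter_columns(src, dst, ["col2", "col3", "col4"])
result = dst.getvalue()
```

With real files you would pass the two handles from your with open(...) statement in place of src and dst. The csv version above is still the safer choice if any fields can contain quoted tabs or newlines.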