python, csv, file, python-3.7

Working with columns in TSV files - Python 3


I have a TSV file split into columns, from which I need to select specific columns and write them to a new file (basically filtering the original file). The columns are selected according to headings contained in a separate list. I've managed to find the relevant columns' indices, but for some reason I can't get them to write correctly to the new file.

with open("some_file.txt", "w") as out_file, open("another_file.txt", "r") as in_file:
    first_line = True
    for line in in_file:
        line = line.rstrip("\n")
        line = line.split("\t")
        if first_line:
            # some_list holds the headings of the columns to keep
            column_indices = [x for x in range(len(line)) if line[x] in some_list]
            first_line = False

If I manually insert an index (out_file.write(line[7] + "\n")), the correct column is printed, but no loop or list comprehension that I've tried has worked for all indices. The only way I've managed to write all the relevant contents is as lines following the headers, instead of as columns under each heading.

I'm quite a beginner at Python, so any help or insight is appreciated!


Solution

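  • Python is packaged with the csv module, which contains DictReader and DictWriter classes designed for your use case, so there is no need to reinvent the wheel. With extrasaction='ignore', DictWriter skips any keys in a row that are not listed in fieldnames, which is what filters the columns: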

    input.tsv:

    col1    col2    col3    col4    col5
    1   2   3   4   5
    2   3   4   5   6
    3   4   5   6   7
    4   5   6   7   8
    

    Python:

    import csv
    
    # newline='' lets the csv module handle line endings itself
    with open('input.tsv', 'r', newline='') as fin, open('output.tsv', 'w', newline='') as fout:
        reader = csv.DictReader(fin, delimiter='\t')
        # extrasaction='ignore' drops every column not listed in fieldnames
        writer = csv.DictWriter(fout, delimiter='\t', fieldnames=['col2', 'col3', 'col4'],
                                extrasaction='ignore')
        writer.writeheader()
        for row in reader:
            writer.writerow(row)
    

    output.tsv:

    col2    col3    col4
    2   3   4
    3   4   5
    4   5   6
    5   6   7
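
For comparison, the index-based approach from the question can also be finished by hand. The following is only a minimal sketch, not part of the original answer: the names select_columns and filter_tsv are hypothetical, and it assumes every row has at least as many fields as the header row. The key step is to join the selected fields of each row back into one tab-separated line, so each output row keeps its values side by side under their headings.

```python
def select_columns(lines, wanted):
    """Yield tab-separated rows holding only the columns named in `wanted`.

    Hypothetical helper: `lines` is any iterable of TSV lines whose first
    item is the header row; `wanted` is a collection of heading names.
    """
    column_indices = None
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if column_indices is None:
            # Header row: remember the positions of the wanted headings.
            column_indices = [i for i, name in enumerate(fields) if name in wanted]
        # Take the same positions from every row, the header included,
        # and join them so they stay on one line.
        yield "\t".join(fields[i] for i in column_indices)


def filter_tsv(in_path, out_path, wanted):
    """Write the selected columns of the file at in_path to out_path."""
    with open(in_path, "r") as in_file, open(out_path, "w") as out_file:
        for row in select_columns(in_file, wanted):
            out_file.write(row + "\n")
```

Calling filter_tsv("input.tsv", "output.tsv", ["col2", "col3", "col4"]) on the sample file should give the same output.tsv as above. One difference in design: this sketch keeps columns in the order they appear in the input file, whereas DictWriter writes them in the order given in fieldnames.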