Tags: python, python-3.x, csv, dictionary, skip

How to skip over blank cells when constructing a Python dictionary from a csv file?


I have a csv file with this structure:

Name:   Tags:   col3    col4    col5    col6    col7
T1      G1      G2      G3      G4      G5  
T2      G1      G2              
T3      G1      G2      G3          
T4      G1      G2      G3      G4      G5      G6
T5      G1      G2      G3      G4      

The actual file has 279 columns, and the rows all vary in length. My aim is to get each name as a key, with the corresponding tags as a list of values, in a Python dictionary.

My current code is this:

import csv

my_dict = {}
with open('infile.csv') as file:
    reader = csv.reader(file)
    for row in reader:
        my_dict[row[0]] = row[1:]
print(my_dict)

This works, but the blank cells are included as values in the dictionary, e.g.:

{T1: ['G1', 'G2', 'G3', 'G4', 'G5', ''], T2: ['G1', 'G2', '', '', '', ''] etc.

Whereas my aim is to get this:

{T1: ['G1', 'G2', 'G3', 'G4', 'G5'], T2: ['G1', 'G2'] etc.

I can't find any option for csv.reader that skips over blank cells. I have tried csv.DictReader (apparently this automatically ignores blank cells?), but it doesn't allow slices, and I can't name and specify all 279 columns.

I am aware that there are similar questions on here, but none of them seem to be what I'm looking for in terms of how I want the file to be read.

I have been stuck on this for a while so any help would be much appreciated.


Solution

  • You could use a list comprehension to keep only the non-empty cells, as follows:

    import csv
    
    my_dict = {}
    
    with open('infile.csv', newline='') as f_input:
        csv_input = csv.reader(f_input)
        header = next(csv_input)   # skip over the header row
    
        for row in csv_input:
            my_dict[row[0]] = [cell for cell in row[1:] if cell]
    
    print(my_dict)        
    

    This gives you my_dict containing:

    {'T1': ['G1', 'G2', 'G3', 'G4', 'G5'], 'T2': ['G1', 'G2'], 'T3': ['G1', 'G2', 'G3'], 'T4': ['G1', 'G2', 'G3', 'G4', 'G5', 'G6'], 'T5': ['G1', 'G2', 'G3', 'G4']}
    

    Note: In Python 3.x, the file should be opened with newline='' when it is passed to csv.reader, so that the csv module handles line endings itself.
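
If the real file also contains cells that hold only whitespace, or completely blank lines, the comprehension above would keep the former and raise an IndexError on the latter. The following is a minimal sketch of a more defensive variant. It is not part of the original answer: the rows_to_dict helper and the in-memory sample data are hypothetical, used here only so the example runs without infile.csv.

```python
import csv
import io

# Hypothetical sample standing in for infile.csv (comma-delimited assumed).
sample = (
    "Name:,Tags:,col3,col4,col5,col6,col7\n"
    "T1,G1,G2,G3,G4,G5,\n"
    "T2,G1, ,,,,\n"      # second tag cell contains only a space
    "\n"                 # completely blank line
    "T3,G1,G2,G3,,,\n"
)


def rows_to_dict(lines):
    """Map each row's first cell to the list of its non-blank remaining cells."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row; the default avoids an error on empty input
    result = {}
    for row in reader:
        # csv.reader yields [] for a blank line; also skip rows with no name.
        if not row or not row[0].strip():
            continue
        # Strip each cell and drop those that are empty after stripping.
        result[row[0].strip()] = [cell.strip() for cell in row[1:] if cell.strip()]
    return result


my_dict = rows_to_dict(io.StringIO(sample))
print(my_dict)
```

With a real file, the same helper could be called as rows_to_dict(f_input) inside the with open('infile.csv', newline='') block shown above, since csv.reader accepts any iterable of lines.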