I have a csv file with this structure:
Name: Tags: col4 col4 col5 col6 col7
T1 G1 G2 G3 G4 G5
T2 G1 G2
T3 G1 G2 G3
T4 G1 G2 G3 G4 G5 G6
T5 G1 G2 G3 G4
The actual file has 279 columns, and the rows vary in length. My aim is to get each name as a key, with the corresponding tags as a list of values, in a Python dictionary.
My current code is this:
import csv

my_dict = {}

with open('infile.csv') as file:
    reader = csv.reader(file)
    for row in reader:
        my_dict[row[0]] = row[1:]

print(my_dict)
This works, but the blank cells are included as values in the dictionary, e.g.:
{'T1': ['G1', 'G2', 'G3', 'G4', 'G5', ''], 'T2': ['G1', 'G2', '', '', '', ''], ...}
Whereas my aim is to get this:
{'T1': ['G1', 'G2', 'G3', 'G4', 'G5'], 'T2': ['G1', 'G2'], ...}
I can't find any option for csv.reader that skips over blank cells. I have tried csv.DictReader (apparently this automatically ignores blank cells?) but it doesn't allow slices, and I can't name and specify 279 columns.
I am aware that there are similar questions on here, but none of them seem to be what I'm looking for in terms of how I want the file to be read.
I have been stuck on this for a while so any help would be much appreciated.
You could just use a list comprehension to keep only the non-empty cells, as follows:
import csv

my_dict = {}

with open('infile.csv', newline='') as f_input:
    csv_input = csv.reader(f_input)
    header = next(csv_input)   # skip over the header row

    for row in csv_input:
        # keep every cell after the name that is not an empty string
        my_dict[row[0]] = [cell for cell in row[1:] if cell]

print(my_dict)
This gives you my_dict containing:
{'T1': ['G1', 'G2', 'G3', 'G4', 'G5'], 'T2': ['G1', 'G2'], 'T3': ['G1', 'G2', 'G3'], 'T4': ['G1', 'G2', 'G3', 'G4', 'G5', 'G6'], 'T5': ['G1', 'G2', 'G3', 'G4']}
Note: In Python 3.x, the file should be opened with newline='' when it is passed to a csv reader or writer.
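If some cells might contain only spaces, or the file has completely blank lines, a slightly more defensive variant of the same list-comprehension idea could look like the sketch below. The function name `read_tags` and the in-memory sample data are made up for illustration (they stand in for infile.csv), and whether you want whitespace stripped at all depends on your data.

```python
import csv
import io


def read_tags(lines):
    """Map each name (first cell) to a list of its non-blank tags."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row; does nothing if the input is empty
    result = {}
    for row in reader:
        # csv.reader yields an empty list for a blank line
        if not row or not row[0].strip():
            continue
        # strip() treats whitespace-only cells as blank too
        result[row[0].strip()] = [cell.strip() for cell in row[1:] if cell.strip()]
    return result


# Hypothetical sample data standing in for infile.csv
sample = io.StringIO(
    "Name,Tags,col3\n"
    "T1,G1,G2\n"
    "T2,G1,\n"
    "\n"
    "T3, ,G2\n"
)

tags = read_tags(sample)
print(tags)
```

With a real file you would pass the open file object instead, e.g. `with open('infile.csv', newline='') as f: tags = read_tags(f)`.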