Tags: python, dictionary, append, defaultdict

Iterating over files and adding values to a Python dictionary


I have a set of 50 text files. Each has a header row, gene names in the first column, and values for each gene in the remaining columns. I also have an official gene list text file. I want to use the official gene list to build a dictionary keyed by gene name, then iterate over the files and, for each line whose gene name is in the dictionary, append that line's values to the gene's dictionary entry.

So the experimental file looks like this:

GENE    Exp1    Exp2
geneA   12      34
geneB   42      10
geneC   42      10

The official gene list looks like this:

GENE    
geneA   
geneC

I first tried the following code with a plain dictionary (for just one experimental file, though I could later iterate over more):

combo = {}

with open('official_gene_list.txt', 'r') as f:
    f.readline()
    for line in f:
        name = line.split('\n')[0]
        combo[name]={}

with open('expeirmenta1_file.txt', 'r') as g:
    for each in g:
        name2 = each.split('\t')[0]
        data = each.rstrip('\n').split('\t')[1:]
        for name2 in combo:
            combo[name2].append(data)

When I run that, the dictionary is built fine, but I get the following error:

AttributeError: 'dict' object has no attribute 'append'

I've also tried using a defaultdict():

from collections import defaultdict
combo = defaultdict(list)
with open('gene_orf_updated2.txt', 'r') as f:
    f.readline()
    for line in f:
        name = line.split('\n')[0]
        combo[name]={}
with open('GSE139_meanCenter_results.txt', 'r') as g:
    for each in g:
        name2 = each.split('\t')[0]
        data = each.rstrip('\n').split('\t')[1:]
        for name2 in combo:
            combo[name2].append(data)

And I get the same error about 'dict' object has no attribute 'append'.

I've made dictionaries before, but never tried to append new values to existing keys like this. Is this possible? Any help or advice would be greatly appreciated.


Solution

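  • You are close. The error occurs because each value is initialized as a dict ({}), which has no append method; initialize it as a list ([]) instead. Also, "for name2 in combo" loops over every key and overwrites name2, so use "if name2 in combo" to test whether the gene is in the dictionary. Do it like this: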

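    combo = {}

    with open('gene_orf_updated2.txt', 'r') as f:
        f.readline()  # skip the header row
        for line in f:
            name = line.strip()  # drop the newline and any trailing whitespace
            combo[name] = []     # a list, so append() works later

    with open('GSE139_meanCenter_results.txt', 'r') as g:
        g.readline()  # skip the header row
        for each in g:
            name2 = each.split('\t')[0]
            data = each.rstrip('\n').split('\t')[1:]
            if name2 in combo:  # membership test, not a loop over the keys
                combo[name2].append(data)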
    

    The append call stores each row's values as its own sublist. If you want one flat list of values per gene instead, replace the append line with:

    combo[name2] += data
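
Since the question mentions 50 experimental files, the same idea could be wrapped in a function that takes a list of file paths. The following is only a rough sketch: the function names load_gene_names and merge_experiments are made up for illustration, and it assumes every file is tab-separated with a single header row, as in the samples above.

```python
def load_gene_names(path):
    """Return the gene names from a one-column file, skipping the header row."""
    with open(path) as f:
        next(f, None)  # skip the header line ("GENE")
        # strip() drops the newline and any trailing spaces; blank lines are ignored
        return [line.strip() for line in f if line.strip()]


def merge_experiments(gene_list_path, experiment_paths):
    """Collect the values for each official gene across all experiment files."""
    # Start every official gene with an empty list so values can be added later.
    combo = {name: [] for name in load_gene_names(gene_list_path)}
    for path in experiment_paths:
        with open(path) as g:
            next(g, None)  # skip the header line ("GENE", "Exp1", ...)
            for line in g:
                fields = line.rstrip('\n').split('\t')
                name, values = fields[0].strip(), fields[1:]
                if name in combo:  # genes not in the official list are skipped
                    # extend() keeps one flat list per gene;
                    # use append(values) to keep one sublist per file instead
                    combo[name].extend(values)
    return combo
```

It might be called with something like merge_experiments('official_gene_list.txt', sorted(glob.glob('experiments/*.txt'))), where the experiments/ directory is a hypothetical location for the 50 files and glob comes from the standard library. Note that the values stay as strings; convert them with float() if you need numbers.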