python dictionary for-loop sequence fasta

Matching one dictionary key to another dictionary's value and returning a file with dict1.key and dict2.value


The goal is to match sequence data from FASTA records and write each name and its sequence to individual files.

I have two dictionaries, and I want to match dict_1's keys against dict_2's values. Then I want to write one output file per dict_2 key, named after that key, containing the corresponding dict_1 entries as separate records.

Here are the dictionaries:

dict_1 = {'NODE_116_length_11385_cov_7.599029_12': 'DMVDMVDMVDMVDMVDMVDMVVYMMNMETYMMDIIK*',
 'NODE_102_length_12880_cov_14.047719_19': 'EIKEEIKEEIKEEIKEEIKEEIELVILNEVKIYLNGKTTILKSKEYLKRMNERTNGNKEELLERLSKLIKIDL*',
 'NODE_105_length_12431_cov_10.730204_16': 'FFDYDKDGDLDMILINQSAPEYAKGQIQNLE*',
 'NODE_92_length_13700_cov_7.926786_1': 'GKLLLDNLENLKNVDLILMDLHMPIMDGYDCTKKIRKLGYKMPIIASTANAMSGEKEKCLNIGMNDFLLKPVQLKTFKDIIHKWLI*',
 'NODE_111_length_11631_cov_12.297685_1': 'GYSQDEQQMANDELKSASKHTEQKILSTIEEVKDDKETKKDIEYELKSTSAIGQHDSLFE*',
 'NODE_85_length_14730_cov_9.298399_1': 'HNIHHNIHHNNNNNNNIHHIDHRVFFYTQLQIFFIFHYFIMVHNTQIILIRHAEKKKGTHLSLEGIIRSNELVNFFINQYNPNINIPDIIIAMKQHKKSSNRAFETIQPLANTLNINIIHDFYKNDIKQLHDFIQLHLDKNILICWEHKVLIDITNTITHLKKLFWKKKQYEPIWIINSFNKTFQIFNQFKIINQTIDYSNFKINPIKTLHYN*',
 'NODE_56_length_20640_cov_12.217877_21': 'MEEIVNYSKQYGKNQKTEAFEYADNHNLQCFQRDLNESGAKILIVDSYQNIFDSIKNSLNSNYYEYWSSTQPIKFYIDYDNKVENVDQNDLKKRAKGDIISTHKTDILNIINTVRTLIPNITGVNILKSIPDITKKSYHLIFDGIHFANRGILKKFIEDHLKPKFKDLFEKKIIDIKVYGDLCFRTLLSTKSGQNRPLYLLQTDSFLLELQENAISKENTTIEHFLKVSISHIDKDSTLFTYKSEKKKNNSKKVHLMNEDDIYSDKEIVKKYLDLLDGDRYTDYNKWLNIGFILFSINTEYIDLWHYFSNKWEHYDEENCNSKWNTFASSEYVHTINNLIHLAKIDNPDDYEELSKEVPNHDIKYLRPFDNVLSKLIYRIYGEKFVCSNPLKDEWYYFNSIRWKKENKSFNLRHKITNEVFTKIENYRRILIKEGASEEIIKNYHNILQKLGSGIKLNCLEIEFYNEKFYTIIDQNKDLIGFENGIFDLKIMEFRNGVSSDYVSLSTQYDYVYYSPEEPIYKEVSLLISQIIPNPETRHFTMKSLASCLDGHNRDENFYIWSGKNATGGNGKSTITELLSKALGEYAIDSPVSLITGKRESANSANSALASIRNKRVVIMQEPGANEQIQSDVMKSLTGGDKVSTRELNSSQIEFKPHAKIFMACNQIPILSTNDGGTSRRIKIIEFESRFVETPTEGTPVKEFKIDRELKNKLEKYKPVFMSILLDYYKIYIEEKLIPPNSVLKVTKKYESSNNNVKMFIDENIIKGTKTDFIIKEELKVLYRSDISLTRSFPRFSIFVTQFESIFGTEFVFDAKKRLYKFYGYHLKRPGDNSDDENTNNLDNSEDEF*',
 'NODE_93_length_13622_cov_12.830766_11': 'IYLNDTTTSGTNGSLIHQNIFRVNQATQNTPVYDSITQTLGNATFTIGMFYKNLSTVKANLNISNAAIRLYRIQ*',
 'NODE_124_length_10814_cov_8.548657_12': 'LDANFLDANVLDADFLDANFLDANVLDADFLDADFLERDIVVIFINCK*'}

and

dict_2 = {'MGs12_5k_2_A32': ['NODE_70_length_20145_cov_24.475261_14'],
 'MGs12_5k_2_D5': ['NODE_2_length_52708_cov_24.298236_22'],
 'MGs12_5k_2_PolB': ['NODE_32_length_24566_cov_24.203541_4'],
 'MGs12_5k_2_RNAPL': ['NODE_3_length_51209_cov_24.258005_34',
  'NODE_3_length_51209_cov_24.258005_30',
  'NODE_3_length_51209_cov_24.258005_32'],
 'MGs12_5k_2_RNAPS': ['NODE_50_length_21518_cov_25.799376_1',
  'NODE_2_length_52708_cov_24.298236_1'],
 'MGs12_5k_2_RNR': ['NODE_7_length_40427_cov_25.036238_31'],
 'MGs12_5k_2_SFII': ['NODE_7_length_40427_cov_25.036238_8'],
 'MGs12_5k_2_VLTF3': ['NODE_2_length_52708_cov_24.298236_25'],
 'MGs12_5k_2_mRNAc': ['NODE_7_length_40427_cov_25.036238_11',
  'NODE_50_length_21518_cov_25.799376_17'],
 'MGs27_5k_1_A32': ['NODE_116_length_11385_cov_7.599029_5',
  'NODE_103_length_12754_cov_11.677455_12'],
 'MGs27_5k_1_D5': ['NODE_56_length_20640_cov_12.217877_21',
  'NODE_85_length_14730_cov_9.298399_8',
  'NODE_86_length_14611_cov_12.522121_7'],
 'MGs27_5k_1_PolB': ['NODE_124_length_10814_cov_8.548657_2',
  'NODE_65_length_19237_cov_10.992128_2'],
 'MGs27_5k_1_SFII': ['NODE_93_length_13622_cov_12.830766_8'],
 'MGs27_5k_1_VLTF3': ['NODE_65_length_19237_cov_10.992128_15'],
 'MGs27_5k_1_mRNAc': ['NODE_141_length_10084_cov_14.000897_1'],
 'MGs27_5k_1_mcp': ['NODE_86_length_14611_cov_12.522121_2',
  'NODE_113_length_11459_cov_7.893722_14']} 

I tried the following, based on these answers:

https://stackoverflow.com/questions/53239262/nested-dictionary-from-dict1-and-dict2-using-keys-from-dict1-and-values-from-dic
https://stackoverflow.com/questions/1317410/finding-matching-keys-in-two-large-dictionaries-and-doing-it-fast
https://stackoverflow.com/questions/32815640/how-to-get-the-difference-between-two-dictionaries-in-python

for k, v in dict_2.items():
    print(k, v)
    for v in dict_1.keys():
        print(dict_1.values())

I can't get past confirming the match and printing the matching dict_2 key with its dict_1 values. In the end I would like files named after the dict_2 keys, like this:

MGs27_5k_1_D5.txt

>NODE_56_length_20640_cov_12.217877_21 
MEEIVNYSKQYGKNQKTEAFEYADNHNLQCFQRDLNESG
AKILIVDSYQNIFDSIKNSLNSNYYEYWSSTQPIKFYID
YDNKVENVDQNDLKKRAKGDIISTHKTDILNIINTVRT...
>NODE_85_length_14730_cov_9.298399_8
MEDFTIAKQYGKNQKVEAFEYAENHNIQCFQKDLNESGAKILIADSY
LNIFNLIKNGMNANYYEYWSSTQQVKFYIDYDNKVENIDFNDLKKRS
KNIDVVSTHKTDLL...
>NODE_86_length_14611_cov_12.522121_10
MKEKFIWEFLDEEWSDLLLS...

(It should be the whole sequence; I used the ... to save space.)
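
The record layout shown above (a `>` header line followed by the sequence split over several lines) could be produced along these lines. This is only a sketch: the helper name `format_fasta_record` and the default width of 60 characters are assumptions made for illustration, not something fixed by the data.

```python
def format_fasta_record(name, sequence, width=60):
    """Return one FASTA record: a '>' header, then the sequence in chunks.

    `width` is a hypothetical default; pick whatever line length you need.
    """
    # Slice the sequence into consecutive chunks of at most `width` characters.
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(chunks) + "\n"
```

The returned string can be written to a file as-is, one call per matching node.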

This is the final version, thanks to the accepted answer:

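def fileWrite(fileName, nodeName, fileContents):
    print(f'writing >{nodeName} {fileContents} into {fileName + ".txt"}')
    # Append ('a') rather than 'w+': 'w+' truncates the file on every call,
    # so only the last matching node would survive in each file.
    with open(fileName + ".txt", 'a') as file:
        file.write('>' + nodeName + '\n')
        file.write(fileContents + '\n')

# Use the dictionary names defined in the question (dict_1, dict_2).
for k2, v2 in dict_2.items():
    for k1 in dict_1:
        if k1 in v2:
            fileWrite(k2, k1, dict_1[k1])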

Solution

  • First, keep the dictionary values readable when presenting them to others. The full sequences are hard to read; a skeleton of the dictionary structure is enough.

    Second, this is a traverse-and-search problem: loop through the keys and values of both dictionaries and write a file with the content you need.

    Here is the final code:

    dict1={'A':'A1','B':'B1','C':'C1'}
    dict2={'F1':['A','G'],'F2':['D'],'F3':['E','I3']}
    
    def fileWrite(fileName, fileContents):
        print(f'writing {fileContents} into {fileName + ".txt"}')
        # Append mode keeps earlier matches when several values share a file.
        with open(fileName + ".txt", 'a') as file:
            file.write(fileContents + '\n')
        
    for k2,v2 in dict2.items():
        for k1 in dict1:
            if k1 in v2:
                fileWrite(k2,dict1[k1])
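
As a variation on the loop above, here is a minimal sketch that looks each node ID up directly in the sequence dictionary instead of scanning every key, and opens each output file only once. The function name `write_matches` and its `out_dir` parameter are hypothetical, introduced only for illustration. Writing a `>` header line per record is an assumption based on the output format requested in the question.

```python
import os

def write_matches(seqs, groups, out_dir="."):
    """Write one .txt file per group that has a match; return the paths written.

    seqs   -- maps node ID -> sequence (dict_1 in the question)
    groups -- maps file name -> list of node IDs (dict_2 in the question)
    """
    written = []
    for group_name, node_ids in groups.items():
        # Keep only the node IDs that actually have a sequence.
        hits = [(node, seqs[node]) for node in node_ids if node in seqs]
        if not hits:
            continue  # no match, so no file is created
        path = os.path.join(out_dir, group_name + ".txt")
        # "w" recreates the file on each run, so reruns do not duplicate records.
        with open(path, "w") as handle:
            for node, seq in hits:
                handle.write(f">{node}\n{seq}\n")
        written.append(path)
    return written
```

With the toy dictionaries above, `write_matches(dict1, dict2)` would create only `F1.txt`, because `'A'` is the only value in `dict2` that is also a key in `dict1`.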