
Replace with abbreviations from dictionary using Python


I'm trying to replace words like 'rna' with 'ribonucleic acid' from a dictionary of abbreviations. I tried writing the following, but it doesn't replace the abbreviations.

import csv,re
outfile = open ("Dict.txt", "w")
with open('Dictionary.csv', mode='r') as infile:
    reader = csv.reader(infile)
    mydict = {rows[0]:rows[1] for rows in reader}
    print >> outfile, mydict
out = open ("out.txt", "w")
ss = open ("trial.csv", "r").readlines()
s = str(ss)
def process(s):
    da = ''.join( mydict.get( word, word ) for word in re.split( '(\W+)', s ) )
    print >> out, da
process(s)

A sample trial.csv file:

A,B,C,D
RNA,lung cancer,15,biotin
RNA,lung cancer,15,biotin
RNA,breast cancer,15,biotin
RNA,breast cancer,15,biotin
RNA,lung cancer,15,biotin

Sample Dictionary.csv:

rna,ribonucleic acid
rnd,radical neck dissection
rni,recommended nutrient intake
rnp,ribonucleoprotein

My output file should have 'RNA' replaced by 'ribonucleic acid'.
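Running the two suspect steps on a couple of the sample rows shows what goes wrong. This is a minimal Python 3 sketch: the rows and the dictionary entries are hard-coded copies of the samples above, not read from the files.

```python
import re

# Two entries copied from the sample Dictionary.csv (keys are lowercase)
mydict = {'rna': 'ribonucleic acid', 'rnp': 'ribonucleoprotein'}

# What readlines() returns for the first two lines of trial.csv
ss = ['A,B,C,D\n', 'RNA,lung cancer,15,biotin\n']

# str() on a list gives its printed form: brackets, quotes and escaped newlines
s = str(ss)
print(s)

# The lookup is case-sensitive, so the uppercase token 'RNA' is not found
print(mydict.get('RNA', 'RNA'))          # -> RNA
print(mydict.get('RNA'.lower(), 'RNA'))  # -> ribonucleic acid

# The split itself is fine: separators are kept because of the capturing group
print(re.split(r'(\W+)', ss[1]))
```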


Solution

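  • The line s = str(ss) is one problem: it turns the list returned by readlines() into a single string, the list's printed form with brackets, quotes and escaped \n characters, so whatever gets written to out.txt is that mangled text. The other problem is case: the dictionary keys are lowercase ('rna') but trial.csv contains 'RNA', so mydict.get(word, word) never finds a match. Process the lines one at a time and lowercase each word for the lookup only.

    Try this instead:

    def process(ss):
        for line in ss:
            # Split on runs of non-word characters, keeping the separators,
            # and look each piece up in lowercase so 'RNA' matches 'rna'
            da = ''.join(mydict.get(word.lower(), word) for word in re.split(r'(\W+)', line))
            # Each line already ends with '\n', so write it unchanged
            out.write(da)

    process(ss)
    out.close()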
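For Python 3, here is a sketch of the same idea that reads and writes both files with the csv module and uses re.sub with a callback instead of the split/join (a swapped-in technique, not the answer's code). The helper names load_abbreviations, expand and expand_csv are hypothetical, and io.StringIO stands in for the real files so the example runs on its own.

```python
import csv
import io
import re


def load_abbreviations(lines):
    """Build {abbreviation_in_lowercase: expansion} from 'abbr,expansion' CSV rows."""
    return {row[0].strip().lower(): row[1].strip()
            for row in csv.reader(lines) if len(row) >= 2}


def expand(text, abbrevs):
    """Replace each word whose lowercase form is a known abbreviation; keep the rest."""
    return re.sub(r'\w+', lambda m: abbrevs.get(m.group(0).lower(), m.group(0)), text)


def expand_csv(src, dst, abbrevs):
    """Copy CSV rows from src to dst, expanding abbreviations field by field."""
    writer = csv.writer(dst, lineterminator='\n')
    for row in csv.reader(src):
        writer.writerow([expand(field, abbrevs) for field in row])


# In-memory stand-ins for Dictionary.csv and trial.csv
dictionary_csv = io.StringIO("rna,ribonucleic acid\nrnp,ribonucleoprotein\n")
trial_csv = io.StringIO("A,B,C,D\nRNA,lung cancer,15,biotin\n")

abbrevs = load_abbreviations(dictionary_csv)
out = io.StringIO()
expand_csv(trial_csv, out, abbrevs)
print(out.getvalue())
```

With the real files, pass open file objects instead: open('Dictionary.csv', newline='') for the dictionary, open('trial.csv', newline='') for the input and open('out.txt', 'w', newline='') for the output. Working field by field means the commas between columns are never looked up, and csv.writer quotes any expansion that happens to contain a comma.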