I'm trying to replace words like 'rna' with 'ribonucleic acid' from a dictionary of abbreviations. I tried writing the following, but it doesn't replace the abbreviations.
import csv,re
outfile = open ("Dict.txt", "w")
with open('Dictionary.csv', mode='r') as infile:
reader = csv.reader(infile)
mydict = {rows[0]:rows[1] for rows in reader}
print >> outfile, mydict
out = open ("out.txt", "w")
ss = open ("trial.csv", "r").readlines()
s = str(ss)
def process(s):
da = ''.join( mydict.get( word, word ) for word in re.split( '(\W+)', s ) )
print >> out, da
process(s)
A sample trial.csv file would be
A,B,C,D
RNA,lung cancer,15,biotin
RNA,lung cancer,15,biotin
RNA,breast cancer,15,biotin
RNA,breast cancer,15,biotin
RNA,lung cancer,15,biotin
Sample Dictionary.csv:
rna,ribonucleic acid
rnd,radical neck dissection
rni,recommended nutrient intake
rnp,ribonucleoprotein
My output file should have 'RNA' replaced by 'ribonucleic acid'
I think this line s = str(ss)
is causing the problem - the list that was created just became a string!
Try this instead:
def process(ss):
for line in ss:
da = ''.join( mydict.get( word, word ) for word in re.split( '(\W+)', line ) )
print >> out, da
process(ss)