Search code examples
pythonpandasdictionarytranslate

translate panda dataframe using dictionary sorted by word length


I have imported an excel to a pandas dataframe, which I'm trying to translate and then export back to an excel.

For example purpose say this is my data set:

d = {"cool":"chill", "guy":"dude","cool guy":"bro"}```
data = [['cool guy'], ['cool'], ['guy']]
df = pd.DataFrame(data, columns = ['WORDS'])


print(df)
#    WORDS   
# 0  cool guy   
# 1  cool  
# 2  guy    

So the easiest solution would be to use pandas built in function replace. However if you use:

df['WORDS'] = df['WORDS'].replace(d, regex=True) 

The result is:

print(df)
#    WORDS   
# 0  chill dude   
# 1  chill  
# 2  dude 

(cool guy doesn't get translated correctly)

This could be solved by sorting the dictionary by the longest word first. I tried to use this function:

import re
def replace_words(col, dictionary):
    # sort keys by length, in reverse order
    for item in sorted(dictionary.keys(), key = len, reverse = True):
        col = re.sub(item, dictionary[item], col)
    return col

But..

df['WORDS'] = replace_words(df['WORDS'], d) 

Results in a type error: TypeError: expected string or bytes-like object

Trying to convert the row to a string did not help either

...*
col = re.sub(item, dictionary[item], [str(row) for row in col])

Does anyone have any solution or different approach I could try?


Solution

  • Let us try replace

    df.WORDS.replace(d)
    Out[307]: 
    0      bro
    1    chill
    2     dude
    Name: WORDS, dtype: object