
Extracting colors in French


I have a CSV with information about product colors. The color field sometimes contains extra details, and I would like to extract just the color name. I found some libraries for this, but my data is in French, so they don't fit. I am trying to do this in Python.

For example, from "transparent blue" I want to keep just "blue".

The table looks like this:

Product ref   Color              Sales quantity
F33           Bleu transparent   2
K367          Ecaille Marron     1

I want to extract "Bleu" (blue) and "Marron" (brown) so I can see which colors sell the most.


Solution

  • You could create a translator function and then apply this to the column.

    Here is an example (using the data in the question).

    import pandas as pd
    
    # original dataframe
    data = {'Product ref': ['F33', 'K367'],
            'Color': ['Bleu transparent', 'Ecaille Marron'],
            'Sales quantity': [2, 1]}
    
    df = pd.DataFrame(data)
    
    
    def translate(french):
        ''' translating function '''
        if 'Bleu' in french:
            return 'blue'
        
        if 'Marron' in french:
            return 'brown'
        
        return '-'
    
    # apply the function to every value in the Color column
    df['english'] = df['Color'].apply(translate)
    print(df)
    

    This is the result:

      Product ref             Color  Sales quantity english
    0         F33  Bleu transparent               2    blue
    1        K367    Ecaille Marron               1   brown
    

    Note: You could use a much more sophisticated translation and matching function (for example, the googletrans library). The example above is a minimal working example.
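
As the number of colors grows, a chain of `if` statements gets hard to maintain. One possible variation, sketched below, keeps the French-to-English pairs in a dictionary, matches words case-insensitively, and then sums the sales per extracted color to answer the "which colors sell the most" part of the question. The names `COLOR_KEYWORDS`, `extract_color` and `sales_by_color` are hypothetical, and the dictionary entries beyond "bleu" and "marron" are illustrative assumptions, not taken from the question's data.

```python
import pandas as pd

# Hypothetical keyword map: lowercase French color word -> English name.
# Only "bleu" and "marron" come from the question; the rest are examples.
COLOR_KEYWORDS = {
    "bleu": "blue",
    "marron": "brown",
    "rouge": "red",
    "vert": "green",
}


def extract_color(description, keywords=COLOR_KEYWORDS, default="-"):
    """Return the English name of the first known color word, else default."""
    # Lowercase and split on whitespace so matching ignores case
    # and extra words such as "transparent" or "ecaille".
    for word in str(description).lower().split():
        if word in keywords:
            return keywords[word]
    return default


# Same sample data as in the question.
df = pd.DataFrame({
    "Product ref": ["F33", "K367"],
    "Color": ["Bleu transparent", "Ecaille Marron"],
    "Sales quantity": [2, 1],
})

df["english"] = df["Color"].apply(extract_color)

# Total sales per extracted color, best seller first.
sales_by_color = (
    df.groupby("english")["Sales quantity"]
    .sum()
    .sort_values(ascending=False)
)
print(sales_by_color)
```

Because this sketch matches whole words, inflected forms such as "bleue" or words with attached punctuation would need their own dictionary entries or some extra normalization.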