Search code examples
pythonpandasdataframereplaceseries

Pandas column replace multiple special characters and insert new characters


I have a pandas dataframe like as below

Country_list

{'INDIA': '98.31%', 'ASEAN': '1.69%'}
{'KOREA': '100.0%'}
{'INDIA': '95.00%', 'ASEAN': '2.50%','ANZ': '2.50%'}
{'INDIA': '95.00%', 'ASEAN': '2.50%','ANZ': '1.25%','KOREA': '1.25%'}

I would like to do the below

a) Replace all numbers and special characters with '' (no blanks)

b) insert new character - Comma between different region names

I tried the below but this doesn't seem efficient or elegant

df['Country_list'] = df['Country_list'].str.replace(r":",'', regex=True).str.replace(r"%", '', regex=True).str.replace(r"{",'', regex=True).str.replace(r"}",'', regex=True)

I expect my output to be like as below

INDIA,ASEAN
KOREA
INDIA,ASEAN,ANZ
INDIA,ASEAN,ANZ,KOREA

Solution

  • If need join keys of dictionaries by , convert strings to dictionaries and join keys in list comprehension:

    import ast
    
    df['Country_list'] = [','.join(ast.literal_eval(x).keys()) for x in df['Country_list']]
    print (df)
                Country_list
    0            INDIA,ASEAN
    1                  KOREA
    2        INDIA,ASEAN,ANZ
    3  INDIA,ASEAN,ANZ,KOREA
    

    Alternative solution:

    import ast
    
    f = lambda x: ','.join(ast.literal_eval(x).keys())
    df['Country_list'] = df['Country_list'].apply(f)