python list unicode unicode-escapes

Get a proper list from a unicode string that looks like a list


I have a list containing a single unicode string that is formatted like a list.

my_list = [u'[James, Williams, Kevin, Parker, Alex, Emma, Katie\xa0, Annie]']

I want a list that I can iterate over, such as:

name_list = [James, Williams, Kevin, Parker, Alex, Emma, Katie, Annie]

I have tried several possible solutions given here, but none of them worked in my case.

# Tried
name_list = name_list.encode('ascii', 'ignore').decode('utf-8')

# Gives unicode return type

# Tried
ast.literal_eval(name_list)

# Gives me invalid token error

Solution

  • Firstly, a list does not have an encode method; you have to apply string methods to the item inside the list, not to the list itself.

    Secondly, if you want to normalize the string, you can use the normalize function from Python's unicodedata module. With the NFKD form, the non-breaking space '\xa0' is mapped to an ordinary space, which strip() then removes, and other compatibility characters are normalized in the same way.

    Then, instead of evaluating the string (eval is generally unsafe, and ast.literal_eval fails here because the unquoted names are not literals), use a list comprehension to build the list:

    import unicodedata
    
    li = [u'[James, Williams, Kevin, Parker, Alex, Emma, Katie\xa0, Annie]']
    # li[0] selects the string inside the list; NFKD turns '\xa0' into a plain space
    inner_li = unicodedata.normalize("NFKD", li[0])
    
    # drop the surrounding brackets, split on commas, and trim each name
    new_li = [i.strip() for i in inner_li[1:-1].split(',')]
    print(new_li)
    # ['James', 'Williams', 'Kevin', 'Parker', 'Alex', 'Emma', 'Katie', 'Annie']
    

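    Note that the items in your expected output are written without quotes, so Python would treat them as variable names, which raise a NameError unless they were defined earlier. What you actually get (and want) is a list of strings.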
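
If the outer list might hold more than one such string, the same steps can be wrapped in a small helper. This is only a sketch: the name parse_bracketed_names is made up for illustration, and it assumes every item is a comma-separated string, optionally wrapped in square brackets, with no commas inside the names themselves.

```python
import unicodedata


def parse_bracketed_names(items):
    """Flatten strings like u'[a, b, c]' into one list of clean names.

    Hypothetical helper, not part of any library.
    """
    names = []
    for item in items:
        # NFKD maps the non-breaking space U+00A0 to an ordinary space.
        text = unicodedata.normalize("NFKD", item).strip()
        # Drop the surrounding brackets only if both are present.
        if text.startswith("[") and text.endswith("]"):
            text = text[1:-1]
        # Split on commas, trim whitespace, and skip empty pieces.
        names.extend(part.strip() for part in text.split(",") if part.strip())
    return names


my_list = [u'[James, Williams, Kevin, Parker, Alex, Emma, Katie\xa0, Annie]']
name_list = parse_bracketed_names(my_list)
print(name_list)
```

The result is an ordinary list of strings, so it can be iterated with a plain for loop.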