python pandas dataframe text nested-lists

Each row in a DataFrame column is a list. How to remove leading whitespace from the second through last entries?


I have a dataset with a "tags" column in which each row is a list of tags. For example, the first entry looks something like this:

df['tags'][0]

result = "[' Leisure Trip ', ' Couple ', ' Duplex Double Room ', ' Stayed 6 nights ']"

I have been able to remove the trailing whitespace from all elements and only the leading whitespace from the first element (so I get something like the below).

['Leisure trip', ' Couple', ' Duplex Double Room', ' Stayed 6 nights']

Does anyone know how to remove the leading whitespace from all but the first element in these lists? The lists are not of uniform length. Below is the code I used to get the result above:

clean_tags_list = []
for item in reviews['Tags']:
    string = item.replace("[", "")
    string2 = string.replace("'", "")
    string3 = string2.replace("]", "")
    string4 = string3.replace(",", "")
    string5 = string4.strip()
    string6 = string5.lstrip()
    #clean_tags_list.append(string4.split(" "))
    clean_tags_list.append(string6.split("  "))
clean_tags_list[0]


['Leisure trip', ' Couple', ' Duplex Double Room', ' Stayed 6 nights']

Solution

  • IIUC, you want to apply strip to the first element and rstrip (right strip) to the others. First convert each 'string list' to an actual list with ast.literal_eval, then apply strip and rstrip:

    from ast import literal_eval
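    # enumerate gives each item's position; x.index(item) would return the first match and misfire on duplicate tags
    df['tags'].apply(literal_eval).apply(lambda x: [item.strip() if i == 0 else item.rstrip() for i, item in enumerate(x)])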
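
If the goal is instead what the question title describes, with no leading or trailing whitespace on any element, then stripping every item after parsing may be all that is needed. The following is a minimal sketch under that reading, assuming every cell holds a valid Python list literal stored as a string; the helper name `clean_tags` and the sample string are made up for illustration and do not come from the original post.

```python
from ast import literal_eval


def clean_tags(cell):
    """Parse a 'string list' into a real list and strip both ends of every tag."""
    # literal_eval safely evaluates the list literal without running arbitrary code.
    return [tag.strip() for tag in literal_eval(cell)]


# Hypothetical sample cell, modelled on the first entry shown in the question.
sample = "[' Leisure trip ', ' Couple ', ' Duplex Double Room ', ' Stayed 6 nights ']"
print(clean_tags(sample))
# → ['Leisure trip', 'Couple', 'Duplex Double Room', 'Stayed 6 nights']
```

On a DataFrame this could be applied per cell with `df['tags'].apply(clean_tags)`. Parsing with `literal_eval` first avoids the chain of `replace` calls and the split on double spaces, which is what left the leading space on every element after the first.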