Tags: python, python-3.x, pandas, dataframe, tokenize

Delete brackets from column values


I have the following dataframe:

import pandas as pd
df = pd.DataFrame({'column1': ['Severe weather Not Severe weather kind of severe weather']})

I tokenized this dataframe:

from nltk.tokenize import word_tokenize
df['column1'] = df['column1'].apply(lambda x: word_tokenize(x))

The output is enclosed inside brackets:

column1
0   [Severe, weather, Not, Severe, weather, kind, of, severe, weather]

I want to have the output without brackets:

column1
0 Severe, weather, Not, Severe, weather, kind, of, severe, weather

What I have tried:

def delete_brackets(x):
    for i in x:
        if i == '[' or i == ']':
            x.remove(i)
    return x
df=delete_brackets(df)

and

def remove_brackets(x):
    return x.replace('[', '').replace(']', '')
df=remove_brackets(df)

I'm still getting the output inside brackets.

Any ideas? Thanks


Solution

  • You can use

    df['column1'] = df['column1'].apply(lambda x: ", ".join(map(str, word_tokenize(x))))
    

    Output:

    >>> print(df.to_string())
                                                                column1
    0  Severe, weather, Not, Severe, weather, kind, of, severe, weather
    

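    The word_tokenize() function returns a list of tokens, and the brackets are simply how pandas displays a list stored in a cell; they are not characters in the data, which is why replacing '[' and ']' has no effect. To get a plain string, join the tokens with a comma and a space. The map(str, word_tokenize(x)) call casts each token to str before joining; the tokens are already strings, so it only acts as a safeguard. Apply this line to the original text column, in place of the tokenization step from the question, because word_tokenize() expects a string rather than a list.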
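
    If the column has already been tokenized, the same idea can be applied to the existing lists. The following is a minimal sketch, not part of the original answer: join_tokens is a hypothetical helper name, and str.split() stands in for word_tokenize() so that the snippet runs without NLTK installed.

```python
def join_tokens(tokens):
    """Hypothetical helper: turn a list of tokens into one comma-separated string."""
    return ", ".join(map(str, tokens))


# Stand-in for a single cell of the tokenized column. str.split() is used
# here instead of word_tokenize() (an assumption, to avoid needing NLTK).
tokens = "Severe weather Not Severe weather kind of severe weather".split()

# The cell holds a list object; the brackets only appear in its printed form.
as_displayed = str(tokens)

# Joining the tokens produces a plain string with no brackets.
joined = join_tokens(tokens)

print(type(tokens).__name__)
print(as_displayed)
print(joined)
```

    On the dataframe itself, this would correspond to df['column1'] = df['column1'].apply(join_tokens), or to df['column1'].str.join(', ') when every cell is a list of strings.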