python pandas dataframe word2vec

string vector to list python


I'm working in Python and I have a data frame column whose values are strings that look like this:

df['set'] 

0  [911,3040]
1  [130055, 99832, 62131]
2  [19397, 3987, 5330, 14781]
3  [76514, 70178, 70301, 76545]
4  [79185, 38367, 131155, 79433]

I would like it to be:

['911','3040'],['130055','99832','62131'],['19397','3987','5330','14781'],['76514','70178','70301','76545'],['79185','38367','131155','79433']

in order to be able to run Word2Vec:

model = gensim.models.Word2Vec(df['set'] , size=100)

Thanks !


Solution

  • If you have a column of strings, there are several ways to parse it.

    Here's how I'd do it, using ast.literal_eval.

    >>> import ast
    >>> [list(map(str, x)) for x in df['set'].apply(ast.literal_eval)]
    

    Or, using pd.eval -

    >>> [list(map(str, x)) for x in df['set'].apply(pd.eval)]  # 100 rows or less
    

    Or, using yaml.safe_load (recent PyYAML releases require an explicit Loader for yaml.load) -

    >>> import yaml
    >>> [list(map(str, x)) for x in df['set'].apply(yaml.safe_load)]
    

    Each of these gives:

    [
        ['911', '3040'],
        ['130055', '99832', '62131'],
        ['19397', '3987', '5330', '14781'],
        ['76514', '70178', '70301', '76545'],
        ['79185', '38367', '131155', '79433']
    ]
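
For reference, here is a minimal standalone sketch of the ast.literal_eval approach that runs without pandas. The helper name `parse_id_lists` and the sample `raw` list are hypothetical, made up for illustration; the sketch assumes every value is a string holding a Python-style list of integers, as in the question.

```python
import ast


def parse_id_lists(values):
    """Turn strings such as "[911,3040]" into lists of string tokens."""
    sentences = []
    for text in values:
        # literal_eval parses the text as a Python literal without running code
        ids = ast.literal_eval(text)
        # Word2Vec expects string tokens, so convert each integer to str
        sentences.append([str(i) for i in ids])
    return sentences


# Hypothetical sample mirroring the first rows of df['set'] in the question
raw = ["[911,3040]", "[130055, 99832, 62131]"]
sentences = parse_id_lists(raw)
print(sentences)
```

Because a pandas Series is iterable, `parse_id_lists(df['set'])` should work the same way on the real column. When passing the result to Word2Vec, note that gensim 4.0 and later names the `size` argument `vector_size`.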