I'm working in Python and I have a DataFrame column whose values are strings that look like this:
df['set']
0 [911,3040]
1 [130055, 99832, 62131]
2 [19397, 3987, 5330, 14781]
3 [76514, 70178, 70301, 76545]
4 [79185, 38367, 131155, 79433]
I would like it to be:
['911','3040'],['130055','99832','62131'],['19397','3987','5330','14781'],['76514','70178','70301','76545'],['79185','38367','131155','79433']
in order to be able to run Word2Vec:
model = gensim.models.Word2Vec(df['set'], vector_size=100)  # this parameter is named size in gensim < 4.0
Thanks!
If you have a column of strings, there are a few different ways to parse each string into a list.
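To see why the strings need parsing at all, here is a small sketch using one cell from the question. It assumes that Word2Vec iterates over each element it is given to obtain that sentence's tokens; iterating a raw string yields single characters instead of the numbers.

```python
import ast

# One cell of df['set'] as stored: a string, not a list
raw = "[911,3040]"

# Iterating a string yields single characters, which is what a
# consumer expecting a list of tokens would see
chars = list(raw)

# Parsing the literal first gives a real list of ints
parsed = ast.literal_eval(raw)

print(chars[:4])  # ['[', '9', '1', '1']
print(parsed)     # [911, 3040]
```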
Here's how I'd do it, using ast.literal_eval:
>>> import ast
>>> [list(map(str, x)) for x in df['set'].apply(ast.literal_eval)]
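The same idea on a plain list, as a minimal sketch: `cells` is a hypothetical stand-in for df['set'] holding the first two rows from the question. ast.literal_eval only accepts Python literals, so it raises an exception on anything else instead of executing code, and map(str, ...) turns each parsed int into a string token.

```python
import ast

# Hypothetical stand-in for df['set']: the first rows from the question
cells = ["[911,3040]", "[130055, 99832, 62131]"]

# literal_eval parses each string into a list of ints;
# map(str, ...) then converts every int into a string token
tokens = [list(map(str, ast.literal_eval(c))) for c in cells]

print(tokens)  # [['911', '3040'], ['130055', '99832', '62131']]
```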
Or, using pd.eval:
>>> [list(map(str, x)) for x in df['set'].apply(pd.eval)] # 100 rows or less
Or, using yaml.safe_load (plain yaml.load requires an explicit Loader argument in PyYAML 6.0 and later):
>>> import yaml
>>> [list(map(str, x)) for x in df['set'].apply(yaml.safe_load)]
[
['911', '3040'],
['130055', '99832', '62131'],
['19397', '3987', '5330', '14781'],
['76514', '70178', '70301', '76545'],
['79185', '38367', '131155', '79433']
]
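Since these particular strings contain nothing but bracketed, comma-separated integers, plain string splitting is another option that skips parsing entirely. This is a swapped-in technique, not one of the three above, and split_id_list is a hypothetical helper; it assumes every cell has exactly that shape and does not handle quoted or nested items.

```python
from typing import List


def split_id_list(cell: str) -> List[str]:
    """Hypothetical helper: '[1, 2]' -> ['1', '2'] without any parser.

    Assumes the cell is a bracketed, comma-separated list of bare
    integers; quoted or nested items are not handled.
    """
    # Drop surrounding whitespace, then the enclosing brackets
    inner = cell.strip().strip("[]")
    # An empty list "[]" would otherwise yield ['']
    if not inner.strip():
        return []
    # Split on commas and trim the spaces that follow them
    return [item.strip() for item in inner.split(",")]


cells = ["[911,3040]", "[19397, 3987, 5330, 14781]", "[]"]
print([split_id_list(c) for c in cells])
# [['911', '3040'], ['19397', '3987', '5330', '14781'], []]
```

With a DataFrame this would be df['set'].map(split_id_list).tolist(). Unlike ast.literal_eval, it does not check that each item really is a number, so it is only a reasonable choice when the column's format is known to be this regular.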