I have word2vec vectors that I saved to a txt file with Gensim's save_word2vec_format and then read back with pandas (picture below). How can I remove the first column, which holds the words, and use those words as the index? I want a dataframe laid out like a one-hot encoding vector dataframe. This is my txt file: https://drive.google.com/file/d/1O206N93hPSmvMjwc0W5ATyqQMdMwhRlF/view?usp=sharing
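I think you need read_csv: skip the first row (in the word2vec text format it holds the vocabulary size and the vector size), change the separator to \s+ to match one or more whitespace characters, set the first column as the index, pass header=None so the columns get default RangeIndex names, and finally transpose with T: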
import pandas as pd

# Raw string so that \s is not treated as a Python string escape
df = pd.read_csv('model.txt', sep=r'\s+', index_col=0, header=None, skiprows=1).T
print(df.head())
0 the and of a to in he \
1 -0.058613 0.015442 -0.158179 0.140175 0.093452 0.018156 0.119811
2 -0.167606 -0.107773 -0.029066 -0.206769 -0.091758 -0.089092 -0.154339
3 0.050763 -0.017081 -0.124401 0.155085 0.175548 -0.029413 0.246189
4 0.283456 0.208988 0.110836 -0.007077 0.265104 0.023497 -0.027724
5 0.152869 -0.006580 -0.009774 0.116188 0.039773 0.047682 0.008068
0 wa it i ... ammy mim candyman \
1 0.044857 0.351965 0.480889 ... 0.036848 0.060897 0.072883
2 -0.113168 -0.195455 -0.007680 ... -0.008903 -0.024123 -0.023799
3 0.039933 0.143591 0.205823 ... 0.002832 0.014112 0.011426
4 -0.074092 0.075550 -0.089214 ... 0.003451 0.012912 0.016158
5 -0.107139 0.040009 -0.013390 ... -0.000931 -0.006203 0.000539
0 washboiler mincepie ruben croome mamlet postnotes bettina
1 0.040233 0.048775 0.059252 0.029014 0.047536 0.034878 0.043068
2 -0.013842 -0.015706 -0.023821 -0.014749 -0.013498 -0.011608 -0.019654
3 0.006556 0.012816 0.004323 -0.006120 0.006841 0.008062 0.006986
4 0.011206 0.010511 0.012700 0.006781 0.007779 0.008678 0.016355
5 -0.003435 -0.003693 -0.003387 -0.002963 -0.003910 -0.001301 -0.003683
[5 rows x 31849 columns]
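To try the same call without the real file, here is a minimal sketch on a tiny made-up sample in the same layout (the words and numbers are invented for illustration only). It also adds two optional arguments that are not in the call above and are my assumption about what a real vocabulary may need: `keep_default_na=False`, so that words such as "null" or "NA" are kept as text instead of being read as missing values, and `quoting=csv.QUOTE_NONE`, so that a quote character inside a word is not treated as a field delimiter.

```python
import csv
import io

import pandas as pd

# Hypothetical miniature file in word2vec text format:
# the first line is "<vocab_size> <vector_size>", then one word per line.
sample = """3 4
the 0.1 0.2 0.3 0.4
and 0.5 0.6 0.7 0.8
null 0.9 1.0 1.1 1.2
"""

# Read the header line separately so the result can be checked against it.
vocab_size, vector_size = map(int, sample.splitlines()[0].split())

df = pd.read_csv(
    io.StringIO(sample),
    sep=r"\s+",              # one or more whitespace characters
    index_col=0,             # the words become the index ...
    header=None,             # ... and the columns get default integer names
    skiprows=1,              # skip the "<vocab_size> <vector_size>" line
    quoting=csv.QUOTE_NONE,  # assumption: words may contain quote characters
    keep_default_na=False,   # assumption: words like "null" must stay strings
).T                          # transpose so each word is a column

print(df.shape)  # (4, 3): vector_size rows, vocab_size columns
print(df)
```

If you would rather have one row per word, which is usually handier for lookups such as `df.loc['the']`, drop the final `.T`.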