I have a DataFrame named df. I'm using the code below to get the cosine similarity of each row against every other row:
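import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

vectorizer = CountVectorizer()
features = vectorizer.fit_transform(df['name']).todense()
for f in features:
    for index, row in df.iterrows():
        df['index'+str(index)] = pd.DataFrame(cosine_similarity(features,f))
df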
But every column of the output DataFrame holds the same values, which I assume are the similarities to the last record:
                 name    index0    index1    index2    index3    index4
0      aaaabbbbbbcccc  0.158114  0.158114  0.158114  0.158114  0.158114
1      ddddffffffgggg  0.204124  0.204124  0.204124  0.204124  0.204124
2   hhhhhhiiiiiijjjjj  0.158114  0.158114  0.158114  0.158114  0.158114
3   kkkkkklllllllmmmm  0.235702  0.235702  0.235702  0.235702  0.235702
4  mmmmmnnnnnnooooooo  1.000000  1.000000  1.000000  1.000000  1.000000
I want each column to hold the similarities to its own record, not only to the last one.
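IIUC, the nested loop is the problem: for every f, the inner loop overwrites all of the index columns, so only the values for the last f survive. You simply need a single loop, with enumerate supplying the column number: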
for i, f in enumerate(features):
    # f is the i-th row vector; compare it against every row and store the result in column i
    df['index'+str(i)] = pd.DataFrame(cosine_similarity(features,f))
df
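If the loop is not needed for anything else, the same result can probably be obtained without it. A minimal sketch, assuming scikit-learn's cosine_similarity is called with a single argument so that it compares every row with every other row in one go. The three sample names and the variables sim, sim_df and result are made up for illustration and are not from the question:

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical sample data; the real df from the question is not reproduced here.
df = pd.DataFrame({"name": ["red apple", "green apple", "red car"]})

# The sparse matrix can be passed as-is, so .todense() is not needed.
features = CountVectorizer().fit_transform(df["name"])

# With one argument, cosine_similarity returns the full pairwise matrix:
# sim[i, j] is the similarity between row i and row j.
sim = cosine_similarity(features)

# Name the columns index0, index1, ... to match the layout used in the question.
sim_df = pd.DataFrame(
    sim,
    index=df.index,
    columns=["index" + str(i) for i in range(sim.shape[0])],
)

# join builds a new DataFrame and leaves df itself unchanged.
result = df.join(sim_df)
print(result)
```

Each column indexN then holds the similarities to record N, and the diagonal is 1 because every record is identical to itself.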