I wrote this function to compute word similarity; it uses xlwings, a Python library for Excel. I want it to return output like the following, with the items in each row separated by tabs, so that I can easily copy and paste them into an Excel file and sum them. For example:
0.9999998807907104 'casual' 1.0 1.0 29.0
0.8386740684509277 'active' 0.3333 1.0 13.0
0.776314377784729 'cardigans' 0.1667 1.0 84.0
But it actually returns the following, which I can't paste into an Excel file for further use such as summing the numbers:
[[0.9999998807907104, ('casual', (1.0, 1.0, 29.0))],
[0.8386740684509277, ('active', (0.3333, 1.0, 13.0))],
[0.776314377784729, ('cardigans', (0.1667, 1.0, 84.0))]]
How can I achieve that? Thank you.
import heapq
import operator

import xlwings as xw

# phrase_model and cosine_similarity are assumed to be defined elsewhere.


def similarity(phrase, N=10):
    phrase_vec = phrase_model[phrase]
    CosDisList = []
    wb = xw.Book('file01.xlsx')
    sht = wb.sheets['sheet1']
    for a_word in phrase_model.keys():
        a_val = phrase_model[a_word]
        cos_dis = cosine_similarity(phrase_vec, a_val)
        # Look the word up in the first column of the sheet (rows 1 to 17).
        for i in range(1, 18):
            if a_word == sht.cells(i, 1).value:
                DataFromExcel = (sht.cells(i, 2).value, sht.cells(i, 3).value, sht.cells(i, 4).value)
                DataCombined = (a_word, DataFromExcel)
                CosDisBind = [float(str(cos_dis.tolist()).strip('[[]]')), DataCombined]
                CosDisList.append(CosDisBind)
    CosDisListSort = sorted(CosDisList, key=operator.itemgetter(0), reverse=True)
    CosDisListTopN = heapq.nlargest(N, CosDisListSort)
    return CosDisListTopN
You can use the following function to flatten the nested lists and tuples (source: a blog post):
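def flatten(l, ltypes=(list, tuple)):
    ltype = type(l)
    l = list(l)
    i = 0
    while i < len(l):
        # Keep expanding the item at position i until it is no longer a list/tuple.
        while isinstance(l[i], ltypes):
            if not l[i]:
                # Drop empty containers and re-check the same position.
                l.pop(i)
                i -= 1
                break
            else:
                l[i:i + 1] = l[i]
        i += 1
    return ltype(l)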
Then just use:
import numpy as np

abc = [[0.9999998807907104, ('casual', (1.0, 1.0, 29.0))],
       [0.8386740684509277, ('active', (0.3333, 1.0, 13.0))],
       [0.776314377784729, ('cardigans', (0.1667, 1.0, 84.0))]]
flat_list = flatten(abc)
# Each original row holds 5 values; -1 lets NumPy infer the number of rows.
# Mixing floats and strings makes NumPy store every item as a string.
final_array = np.array(flat_list).reshape(-1, 5).tolist()
# [['0.9999998807907104', 'casual', '1.0', '1.0', '29.0'], ['0.8386740684509277', 'active', '0.3333', '1.0', '13.0'], ['0.776314377784729', 'cardigans', '0.1667', '1.0', '84.0']]
Now you can join individual lists:
most_final = ["\t".join(x) for x in final_array]
print(most_final[0])
Output:
0.9999998807907104 casual 1.0 1.0 29.0
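If NumPy is not needed for anything else, the same result can probably be had in plain Python. The following is only a minimal sketch: `flatten_row` and `to_tsv_lines` are hypothetical helper names introduced here for illustration, not part of the code above. It flattens each top-level row separately, so it does not assume that every row has exactly five items, and it prints all rows rather than just the first.

```python
def flatten_row(item):
    """Yield the scalar values inside an arbitrarily nested list/tuple."""
    if isinstance(item, (list, tuple)):
        for sub in item:
            yield from flatten_row(sub)
    else:
        yield item


def to_tsv_lines(rows):
    """Return one tab-separated string per top-level row."""
    return ["\t".join(str(value) for value in flatten_row(row)) for row in rows]


# Sample data copied from the question.
abc = [[0.9999998807907104, ('casual', (1.0, 1.0, 29.0))],
       [0.8386740684509277, ('active', (0.3333, 1.0, 13.0))],
       [0.776314377784729, ('cardigans', (0.1667, 1.0, 84.0))]]

# One line per row, items separated by tabs, ready to paste into Excel.
print("\n".join(to_tsv_lines(abc)))
```

Unlike the sample in the question, the words come out without quotes, because `str()` is applied to each value instead of printing the tuple's representation.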