I'm trying to use a modified version of CountVectorizer: I fit it on a series, and then, for each cell, I want the sum of the counts of the words in that cell. For example, this is the series on which I'm fitting the count vectorizer:
["dog cat mouse", " cat mouse", "mouse mouse cat"]
The end result should look something like this (across the whole series, dog appears 1 time, cat 3 times and mouse 4 times):
[1+3+4, 3+4, 4+4+3]
I've tried using Counter, but it doesn't really work in this case.
So far I've only been successful in getting a sparse matrix, but that only gives the number of elements in each cell. However, I want to map the counts from the entire series onto each cell.
To keep each sum in its expanded form (such as 1+3+4), the items of the result list have to be stored as strings. If you need the numeric total later, each string can be evaluated using eval().
Code:
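lst = ["dog cat mouse", " cat mouse", "mouse mouse cat"]
res = {}   # word -> number of occurrences across the whole list
res2 = []  # one "a+b+c" string per item of lst
# First pass: count every word.
# split() with no argument skips leading and repeated whitespace,
# so " cat mouse" does not produce an empty-string word.
for i in lst:
    for j in i.split():
        if j not in res:
            res[j] = 1
        else:
            res[j] += 1
# Second pass: replace each word with its overall count.
for i in lst:
    res2.append('+'.join(str(res[j]) for j in i.split()))
print(res2)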
The result (res2) should be like ['1+3+4', '3+4', '4+4+3']
I think this is what you want...
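If you only need the numeric totals rather than the expanded strings, here is a minimal sketch that skips the string/eval() step. It swaps the manual dictionary for collections.Counter from the standard library; the names counts and totals are just illustrative, and it assumes words are separated by whitespace only.

```python
from collections import Counter

lst = ["dog cat mouse", " cat mouse", "mouse mouse cat"]

# Count every word across the whole list (split() ignores extra whitespace).
counts = Counter(word for text in lst for word in text.split())

# For each item, add up the overall counts of the words it contains.
totals = [sum(counts[word] for word in text.split()) for text in lst]

print(totals)  # [8, 7, 11]
```

This gives the same numbers you would get from evaluating '1+3+4', '3+4' and '4+4+3', without running eval() on strings.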