Vectorizing a list of lists with scikit-learn?


I am trying to use CountVectorizer from sklearn with a list of lists.

Lst=[['apple','peach','mango'],['apple','apple','mango']]

I would like the output to give the count of each word in each list. For example:

0:apple:1
0:peach:1
0:mango:1

1:apple:2
1:peach:0
1:mango:1

or any other format.

I found a similar post, "How should I vectorize the following list of lists with scikit learn?", but its answer wasn't complete.

Any help is appreciated.


Solution

  • Try this, using Counter

    >>> from collections import Counter
    >>> lst=[['apple','peach','mango'],['apple','apple','mango']]
    

    Count the words in each inner list:

    >>> {i:Counter(v) for i,v in enumerate(lst)}
    {0: Counter({'apple': 1, 'peach': 1, 'mango': 1}),
     1: Counter({'apple': 2, 'mango': 1})}
    

    To get the expected format (as a list). Note that words with a count of zero, such as 'peach' in the second list, are omitted:

    >>> [[i, obj, count] for i,v in enumerate(lst) for obj,count in Counter(v).items()]
    [[0, 'apple', 1],
     [0, 'peach', 1],
     [0, 'mango', 1],
     [1, 'apple', 2],
     [1, 'mango', 1]]
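
The question's expected output also lists words with a count of zero (1:peach:0), which the comprehension above leaves out. A minimal sketch of one way to include them, assuming a shared vocabulary built from every inner list in first-seen order; the names `vocab` and `count_rows` are made up for illustration and are not part of the original answer:

```python
from collections import Counter

lst = [['apple', 'peach', 'mango'], ['apple', 'apple', 'mango']]

# Shared vocabulary across all inner lists, in first-seen order.
# dict.fromkeys de-duplicates while preserving insertion order.
vocab = list(dict.fromkeys(word for doc in lst for word in doc))


def count_rows(docs, vocab):
    """Return [doc_index, word, count] rows, one per vocabulary word per list."""
    rows = []
    for i, doc in enumerate(docs):
        counts = Counter(doc)
        # A Counter returns 0 for keys it has never seen,
        # so absent words show up with a count of 0.
        rows.extend([i, word, counts[word]] for word in vocab)
    return rows


rows = count_rows(lst, vocab)
for i, word, count in rows:
    print(f"{i}:{word}:{count}")
```

This should print one `index:word:count` line per vocabulary word for each inner list, so 'peach' appears for the second list with a count of 0. Building the vocabulary first costs an extra pass over the data, but it gives every list the same set of columns.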