Tags: python, scikit-learn, apache-storm

Restrictions in terms of using external libraries (Python) in a Storm Bolt


I want to implement a Bolt (https://github.com/nathanmarz/storm) that does some heavy processing on tuples using the scikit-learn machine learning API (http://scikit-learn.org/).

For example -

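from sklearn import decomposition
from sklearn import datasets
from sklearn.feature_extraction import text

# corpus is a list of document strings, e.g. collected from incoming tuples
# vectorizer is assumed to be a CountVectorizer
vectorizer = text.CountVectorizer()
trans_corpus = vectorizer.fit_transform(corpus)
tfidf = text.TfidfTransformer().fit_transform(trans_corpus)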
...
...

Is this possible? Is having sklearn and all its dependencies installed on each node in the cluster enough?


Solution

  • In theory this should be possible, unless there is something unusual about scikit-learn that I don't know of. You just need to build your topology so that its bolts can be written in Python, which I suspect you already know is possible; there are plenty of examples.
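As a rough illustration of the point above, here is a minimal sketch of such a Python bolt. It assumes the `storm` helper module (storm.py) from Storm's multilang support is importable next to the script. The dependency list, the `TfidfBolt` name, and the tuple layout (a list of documents in the first field) are hypothetical. Because a Python bolt runs as a separate process on whichever worker node it is assigned to, the libraries have to be importable by the Python interpreter on every such node, so the sketch checks for them at startup.

```python
import importlib.util

# Assumed dependency list for illustration; adjust it to what the bolt imports.
REQUIRED_MODULES = ("sklearn", "numpy", "scipy")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def tfidf_rows(corpus):
    """Turn a list of document strings into TF-IDF rows (nested lists)."""
    # Imported lazily so the dependency check can run without sklearn.
    from sklearn.feature_extraction import text

    counts = text.CountVectorizer().fit_transform(corpus)
    tfidf = text.TfidfTransformer().fit_transform(counts)
    # Plain nested lists, so the emitted value is JSON-serializable.
    return tfidf.toarray().tolist()


def build_bolt():
    """Create the bolt; requires Storm's `storm` multilang helper module."""
    import storm  # assumption: storm.py from Storm's multilang resources

    class TfidfBolt(storm.BasicBolt):
        def process(self, tup):
            # Hypothetical tuple layout: first field is a list of documents.
            storm.emit([tfidf_rows(tup.values[0])])

    return TfidfBolt()


def main():
    missing = missing_modules()
    if missing:
        # SystemExit writes its message to stderr, leaving stdout untouched
        # for the multilang messages exchanged with Storm.
        raise SystemExit("missing Python modules on this node: " + ", ".join(missing))
    build_bolt().run()
```

The bolt script would end with a call to `main()`. Fitting a fresh vectorizer on every tuple mirrors the snippet in the question; for a real topology it may be preferable to fit once and only transform inside `process`.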