I have a task where I need to classify a few million products. I came across Mahout today and started reading some of the documentation.
Right now I'm a bit confused about what Mahout means by a classifier. I thought a classifier could assign a document to whichever category it matches.
But after reading a few pages, it seems to be more about deciding whether a document is a or !a, and not about checking whether a document is a or b or c or d ...
What I'm looking for is a solution that checks multiple possibilities, like a or b or c or d ... Am I on the wrong track with Mahout, or is Mahout also built for this kind of task? I would like to use a supervised learning algorithm for this part, and I don't really know if Mahout is the right framework for it.
Any pointers?
I think you could probably make Mahout work for your problem. I haven't done it myself, so I can't give you specifics, but here are two approaches:
1) Train a binary classifier for each of the N categories (a or !a, b or !b, c or !c, d or !d, ...), then assign the category whose classifier returns the highest probability. Classifiers typically output probabilities rather than True/False, so the N results can be compared.
2) Check this out for multi-label classification using Mahout: https://medium.com/p/4ea08a4662ab
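
To make approach 1 concrete, here is a rough sketch of the one-vs-rest idea in plain Python. It does not use Mahout at all. The tiny logistic-regression trainer, the bag-of-words features, the function names, and the sample products are all made up for illustration, so treat it as a picture of the technique rather than something you would run on a few million products.

```python
import math
from collections import Counter

def featurize(text):
    """Bag-of-words counts for a product title (illustrative features only)."""
    return Counter(text.lower().split())

def train_binary(examples, positive_label, epochs=50, lr=0.5):
    """Train one 'label vs. not label' logistic-regression scorer with plain SGD."""
    weights = Counter()
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            feats = featurize(text)
            z = bias + sum(weights[w] * c for w, c in feats.items())
            p = 1.0 / (1.0 + math.exp(-z))
            # Target is 1 for the positive label, 0 for every other label.
            error = (1.0 if label == positive_label else 0.0) - p
            bias += lr * error
            for w, c in feats.items():
                weights[w] += lr * error * c
    return weights, bias

def probability(model, text):
    """Probability that `text` belongs to the label this binary model was trained for."""
    weights, bias = model
    z = bias + sum(weights[w] * c for w, c in featurize(text).items())
    return 1.0 / (1.0 + math.exp(-z))

def train_one_vs_rest(examples):
    """Train N binary models, one per distinct label."""
    labels = sorted({label for _, label in examples})
    return {label: train_binary(examples, label) for label in labels}

def classify(models, text):
    """Score the text with every binary model and return the best label plus all scores."""
    scores = {label: probability(model, text) for label, model in models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical toy data: (product title, category).
TRAINING = [
    ("red cotton t-shirt", "clothing"),
    ("blue denim jeans", "clothing"),
    ("wireless bluetooth headphones", "electronics"),
    ("usb charging cable", "electronics"),
    ("stainless steel frying pan", "kitchen"),
    ("ceramic coffee mug", "kitchen"),
]

models = train_one_vs_rest(TRAINING)
label, scores = classify(models, "cotton jeans")
print(label, scores)
```

The only part specific to this approach is `classify`: each of the N binary models scores the product independently, and the label with the highest score wins. One caveat is that scores from separately trained models are not guaranteed to be on a comparable scale, so the argmax is a heuristic unless the models are calibrated.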