Tags: r, machine-learning, classification, document-classification, text-classification

Hierarchical prediction using R


I'm pretty new to R, and I couldn't find any information about a package that can do the following. Suppose I have a set of data (for instance, different text documents) that can belong to several classes.

For example, a datum could be a Sport, a Sport with Ball, a Sport without Ball, or a Car. I'd like to predict which category each datum belongs to. I may not manage to predict that a datum is a Sport with Ball, but I'd be happy if I at least correctly predict that it's a Sport.
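One way to make the "happy if I correctly predict that it's a Sport" criterion concrete is to store the labels as a tree and compare a prediction against the true label's ancestors. This is a minimal sketch, not any package's API: the `parent` vector, the virtual `"ROOT"` node, and the helpers `ancestors()`, `is_acceptable()` and `hier_scores()` are hypothetical names. `hier_scores()` implements one common choice, hierarchical precision and recall over ancestor sets; other hierarchical metrics exist.

```r
# Hypothetical taxonomy from the example above, stored as child -> parent.
# "ROOT" is a virtual top node, not a real class.
parent <- c("Sport"              = "ROOT",
            "Sport with Ball"    = "Sport",
            "Sport without Ball" = "Sport",
            "Car"                = "ROOT")

# A label followed by all of its ancestors (ROOT excluded), e.g.
# "Sport with Ball" -> c("Sport with Ball", "Sport").
ancestors <- function(label) {
  path <- character(0)
  while (label != "ROOT") {
    path <- c(path, label)
    label <- parent[[label]]
  }
  path
}

# A prediction is acceptable if it is the true label or one of its
# ancestors, e.g. "Sport" for a document that is really "Sport with Ball".
is_acceptable <- function(predicted, truth) {
  predicted %in% ancestors(truth)
}

# Hierarchical precision/recall: overlap between the two ancestor sets,
# so a prediction that is wrong only at the deepest level gets partial credit.
hier_scores <- function(predicted, truth) {
  pred_anc <- ancestors(predicted)
  true_anc <- ancestors(truth)
  overlap <- length(intersect(pred_anc, true_anc))
  c(precision = overlap / length(pred_anc), recall = overlap / length(true_anc))
}

is_acceptable("Sport", "Sport with Ball")              # TRUE
is_acceptable("Car", "Sport with Ball")                # FALSE
hier_scores("Sport without Ball", "Sport with Ball")   # precision 0.5, recall 0.5
```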

Which package can provide this kind of stuff? Some examples would be useful, if possible.

Thanks in advance


Solution

  • I am not aware of any R package that does hierarchical classification specifically, so there are two options:

    • Use SVMstruct (http://www.cs.cornell.edu/people/tj/svm_light/svm_struct.html), a C API for structural SVMs. Reimplementing this in R from scratch would be quite a lot of work.
    • Build your own hierarchical classifier system. In the top-down case you have a multi-class classifier at each level of the hierarchy, one per parent node, e.g. rec vs. sci at the top, then motorcycles vs. sport below rec, and so on. Apply the top classifier first and use its prediction to choose the next classifier. The training data for the classifier at a node is the union of all data in the subtree rooted at that node, with each example labeled by the child subtree it falls into.

    For details, see e.g. http://jmlr.org/papers/v6/tsochantaridis05a.html
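The top-down scheme from the second option can be sketched in base R. This is a minimal illustration under several assumptions: documents are already numeric feature vectors (e.g. term counts); the taxonomy is a hypothetical `parent` vector built from the question's example with a virtual `"ROOT"` node; and each node uses a nearest-centroid classifier only to keep the sketch dependency-free. In practice you would swap in a stronger multi-class model per node (an SVM, for example). The helper names (`child_of()`, `train_top_down()`, `predict_top_down()`) are hypothetical, and the toy counts are made up.

```r
# Hypothetical taxonomy (child -> parent); "ROOT" is a virtual top node.
parent <- c("Sport" = "ROOT", "Sport with Ball" = "Sport",
            "Sport without Ball" = "Sport", "Car" = "ROOT")

children <- function(node) names(parent)[parent == node]

# A label followed by its ancestors, stopping below ROOT.
ancestors <- function(label) {
  path <- character(0)
  while (label != "ROOT") {
    path <- c(path, label)
    label <- parent[[label]]
  }
  path
}

# For the classifier at `node`: relabel each document with the child of
# `node` whose subtree contains it (NA if the document is not below `node`).
child_of <- function(labels, node) {
  vapply(labels, function(y) {
    hit <- intersect(ancestors(y), children(node))
    if (length(hit) == 1) hit else NA_character_
  }, character(1), USE.NAMES = FALSE)
}

# Train one nearest-centroid model per parent node, using the union of
# all documents in that node's subtree (stand-in per-node classifier).
train_top_down <- function(X, labels) {
  models <- list()
  for (node in unique(parent)) {            # nodes that have children
    target <- child_of(labels, node)
    keep <- which(!is.na(target))
    if (length(keep) == 0) next
    groups <- split(keep, target[keep])     # row indices per child
    models[[node]] <- do.call(rbind, lapply(groups, function(rows)
      colMeans(X[rows, , drop = FALSE])))   # one centroid row per child
  }
  models
}

# Descend from ROOT, choosing the nearest child centroid at each level.
# Returns the predicted label at every level, top to bottom.
predict_top_down <- function(models, x) {
  node <- "ROOT"
  path <- character(0)
  while (!is.null(models[[node]])) {
    centroids <- models[[node]]
    d <- rowSums(sweep(centroids, 2, x)^2)  # squared Euclidean distance
    node <- rownames(centroids)[which.min(d)]
    path <- c(path, node)
  }
  path
}

# Made-up term counts for the features "team", "ball", "engine".
X <- matrix(c(3, 4, 0,   3, 5, 0,
              4, 0, 0,   3, 0, 1,
              0, 0, 5,   0, 1, 4),
            ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("team", "ball", "engine")))
labels <- c("Sport with Ball", "Sport with Ball",
            "Sport without Ball", "Sport without Ball", "Car", "Car")

models <- train_top_down(X, labels)
predict_top_down(models, c(3, 4, 0))  # "Sport" "Sport with Ball"
predict_top_down(models, c(0, 0, 6))  # "Car"
```

Because the returned path holds the prediction at every level, the higher-level label (e.g. "Sport") is still available even when the leaf prediction turns out to be wrong.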