I want to do two types of classification on a multiclass dataset (labelled sentences from multiple files of scientific articles). What I want to do is similar to this: https://www.cl.cam.ac.uk/~sht25/papers/aaai98.pdf . The first step is a binary classification to get rid of sentences with the label "others". What's left will be used for the second step, which is a multiclass classification.
Currently I'm stuck at "how do I do binary classification on a multiclass dataset?". I thought about doing one-vs-rest (OvR) classification, but from the examples I've seen, the built-in OvR creates a model for every class and does the OvR from there. I only want to do OvR for one label, which is "others" vs. all the rest. Please help.
Just create a new label column that, for each row, is 1 if the label is "others" and 0 otherwise. Then train a binary classifier on that newly created column. I hope I understood your question correctly.
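Here is a minimal sketch of that idea with pandas and scikit-learn. The column names `sentence`, `label` and `is_others`, the toy sentences, the non-"others" label names, and the choice of TF-IDF plus logistic regression are all my own assumptions for illustration; swap in your real data and whatever features and model you prefer.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data (made up); "sentence" and "label" are assumed column names.
df = pd.DataFrame({
    "sentence": [
        "We propose a new parsing algorithm.",
        "Thanks to the reviewers for their comments.",
        "Our method outperforms the baseline.",
        "See the appendix for details.",
        "Previous work used hidden Markov models.",
        "This page is intentionally left blank.",
    ],
    "label": ["aim", "others", "own", "others", "background", "others"],
})

# Stage 1 target: 1 if the label is "others", 0 otherwise.
df["is_others"] = (df["label"] == "others").astype(int)

# Stage 1: binary classifier, "others" vs. everything else.
stage1 = make_pipeline(TfidfVectorizer(), LogisticRegression())
stage1.fit(df["sentence"], df["is_others"])

# Stage 2: multiclass classifier trained only on the non-"others" rows.
kept = df[df["is_others"] == 0]
stage2 = make_pipeline(TfidfVectorizer(), LogisticRegression())
stage2.fit(kept["sentence"], kept["label"])

def classify(sentences):
    """Return "others" or the stage-2 label for each sentence."""
    sentences = list(sentences)
    flags = stage1.predict(sentences)
    results = []
    for sentence, flag in zip(sentences, flags):
        if flag == 1:
            results.append("others")
        else:
            results.append(stage2.predict([sentence])[0])
    return results
```

At prediction time you would call something like `classify(["We introduce a new tagger."])`: the sentence goes through the binary model first and only reaches the multiclass model if it is not flagged as "others". With real data, split into train and test sets before fitting both stages.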