machine-learning nlp data-science text-classification

How to handle repeated text data with different labels or classes?


I am doing multi-class text classification. However, some texts repeat in my dataset. They are not duplicates, because each copy belongs to a different class. The data is valid: these two classes are close to each other, and the repeated training texts (which share the same URLs) are labeled with different classes rather than the same one. What can I do so that my text classifier predicts future inputs effectively, assigning a high probability to a class without splitting that probability with its counterpart? Are there any other techniques I should consider? Note: only 10% of the training data is repeated with different classes.


Solution

  • The problem you are trying to solve is not multi-class classification but multi-label classification: a single text can legitimately have more than one correct label. There are several methods for multi-label classification. A good starting point is the scikit-learn documentation: https://scikit-learn.org/stable/modules/multiclass.html
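A minimal sketch of the multi-label reframing, assuming scikit-learn is installed. The toy `rows` data is hypothetical, and TF-IDF features with one-vs-rest logistic regression (one binary classifier per label, often called binary relevance) is just one illustrative choice among the methods in the linked docs, not a prescribed solution. Repeated texts are merged into a single example carrying a set of labels, instead of appearing as conflicting single-label rows.

```python
from collections import defaultdict

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy data: (text, label) rows where the same text
# may appear more than once with different labels.
rows = [
    ("cheap flights to paris", "travel"),
    ("cheap flights to paris", "deals"),
    ("best pasta recipe", "food"),
    ("hotel discount code", "deals"),
    ("museum opening hours", "travel"),
    ("quick pasta dinner", "food"),
]

# Merge repeated texts into one example that carries a set of labels.
labels_by_text = defaultdict(set)
for text, label in rows:
    labels_by_text[text].add(label)

texts = list(labels_by_text)
label_sets = [sorted(labels_by_text[t]) for t in texts]

# Convert label sets into a binary indicator matrix (one column per class).
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(label_sets)

# Fit one independent binary classifier per label on TF-IDF features.
model = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression()),
)
model.fit(texts, Y)

# Per-label probabilities are computed independently, so they need not sum to 1.
proba = model.predict_proba(["cheap flights to rome"])
for label, p in zip(mlb.classes_, proba[0]):
    print(f"{label}: {p:.2f}")

# Labels whose probability crosses the default 0.5 threshold (may be empty).
predicted = mlb.inverse_transform(model.predict(["cheap flights to rome"]))
print(predicted)
```

Because each label gets its own sigmoid output, a text that genuinely belongs to both close classes can receive a high probability for each of them, rather than the two classes competing for a share of a single softmax distribution.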