I'm doing dialect text classification. The problem is that some tweets can be classified as both dialect A and dialect B. How can I handle that? I want the accuracy to be calculated automatically rather than checking the results by hand. When I don't allow a tweet to be classified as both A and B, many texts are counted as misclassified.
In the training data, though, no tweet is labeled as both dialect A and B; each one has a single label.
Use one-hot encoding for the target:
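import numpy as np
from sklearn.preprocessing import OneHotEncoder
# Your target will look similar to
target = np.array(['A', 'A', 'B']).reshape(-1, 1)  # OneHotEncoder expects a 2D array
# After one-hot encoding (columns are A, B)
encoded = OneHotEncoder().fit_transform(target).toarray()
# [[1., 0.],
#  [1., 0.],
#  [0., 1.]]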
After training on this target, your model will predict a probability for each class. You can set a threshold and assign every class whose probability reaches it, so a tweet with two high enough probabilities is classified as both classes.
# Sample output
[[1., 0.],
[0.5, 0.5],
[0.1, 0.9]]
predictions = ['A', 'A and B', 'B']
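To calculate the accuracy automatically, one option is to count a prediction as correct whenever the single training-style label is among the predicted labels. The following is only a minimal sketch under stated assumptions: the threshold of 0.4, the `y_true` labels, and the helper names `predict_label_sets` and `lenient_accuracy` are made up for illustration, and the probabilities are the sample output above (in practice they would come from your model, e.g. a scikit-learn classifier's `predict_proba`).

```python
classes = ["A", "B"]        # column order of the probability matrix
threshold = 0.4             # assumed cut-off; tune it on held-out data

# Probabilities from the sample output above (rows = tweets, columns = classes)
proba = [[1.0, 0.0],
         [0.5, 0.5],
         [0.1, 0.9]]
y_true = ["A", "B", "B"]    # hypothetical single gold labels, for illustration


def predict_label_sets(proba, classes, threshold):
    """Return, per row, the set of classes whose probability reaches the threshold."""
    label_sets = []
    for row in proba:
        chosen = {c for c, p in zip(classes, row) if p >= threshold}
        if not chosen:
            # Nothing reached the threshold: fall back to the most probable class
            chosen = {classes[row.index(max(row))]}
        label_sets.append(chosen)
    return label_sets


def lenient_accuracy(y_true, label_sets):
    """Fraction of rows whose gold label is among the predicted labels."""
    hits = [truth in labels for truth, labels in zip(y_true, label_sets)]
    return sum(hits) / len(hits)


label_sets = predict_label_sets(proba, classes, threshold)
predictions = [" and ".join(sorted(labels)) for labels in label_sets]
accuracy = lenient_accuracy(y_true, label_sets)

print(predictions)
print(accuracy)
```

Note that this scoring rewards predicting both classes, so a very low threshold would push the accuracy toward 1.0 without the model being any better. It is worth also reporting how often the model outputs "A and B".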