Classifying Twitter Text By Gender


I have a couple hundred tweets at my disposal, and I am looking to classify each Twitter user as male or female by getting their real name and looking at at least two of their tweets. I have already written code that gets each person's real name from their profile, and I'm now looking to classify the text of their tweets to make a stronger determination of whether a user is male or female. I've searched online for examples of text classification but am not quite sure where to begin. I also found some VERY useful data at this link: Twitter Text With Gender Download. Any suggestions on how to classify tweet text as written by a male or a female would be greatly appreciated! I have sort of hit a brick wall.


Solution

  • I don't have any other text datasets that are for SURE written by males or females to aid in training the classifier.

    This is a hurdle for you, then. You need either to perform supervised learning with such a labeled data set, for instance using a perceptron learner, or to perform unsupervised learning, for instance k-means clustering, and try to find clusters that you can (somewhat arbitrarily) declare to be male or female signals. In practice, distinguishing gender with an unsupervised approach is going to be next to impossible, at least without some other existing information, priors, or feature maps that you can build upon.
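A minimal sketch of the supervised route: a binary perceptron over bag-of-words features, in pure Python. It assumes you have tweets labeled with a reliable gender (for instance from the dataset linked in the question, if its labels hold up). The tokenizer, the +1/-1 label convention, the helper names, and the toy tweets are all hypothetical placeholders, not real data. `predict_user` sums scores across a user's tweets so that several tweets (the question uses at least two) vote together.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and keep words, @mentions and #hashtags as single tokens.
    return re.findall(r"[#@]?\w+", text.lower())


def features(text):
    # Binary bag-of-words features plus a constant bias feature.
    feats = set(tokenize(text))
    feats.add("__bias__")
    return feats


def score(weights, text):
    # Linear score w . x; tokens never seen in training contribute 0.
    return sum(weights.get(f, 0.0) for f in features(text))


def train_perceptron(examples, epochs=10):
    """examples: list of (tweet_text, label) pairs, label in {+1, -1}."""
    weights = defaultdict(float)
    for _ in range(epochs):
        mistakes = 0
        for text, label in examples:
            prediction = 1 if score(weights, text) >= 0 else -1
            if prediction != label:
                # Perceptron update: w <- w + y * x (x is binary here).
                for f in features(text):
                    weights[f] += label
                mistakes += 1
        if mistakes == 0:  # training data separated; stop early
            break
    return weights


def predict_user(weights, tweets):
    # Sum raw scores over all of a user's tweets before thresholding.
    total = sum(score(weights, t) for t in tweets)
    return 1 if total >= 0 else -1


# Hypothetical toy data: +1 and -1 are arbitrary stand-ins for the two labels.
toy_examples = [
    ("placeholder tweet alpha one", 1),
    ("placeholder tweet alpha two", 1),
    ("placeholder tweet beta one", -1),
    ("placeholder tweet beta two", -1),
]
weights = train_perceptron(toy_examples)
print(predict_user(weights, ["another alpha post", "alpha again"]))  # prints 1
```

With real data you would hold out some labeled users to measure accuracy, and you could add the name-based guess as one more feature rather than treating it separately.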
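For comparison, a minimal sketch of the unsupervised route: k-means with k=2 over L2-normalized term-count vectors, in pure Python. The first-k initialization is a simplification (k-means++ or random restarts are more robust), and the helper names and toy tweets are hypothetical. The resulting cluster ids carry no gender meaning by themselves; deciding which cluster is "male" and which is "female" is exactly the arbitrary step that makes this approach unreliable.

```python
import math
import re


def tokenize(text):
    # Lowercase and keep words, @mentions and #hashtags as single tokens.
    return re.findall(r"[#@]?\w+", text.lower())


def vectorize(texts):
    # Shared vocabulary -> L2-normalized term-count vectors (dense lists).
    vocab = sorted({tok for t in texts for tok in tokenize(t)})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for t in texts:
        v = [0.0] * len(vocab)
        for tok in tokenize(t):
            v[index[tok]] += 1.0
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k=2, iterations=50):
    # Simplified initialization: the first k vectors become the centroids.
    centroids = [list(v) for v in vectors[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical toy tweets; real tweets will rarely separate this cleanly.
toy_tweets = ["alpha alpha one", "beta beta one", "alpha alpha two", "beta beta two"]
clusters, _ = kmeans(vectorize(toy_tweets), k=2)
print(clusters)  # prints [0, 1, 0, 1]
```

On real tweets the two clusters are likely to split along topic, language, or bot-versus-human lines rather than gender, which is why the answer calls this route next to impossible without priors.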