I have a text classification task in several languages. What approach should I use to create a feature that extracts the author's age group from text, given these possible classes: 18-24, 25-34, 35-49, and 50+ (50-xx)?
My only corpus is tweets. I already tried using all the tweets, but performance was very low (0.66). Any ideas on how to approach this task? Thanks in advance.
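For reference, the four target classes can be written down as a simple age-to-label mapping. This is useful when building a labeled corpus from accounts whose exact age is known. It is a minimal sketch: the helper name `age_to_class` is hypothetical, and it assumes ages under 18 fall outside the label set and that "50-xx" means 50 and over.

```python
from typing import Optional

# Class boundaries from the question: 18-24, 25-34, 35-49, 50 and over.
# Each tuple is (inclusive lower bound, inclusive upper bound or None, label).
AGE_CLASSES = [
    (18, 24, "18-24"),
    (25, 34, "25-34"),
    (35, 49, "35-49"),
    (50, None, "50-xx"),  # open-ended upper class
]


def age_to_class(age: int) -> Optional[str]:
    """Hypothetical helper: map an exact age to one of the four labels.

    Returns None for ages below 18, which none of the classes cover.
    """
    for low, high, label in AGE_CLASSES:
        if age >= low and (high is None or age <= high):
            return label
    return None


if __name__ == "__main__":
    for a in (17, 18, 30, 49, 50, 72):
        print(a, age_to_class(a))
```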
Since this is still an open research problem, I'll suggest several links to scientific papers. (The links and the following summary are mostly taken from the "related work" section of our paper, which is unfortunately in Russian, so I lightly edited a Google translation.)
So, take a look at these works (marked by year): 2009, 2010, 2011, 2013, 2014.
In summary, you should find or create age-labeled corpora and use supervised machine learning with the following features: