twitter, text-classification, corpus, tagged-corpus

How to extract manually annotated tweets using Twitter API?


I'm using text classification to classify dialects. First I need a large set of manually annotated tweets, and I have read a research paper that says:

We have collected tweets that were published during June 2015. Arabic linguists manually annotated a small part of these tweets, so we got 51,589 tweets with correct dialectal labels. These tweets were manually found in Twitter and annotated by the linguists.

So the researchers were able to extract those tweets. I wanted to contact them, but their email addresses weren't valid. The paper says the tweets were published during June 2015. How can I extract those tweets?


Solution

  • I would assume that the researchers collected those Tweets in real time during June 2015.

    Today, the only way to retrieve Tweets from that period would be to search for them with the Full Archive Search API (a premium, paid offering from Twitter). The annotations themselves would have been produced as part of the research; Twitter does not annotate Tweets with dialectal labels.
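
As a rough illustration of what a Full Archive Search request for June 2015 might look like, here is a minimal sketch that only builds the request URL and does not send it. It assumes the premium endpoint path `tweets/search/fullarchive/<label>.json` with the `query`, `fromDate`, `toDate` and `maxResults` parameters; the environment label `dev`, the query `lang:ar`, and the function name are hypothetical placeholders, and endpoint availability and operator support depend on your access tier and may have changed.

```python
from datetime import datetime
from urllib.parse import urlencode

# Assumed premium Full Archive Search endpoint; "{label}" is the name of the
# dev environment configured in the developer portal (hypothetical here).
BASE_URL = "https://api.twitter.com/1.1/tweets/search/fullarchive/{label}.json"


def build_june_2015_request(label, query="lang:ar", max_results=100):
    """Return the URL for one page of a June 2015 full-archive search.

    The request is only constructed, not sent. Sending it would also require
    an "Authorization: Bearer <token>" header from a paid account.
    """
    params = {
        "query": query,  # assumption: Arabic-language filter operator
        # Timestamps are UTC in yyyyMMddHHmm form; toDate is exclusive,
        # so 1 July 00:00 covers the whole of June.
        "fromDate": datetime(2015, 6, 1).strftime("%Y%m%d%H%M"),
        "toDate": datetime(2015, 7, 1).strftime("%Y%m%d%H%M"),
        "maxResults": max_results,
    }
    return BASE_URL.format(label=label) + "?" + urlencode(params)


url = build_june_2015_request("dev")
print(url)
```

Note that a search like this returns only the Tweets that still exist and match the query. It cannot reproduce the 51,589 Tweets chosen by the linguists, and the dialect labels would still have to be created by annotating the results yourself.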