I would like to retrieve or filter tweets by language using the TwitterStream class. I want to get only tweets in one language, or otherwise retrieve everything and then identify each tweet's language.
I have built this code, but getIsoLanguageCode() always returns null (see the version 3.0.4 JavaDocs). I think there is a problem with this method.
TwitterStream twitterStream = TwitterPrintRandomStream.createTwitterConnection();
StatusListener listener = new StatusListener() {
    public void onStatus(Status status) {
        // Print the tweet text followed by the language code Twitter4J reports
        String tw = status.getText() + " " + status.getIsoLanguageCode();
        System.out.println(tw);
    }
    ...
};
I also tried Status.getUser().getLang(), but it returns the user's language, not the tweet's. Is there any way to do it?
Thanks in advance.
I don't think you can rely on iso_language_code - I couldn't find a reference to it in the REST or streaming APIs.
Tweets do have a lang attribute, which indicates the language that the Tweet was written in. This was recently added to the API and, unfortunately, Twitter4J does not yet provide you with access to it.
There is a task to add it in version 3.0.4, but the work does not appear to have started yet. Unfortunately you'll need to wait until they add it, or perhaps you could give them a hand and submit a pull request.
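In the meantime, one possible workaround is to read the lang attribute out of the raw tweet JSON yourself. The sketch below is not part of Twitter4J: TweetLang and topLevelLang are hypothetical names, and it assumes you can get each tweet's raw JSON as a String (I believe Twitter4J can expose this through its jsonStoreEnabled setting and DataObjectFactory.getRawJSON, but please check that against your version). It only looks at the top-level lang key, so the nested user.lang value is skipped.

```java
// Hypothetical helper, not Twitter4J API: pulls the top-level "lang"
// value out of a raw tweet JSON string using only the standard library.
final class TweetLang {

    /**
     * Returns the string value of the top-level "lang" key, or null if the
     * key is missing or its value is not a string. Keys named "lang" inside
     * nested objects (for example user.lang) are ignored.
     */
    static String topLevelLang(String json) {
        int depth = 0;            // nesting level of {...} and [...]
        boolean inString = false; // true while scanning inside a JSON string
        int strStart = -1;        // index just after the opening quote

        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++;          // skip the escaped character
                } else if (c == '"') {
                    inString = false;
                    String token = json.substring(strStart, i);
                    if (depth == 1 && token.equals("lang")) {
                        // A key is a string immediately followed by ':'
                        int j = skipWhitespace(json, i + 1);
                        if (j < json.length() && json.charAt(j) == ':') {
                            j = skipWhitespace(json, j + 1);
                            if (j < json.length() && json.charAt(j) == '"') {
                                // Language codes contain no escapes, so the
                                // next quote ends the value
                                int end = json.indexOf('"', j + 1);
                                return end < 0 ? null : json.substring(j + 1, end);
                            }
                            return null; // e.g. "lang": null
                        }
                    }
                }
            } else if (c == '"') {
                inString = true;
                strStart = i + 1;
            } else if (c == '{' || c == '[') {
                depth++;
            } else if (c == '}' || c == ']') {
                depth--;
            }
        }
        return null;
    }

    private static int skipWhitespace(String s, int i) {
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
            i++;
        }
        return i;
    }
}
```

Inside onStatus you could then keep only the tweets you want with a check along the lines of "es".equals(TweetLang.topLevelLang(rawJson)), where rawJson stands for however you obtain the raw JSON. A real JSON parser would be more robust than this hand-rolled scan if you already have one on the classpath.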