
Is it normal that, when sampling tweets with TwitterStream as in the Twitter4J code example, I get mostly question marks for the user name and status?


I used the code from the "code example" section of Twitter4J:

public static void main(String[] args) throws TwitterException, IOException{
    StatusListener listener = new StatusListener(){
        public void onStatus(Status status) {
            System.out.println(status.getUser().getName() + " : " + status.getText());
        }
        public void onDeletionNotice(StatusDeletionNotice statusDeletionNotice) {}
        public void onTrackLimitationNotice(int numberOfLimitedStatuses) {}
        public void onException(Exception ex) {
            ex.printStackTrace();
        }
    };
    TwitterStream twitterStream = new TwitterStreamFactory().getInstance();
    twitterStream.addListener(listener);
    // sample() method internally creates a thread which manipulates TwitterStream and calls these adequate listener methods continuously.
    twitterStream.sample();
}

As you can see, there is a println inside the onStatus method in the code above. The following screenshot shows what I mostly get from that code. Is it normal?

[Screenshot: question marks...question marks everywhere]

Indeed, if I keep only the statuses whose user has no question mark in the user name, I get almost nothing. Moreover, I should also keep only users whose location is public. With regard to that, I would also like to ask what the difference is between:

user.isGeoEnabled()

and

user.getLocation() != ""

Solution

  • The responses you get back are UTF-8 encoded: https://dev.twitter.com/tags/utf-8

    If you look at some of the accounts in the output, their names include non-Western-European characters, for example https://twitter.com/tomokichi_koyo. Your console cannot display these characters, so it prints question marks instead.

    Try writing to a file instead and opening it with a UTF-8-aware editor. There are various answers about setting up Java and your OS to default to UTF-8, but you will need to look for your specific combination: https://stackoverflow.com/search?q=windows+console+java+utf-8
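
    As a rough illustration of the suggestion above, here is a minimal sketch that uses only the standard Java library. The class name Utf8Output, the helper methods formatLine, appendLine and utf8Stdout, and the file name tweets.txt are all made up for this example and are not part of Twitter4J. It assumes you would call appendLine from inside onStatus, in place of the println.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class Utf8Output {

    // Hypothetical helper: builds the same "name : text" line as the println in onStatus.
    public static String formatLine(String userName, String text) {
        return userName + " : " + text;
    }

    // Appends one line to a file, encoding it explicitly as UTF-8
    // instead of relying on the platform default charset.
    public static void appendLine(Path file, String line) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(
                file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            writer.write(line);
            writer.newLine();
        }
    }

    // Wraps System.out in a PrintStream that encodes characters as UTF-8.
    // This only helps if the console itself is configured to display UTF-8.
    public static PrintStream utf8Stdout() throws UnsupportedEncodingException {
        return new PrintStream(System.out, true, "UTF-8");
    }

    public static void main(String[] args) throws IOException {
        // "tweets.txt" is an arbitrary file name chosen for this sketch.
        Path file = Paths.get("tweets.txt");
        // Unicode escapes stand in for a Japanese user name and status text.
        String line = formatLine("\u3068\u3082\u304d\u3061", "\u3053\u3093\u306b\u3061\u306f");
        appendLine(file, line);
        utf8Stdout().println(line);
    }
}
```

    Writing to the file avoids the console entirely, so the result does not depend on the terminal's code page. The utf8Stdout variant may still show question marks or garbled text if the console is not set up for UTF-8.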