Tags: twitter, tweepy, twitterapi-python, twitter-api-v2, twitter-api-v1

Geolocation of twitter users from tweets


I would like to retrieve the GPS longitude and latitude coordinates of Twitter users from their posts. I need highly granular geolocations, so I want to collect tweets whose location is automatically recorded by Twitter through GPS rather than self-reported by the user.

Previously, Twitter provided this access through the tweepy.Stream class, for example:

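import tweepy

# Bounding box as [west_lon, south_lat, east_lon, north_lat]
LOCATION = [-124.7771694, 24.520833, -66.947028, 49.384472]

class MyStreamListener(tweepy.Stream):
    def on_status(self, status):
        # status.coordinates is None unless the Tweet carries an exact point
        print(status.id, status.coordinates)

# apikey, apikeysecret, accesstoken, accesstokensecret: your app's credentials
stream = MyStreamListener(apikey, apikeysecret, accesstoken, accesstokensecret)
stream.filter(locations=LOCATION)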

However, as stated in the Tweepy documentation, Twitter apps created on or after April 29, 2022 cannot use the Stream class:

New Twitter Developer Apps created on or after April 29, 2022 will not be able to gain access to v1.1 statuses/sample and v1.1 statuses/filter, the Twitter API v1.1 endpoints that Stream uses. Twitter API v2 can be used instead with StreamingClient.

Unfortunately, StreamingClient does not provide the locations parameter in its filter() method or in any other method of the class.

Does that mean Twitter has stopped providing this metadata to researchers?


Solution

  • First of all, addressing this comment from your original question...

    so want to collect tweets whose location is automatically recorded by Twitter t[h]rough the GPS and not self-reported by the user.

    Twitter does not automatically record user location at all. It is entirely down to a user to choose to add location data to a Tweet. A relatively small proportion of Tweets carry location information, and far fewer of those carry specific GPS information since the option to add information at that level was removed from the Twitter app several years ago.

    To get into the details...

    Streaming works differently in the modern version of the Twitter API.

    In v1.1 you would pass parameters such as "track" and "locations" to a single API endpoint (statuses/filter) and then listen as matches came in.

    In v2 of the API, you open a connection (via StreamingClient if you are using Tweepy) and create rules through a separate endpoint (StreamingClient.add_rules() with StreamRule objects in Tweepy). Each rule can combine a number of operators. The Tweepy documentation describes the rule text as follows:

    value (str | None) – The rule text. If you are using a Standard Project at the Basic access level, you can use the basic set of operators, can submit up to 25 concurrent rules, and can submit rules up to 512 characters long. If you are using an Academic Research Project at the Basic access level, you can use all available operators, can submit up to 1,000 concurrent rules, and can submit rules up to 1,024 characters long.

    You'll need to refer to the Twitter documentation on creating search queries for rules. At a high level, you can use the has:geo operator to find Tweets with geo information, and potentially the place: operator to narrow down to areas. That's how you can use the current API to filter based on location. Note that available operators may vary based on your Twitter API access level.
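
As a rough illustration of the rule-based approach described above, here is a minimal sketch that builds a rule around has:geo and place:, registers it with StreamingClient, and checks whether each incoming Tweet carries an exact point. The helper names build_geo_rule and exact_point, the GeoPrinter class, the "geo-demo" tag, and the TWITTER_BEARER_TOKEN environment variable are all made up for this example. It also assumes that has:geo has to be combined with at least one other operator (hence the required keywords), that your access level permits these operators, and that the v2 geo payload has the shape {"place_id": ..., "coordinates": {"type": "Point", "coordinates": [lon, lat]}} when exact coordinates are present. Check each of these against the current Twitter documentation before relying on it.

```python
import os

# Rule-length limit quoted above for the lower access tier; adjust for yours.
MAX_RULE_LENGTH = 512


def build_geo_rule(keywords, place=None):
    """Build a filtered-stream rule that matches only Tweets carrying geo data.

    keywords are single words; place is an optional place name (hypothetical helper).
    """
    if not keywords:
        raise ValueError("at least one keyword is required")
    # Parentheses keep the OR group separate from the implicit AND that follows.
    parts = ["(" + " OR ".join(keywords) + ")", "has:geo"]
    if place:
        # Quote the place name so multi-word names stay a single operand.
        parts.append(f'place:"{place}"')
    rule = " ".join(parts)
    if len(rule) > MAX_RULE_LENGTH:
        raise ValueError(f"rule is {len(rule)} characters; limit is {MAX_RULE_LENGTH}")
    return rule


def exact_point(geo):
    """Return (longitude, latitude) if the geo payload holds an exact point, else None."""
    coords = (geo or {}).get("coordinates") or {}
    if coords.get("type") == "Point":
        lon, lat = coords["coordinates"]
        return lon, lat
    return None


def main():
    # Imported here so the helpers above work without Tweepy installed.
    import tweepy

    class GeoPrinter(tweepy.StreamingClient):
        def on_tweet(self, tweet):
            # tweet.geo usually holds only a place_id; exact coordinates are rare.
            print(tweet.id, exact_point(tweet.geo), tweet.geo)

    client = GeoPrinter(os.environ["TWITTER_BEARER_TOKEN"])
    rule = build_geo_rule(["weather", "storm"], "Seattle")
    client.add_rules(tweepy.StreamRule(rule, tag="geo-demo"))
    # Ask for the geo field and expand the place so place details are returned too.
    client.filter(
        tweet_fields=["geo"],
        expansions=["geo.place_id"],
        place_fields=["full_name", "geo"],
    )


# Only connect when a bearer token is actually configured.
if __name__ == "__main__" and os.environ.get("TWITTER_BEARER_TOKEN"):
    main()
```

Most Tweets matched this way will have a place but no point, so exact_point() returning None is the common case. If you need GPS-level precision, expect to discard the majority of matches.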