Tags: python, twitter, geolocation, geocoding, tweepy

Why do these two APIs (Twitter geo/search APIs) return different result sets?


I am fetching tweets from a specific region, but the two methods I tried return very different result sets. The first method searches by latitude and longitude within a given radius. The coordinates lie within the city of Lahore, PK, and the radius is 5 km, which covers only a small portion of the city. This way I fetched around 60,000 tweets for a single day.

Method1

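import tweepy

consumer_key = 'xxxxxxxxxxxxxx'
consumer_secret = 'xxxxxxxxxxxxx'
access_token = 'xxxxxxxxxxxxxxx'
access_token_secret = 'xxxxxxxxxxxxxxxxxxxx'
# auth was undefined in the snippet as posted; this is the usual tweepy OAuth setup
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
# geocode is "latitude,longitude,radius": 5 km around a point in Lahore
public_tweets = tweepy.Cursor(api.search, count=100, geocode="31.578871,74.305184,5km", since="2018-06-09", show_user=True, tweet_mode="extended").items()
for tweet in public_tweets:
    print(tweet.full_text)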

For the second method I used the Twitter geo search API, querying "Lahore" with granularity="city", so I am now fetching tweets for the whole city. However, I get only about 1,200 tweets for one day; fetching the past 7 days gave only about 15,000 tweets. That is a very big difference: the whole city yields only about 1,200 tweets, while a small portion of the same city yields more than 60,000. I also printed the place ID to verify that I am getting the right polygon. Its corners are ( 74.4493870, 31.4512220 74.4493870, 31.6124170 74.2675860, 31.6124170 74.2675860, 31.4512220); I drew them on https://www.keene.edu/ to verify, and they do match the city of Lahore.

Method2

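import tweepy

consumer_key = 'xxxxxxxxxxxxxx'
consumer_secret = 'xxxxxxxxxxxxx'
access_token = 'xxxxxxxxxxxxxxx'
access_token_secret = 'xxxxxxxxxxxxxxxxxxxx'
# auth was undefined in the snippet as posted; this is the usual tweepy OAuth setup
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

# Look up Twitter Places matching "Lahore" at city granularity
places = api.geo_search(query="Lahore", granularity="city")

for place in places:
    print("placeid:%s" % place)
# Note: `place` here is the last entry from the loop above
public_tweets = tweepy.Cursor(api.search, count=100, q="place:%s" % place.id, since="2018-06-09", show_user=True, tweet_mode="extended").items()
for tweet in public_tweets:
    print(tweet.full_text)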

First, why is there such a huge difference between the results? I am using the standard API.

Second, how do these APIs fetch tweets? Fewer than 1% of tweets are geotagged, and not every user gives an exact city and country in their profile; some users enter things like "Mars" or "Earth". So how do these APIs find tweets in a specific region, whether searching within a radius or querying by city/country? I went through the Twitter API docs and the tweepy docs to learn how these APIs collect tweets for a specific region behind the scenes, but I did not find any helpful material.


Solution

  • The first method returns more results because, if a tweet has no geo information, a search with geocode falls back on the user's profile location (as you already guessed) and tries to resolve it into a lat/long.

    See the documentation here:

    https://developer.twitter.com/en/docs/tweets/search/guides/standard-operators.html

    Geolocalization: the search operator “near” isn’t available in the API, but there is a more precise way to restrict your query by a given location using the geocode parameter specified with the template “latitude,longitude,radius”, for example, “37.781157,-122.398720,1mi”. When conducting geo searches, the search API will first attempt to find Tweets which have lat/long within the queried geocode, and in case of not having success, it will attempt to find Tweets created by users whose profile location can be reverse geocoded into a lat/long within the queried geocode, meaning that is possible to receive Tweets which do not include lat/long information.
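
One way to see this fallback in your own results is to tally what kind of location data each returned tweet actually carries. The following is only a minimal sketch: it assumes each tweet is available as a dict shaped like the standard v1.1 Tweet JSON (with tweepy, something like `tweet._json`), and the names `location_source` and `tally_sources` are made up for illustration. The sample tweets are hypothetical, not real data.

```python
from collections import Counter


def location_source(tweet):
    """Classify a tweet dict by the most specific location data it carries."""
    if tweet.get("coordinates"):
        return "point"          # exact lat/long attached to the tweet
    if tweet.get("place"):
        return "place"          # tagged with a Twitter Place
    user = tweet.get("user") or {}
    if (user.get("location") or "").strip():
        return "profile_only"   # only the free-text profile location is set
    return "none"


def tally_sources(tweets):
    """Count how many tweets fall into each location category."""
    return Counter(location_source(t) for t in tweets)


# Hypothetical sample data, shaped like v1.1 Tweet JSON.
sample = [
    {"coordinates": {"type": "Point", "coordinates": [74.305184, 31.578871]},
     "place": None, "user": {"location": "Lahore"}},
    {"coordinates": None, "place": {"id": "hypothetical-id"},
     "user": {"location": ""}},
    {"coordinates": None, "place": None, "user": {"location": "Lahore, Pakistan"}},
    {"coordinates": None, "place": None, "user": {"location": "Lahore"}},
]

print(tally_sources(sample))
```

If most tweets from the geocode search land in the `profile_only` bucket, that would be consistent with the profile fallback described in the documentation above.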

    On the other hand, a search with place_id seems to look only for tweets tagged with that exact place. Here is the basic API call syntax: https://developer.twitter.com/en/docs/tweets/search/guides/tweets-by-place

    The Place API works very differently from the lat/long geocode search. The following page sheds light on the differences between the two types of location data that can be associated with a tweet:

    https://developer.twitter.com/en/docs/tutorials/filtering-tweets-by-location

    Tweet-specific location information falls into two general categories:

    Tweets with a specific latitude/longitude “Point” coordinate
    Tweets with a Twitter “Place” (see our blog post on Twitter Places: More Context For Your Tweets and our documentation on Twitter geo objects for more information).

    ...

    Tweets with a Twitter “Place” contain a polygon, consisting of 4 lon-lat coordinates that define the general area (the “Place”) from which the user is posting the Tweet. Additionally, the Place will have a display name, type (e.g. city, neighborhood), and country code corresponding to the country where the Place is located, among other fields.
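
To make the bounding-box idea concrete, here is a small sketch that checks whether the centre of the 5 km geocode search lies inside the four corners reported for the Lahore Place. The corner values and the centre are copied from the question; reading the corners as (lon, lat) pairs follows the description above, and the helper name `bbox_contains` is an assumption for illustration.

```python
def bbox_contains(corners, lon, lat):
    """Return True if (lon, lat) lies inside the box spanned by the corners.

    corners: iterable of (lon, lat) pairs, as in a Place bounding box.
    """
    lons = [c[0] for c in corners]
    lats = [c[1] for c in corners]
    return min(lons) <= lon <= max(lons) and min(lats) <= lat <= max(lats)


# Corners of the Lahore Place as listed in the question, as (lon, lat).
LAHORE_BOX = [
    (74.4493870, 31.4512220),
    (74.4493870, 31.6124170),
    (74.2675860, 31.6124170),
    (74.2675860, 31.4512220),
]

# Centre of the geocode search from Method 1 ("31.578871,74.305184,5km").
CENTER_LAT, CENTER_LON = 31.578871, 74.305184

print(bbox_contains(LAHORE_BOX, CENTER_LON, CENTER_LAT))  # True
```

Since the circle's centre falls inside the Place's box, the two searches cover overlapping ground, which suggests the gap in counts comes from how tweets are matched rather than from the geography.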

    Also see this section on the place: operator, and note the plural in "Place IDs":

    place:

    Filter for specific Places by their name or ID. To discover “Places” associated with a specific area, use Twitter’s reverse_geocode endpoint in the REST API. Then use the Place IDs you find with the place: operator to track Tweets that include the specific Place being referenced. If you use the Place name rather than the numeric ID, ensure that you quote any names that include spaces or punctuation.
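
Because an area can map to several Places, it may be worth searching each Place ID rather than a single one. The sketch below only builds the query strings; `place_queries` is a hypothetical helper, and the IDs shown are placeholders. How the IDs are obtained (for example from the reverse_geocode endpoint mentioned above) and whether this recovers more tweets are assumptions to check against your own results.

```python
def place_queries(place_ids):
    """Return one 'place:<id>' search query per distinct Place ID, in order."""
    seen = set()
    queries = []
    for pid in place_ids:
        pid = pid.strip()
        if pid and pid not in seen:
            seen.add(pid)
            queries.append("place:%s" % pid)
    return queries


# Placeholder IDs for illustration only; real IDs come from the geo endpoints.
example_ids = ["aaaa1111", "bbbb2222", "aaaa1111"]
for query in place_queries(example_ids):
    print(query)
```

Each resulting string could then be passed as the `q` argument of a search call like the one in the question, collecting tweets by ID so that duplicates across Places are counted once.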