I'm trying to extract all the geotagged photos from Flickr using the Flickr API method flickr.photos.search. Here is the code:
import flickr_api
from flickr_api.api import flickr

flickr_api.set_keys(api_key='my_api_key', api_secret='my_api_secret')
flickr_api.set_auth_handler("AuthToken")

for i in range(1, 1700):
    # Request one page of geotagged photos (250 per page) and save the raw XML response
    photo_list = flickr.photos.search(api_key='my_api_key', has_geo=1,
                                      extras='description,license,geo,tags,machine_tags',
                                      per_page=250, page=i,
                                      min_upload_date='972518400', accuracy=12)
    f = open('xmldata1/photodata' + str(i) + '.xml', 'w')
    f.write(photo_list)
    f.close()
The script runs and gives me one XML file per page, each containing data for 250 photos, so 1699 files and roughly 420,000 photo records in total. However, most of them are duplicates: after removing duplicates I am left with only 9022 unique images.
I have read here that it is safe to query at most 16 pages (4000 images) at a time to avoid duplicates.
I want to avoid duplicate images as much as possible, and I need 100,000+ unique geotagged images for GPS clustering.
What time lag should I insert between two queries? If I should take a different approach, please elaborate on it.
Let me know if you have any queries. Any help would be appreciated!
Try using max_upload_date along with min_upload_date. Keep the time frame to a couple of days, and keep shifting that window forward from your original min_upload_date toward the present, searching only for photos uploaded within the current window.
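Below is a rough sketch of that approach, reusing the setup from your question. The two-day window, the 16-page cap per window, the one-second pause, and the output file naming are illustrative assumptions, and the code assumes flickr.photos.search returns the raw XML string exactly as it does in your script.

import time
import flickr_api
from flickr_api.api import flickr

flickr_api.set_keys(api_key='my_api_key', api_secret='my_api_secret')
flickr_api.set_auth_handler("AuthToken")

WINDOW = 2 * 24 * 60 * 60   # two-day upload window (assumption; tune as needed)
MAX_PAGES = 16              # stay within the ~16 pages / 4000 photos per query limit
START = 972518400           # same min_upload_date as in your script
END = int(time.time())

file_index = 1
window_start = START
while window_start < END:
    window_end = min(window_start + WINDOW, END)
    for page in range(1, MAX_PAGES + 1):
        # Only photos uploaded inside the current window are returned, so each
        # query covers far fewer than 4000 photos and duplicates are avoided.
        photo_list = flickr.photos.search(api_key='my_api_key', has_geo=1,
                                          extras='description,license,geo,tags,machine_tags',
                                          per_page=250, page=page,
                                          min_upload_date=str(window_start),
                                          max_upload_date=str(window_end),
                                          accuracy=12)
        f = open('xmldata1/photodata' + str(file_index) + '.xml', 'w')
        f.write(photo_list)
        f.close()
        file_index += 1
        # In practice, parse the pages="..." attribute of the <photos> element in the
        # response and break out early once every page of this window has been fetched.
        time.sleep(1)       # small pause between requests (assumption; keeps you under the rate limit)
    window_start = window_end + 1   # advance past the window just covered so windows never overlap

If a two-day window still returns more than about 4000 photos during busy periods, shrink the window; if it returns very few, widen it so you do not waste requests on empty pages.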