Flickr API returns duplicate photos


I'm trying to extract geo-tagged photos from the Flickr API using Python, but it returns duplicate photos. Once I request more than about 41 pages, it returns the same photo URLs again. Here is my code:

#!/usr/bin/python
# coding=utf-8
from flickrapi import FlickrAPI
import json, time, os
import pymongo

client = pymongo.MongoClient("localhost",27017)
db = client.flickr
coll = db.flickr_a

API_KEY = "xxx"
SEACRET_KEY = "xxx" 

flickr = FlickrAPI(API_KEY, SEACRET_KEY, format="parsed-json")
extras="url_c,url_l,url_o,geo,date_taken,owner_name"

for page in xrange(1,550):
    disney = flickr.photos.search(bbox="139.867,35.613,139.914,35.645",
                                  per_page=100, extras=extras, page=page)
    photos = disney["photos"]
    coll.insert(photos)

Please give me advice or sample code. Thanks.


Solution

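  • A quick fix is to collect the photos in a Python dict keyed by photo ID, so a photo that comes back on more than one page is stored only once. Putting the response dicts straight into a set does not work, because dicts are not hashable.

    At the beginning:

    seen = {}

    Inside the loop, in place of the insert:

    for photo in photos["photo"]:
        seen[photo["id"]] = photo

    And at the end (I'm guessing at your insert command here):

    for photo in seen.values():
        db.flickr_a.insert(photo)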
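
The deduplication idea can be sketched as a small standalone function. This is only an illustration under assumptions: `dedupe_photos` and `sample_pages` are hypothetical names, and the sample data merely imitates the `photos` -> `photo` nesting of a parsed-json search response. One likely cause of the repeats, not confirmed in this thread, is that Flickr's search documentation says a query returns at most about the first 4,000 results, which at 100 per page is roughly 40 pages.

```python
def dedupe_photos(pages):
    """Return photos from several search responses, keeping one per photo ID."""
    seen_ids = set()
    unique = []
    for response in pages:
        # Each parsed-json response nests the photo list under photos -> photo.
        for photo in response["photos"]["photo"]:
            if photo["id"] in seen_ids:
                continue  # already collected from an earlier page
            seen_ids.add(photo["id"])
            unique.append(photo)
    return unique


# Hypothetical sample data standing in for two pages of API responses.
sample_pages = [
    {"photos": {"page": 1, "photo": [{"id": "1", "title": "a"},
                                     {"id": "2", "title": "b"}]}},
    {"photos": {"page": 2, "photo": [{"id": "2", "title": "b"},
                                     {"id": "3", "title": "c"}]}},
]

unique_photos = dedupe_photos(sample_pages)
print([p["id"] for p in unique_photos])
```

In the loop from the question, each `disney` response could be appended to a list and passed to this function once paging is done. The resulting photos can then be written in one call with PyMongo's `insert_many`, available in PyMongo 3 and later.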