Tags: python, django, lxml, django-socialauth

Retrieve all contacts from gmail using python


I am using django social auth to retrieve contacts from Gmail. I have no problem getting the authorization. I make a request and then use lxml to extract the email addresses.

The problem is that it does not return every contact. For example, I can retrieve only 30 contacts, while my Gmail account has more than 300.

Here is my view:

import urllib2
from lxml import etree

def get_email_google(request):
    social = request.user.social_auth.get(provider='google-oauth2')
    # Request the default contacts feed, passing the OAuth access token in the query string.
    url = 'https://www.google.com/m8/feeds/contacts/default/full' + '?access_token=' + social.tokens['access_token']
    req = urllib2.Request(url, headers={'User-Agent' : "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.30 (KHTML, like Gecko) Ubuntu/11.04 Chromium/12.0.742.112 Chrome/12.0.742.112 Safari/534.30"})
    contacts = urllib2.urlopen(req).read()
    # Parse the Atom feed returned by the API.
    contacts_xml = etree.fromstring(contacts)

    contacts_list = []

    for entry in contacts_xml.findall('{http://www.w3.org/2005/Atom}entry'):
        for address in entry.findall('{http://schemas.google.com/g/2005}email'):
            email = address.attrib.get('address')
            contacts_list.append(email)

I can't figure out why that URL does not return every contact.

Any idea how I can get all of my contacts?

Thank you very much for your help!


Solution

  • As the Contacts API documentation says:

    The Contacts API has a hard limit to the number of results it can return at a time even if you explicitly request all possible results. If the requested feed has more fields than can be returned in a single response, the API truncates the feed and adds a "Next" link that allows you to request the rest of the response.

    So you'll have to page through the contacts, following those "Next" links, until you have all the contacts (which you can detect by looking for a result without a 'Next' link).

    If you don't want to do extra parsing, you could try using the start-index parameter to ask for the next batch of contacts (i.e., your program has retrieved 30, so you set start-index to 31 for the next query). That section also suggests you might be able to override the limit on returned results:

    If you want to receive all of the contacts, rather than only the default maximum, you can specify a very large number for max-results.

    But I wouldn't be surprised if that turned out not to work, in which case you'll have to use the paginated approach.
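
The paginated approach described above can be sketched roughly as follows. This is a minimal illustration, not the asker's code: it uses Python 3's standard library (`urllib.request` and `xml.etree.ElementTree`) instead of `urllib2` and lxml, and the names `http_get`, `parse_page` and `get_all_emails` are hypothetical. It assumes the "Next" link appears in the feed as an Atom `link` element with `rel="next"`, and that whatever `fetch` callable you pass in takes care of authorization, since the next-page URL may not carry the access token.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request, urlopen

# XML namespaces taken from the question's code.
ATOM = '{http://www.w3.org/2005/Atom}'
GD = '{http://schemas.google.com/g/2005}'


def http_get(url):
    """Fetch a URL and return the raw response body (no auth handling here)."""
    return urlopen(Request(url)).read()


def parse_page(xml_bytes):
    """Return (emails, next_url) for one feed page.

    next_url is None when the page has no rel="next" link,
    which is how the last page is detected.
    """
    feed = ET.fromstring(xml_bytes)
    emails = [
        address.get('address')
        for entry in feed.findall(ATOM + 'entry')
        for address in entry.findall(GD + 'email')
    ]
    next_url = None
    for link in feed.findall(ATOM + 'link'):
        if link.get('rel') == 'next':
            next_url = link.get('href')
            break
    return emails, next_url


def get_all_emails(first_url, fetch=http_get):
    """Follow "next" links from first_url until a page has none."""
    emails = []
    url = first_url
    while url:
        page_emails, url = parse_page(fetch(url))
        emails.extend(page_emails)
    return emails
```

You would call `get_all_emails` with the same feed URL used in the question, optionally with a large `max-results` value appended to cut down the number of round trips. Passing `fetch` as a parameter keeps the paging loop separate from the HTTP and authorization details.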