Tags: python, facebook, mechanize, mechanize-python

How to deal with an 'endless' webpage when scraping


I'm making a scraper to grab a list of my friends from Facebook and then scrape each friend's list of mutual friends, with the goal of constructing a web from the data. I looked at the official Facebook API, and it doesn't seem possible to do this there, so I decided to simply scrape the web pages.

After using mechanize to log in, I scraped the page and discovered that Facebook only loads 20 friends at a time, loading more as you scroll. I looked through the mechanize docs, but I couldn't find a solution. I tried sleeping for a few seconds before souping the page, and that didn't work either.

I'm not sure where to go from here. Is there any way to emulate scrolling in mechanize?


Solution

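  • Unless you use Selenium to drive an actual browser, you won't be able to simulate "scrolling". mechanize doesn't execute JavaScript or render a window, so there is no window height to scroll and nothing to load the next batch. That is also why sleeping before parsing the page has no effect.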

    You state that there's nothing in the API which allows you to fetch friends of friends, but there seems to be an API function that allows fetching the friend list of a user.

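    If that doesn't work either, your only choice would be to track down the AJAX request that Facebook makes to fetch the next batch of friends (the browser's network inspector will show it as you scroll), and issue that request yourself to fetch the rest of the list.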
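
The Selenium route could look roughly like the sketch below. It assumes the page keeps growing `document.body.scrollHeight` as more friends load and stops growing once everything is loaded. The helper names `scroll_to_bottom` and `fetch_full_page`, the `pause` and `max_rounds` parameters, and the choice of Firefox are illustrative assumptions, not something taken from the question or from Facebook's pages.

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_rounds=50):
    """Scroll down until the page height stops growing.

    `driver` is any object with Selenium's execute_script() method.
    Returns the number of scroll attempts made.
    """
    get_height = "return document.body.scrollHeight"
    last_height = driver.execute_script(get_height)
    rounds = 0
    while rounds < max_rounds:
        # Jump to the bottom, which is what triggers the lazy loading.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page's JavaScript time to load more items
        rounds += 1
        new_height = driver.execute_script(get_height)
        if new_height == last_height:
            break  # nothing new appeared, so assume the list is complete
        last_height = new_height
    return rounds


def fetch_full_page(url):
    """Hypothetical usage: open `url`, scroll to the end, return the HTML."""
    from selenium import webdriver  # third-party: pip install selenium

    driver = webdriver.Firefox()  # assumption: Firefox and its driver are installed
    try:
        driver.get(url)
        scroll_to_bottom(driver)
        return driver.page_source  # HTML including the lazily loaded items
    finally:
        driver.quit()
```

The returned HTML can then be parsed the same way as the page fetched with mechanize. A fixed `pause` is the crude part of this sketch: too short and the loop stops before the next batch arrives, so a slow connection may need a larger value.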