ruby-on-rails, ruby, facebook, facebook-opengraph, scraper

How to let the Facebook scraper into dynamic, authenticated pages


I have a social network that requires authentication and email verification before a user can enter. Once inside, users can only see content from their friends. It's actually really simple, even if it doesn't sound like it. Here is my authenticate before filter:

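  # Before filter: only logged-in users with an enabled account get through.
  def authenticate
    if logged_in?
      # Logged in but disabled: send the user to the authentication page.
      redirect_to authentication_url if current_user.account_disabled
    else
      # Not logged in: send the visitor back to the home page.
      redirect_to root_url
    end
  end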

The problem I have is letting the Facebook scraper in to read the meta tags on some of the dynamic pages. I read that you can allow Facebook's user agent into non-public pages, but isn't that for pages that are protected in the robots.txt file? I'm not experienced with scrapers, but surely it will need a cookie and an enabled account to scrape the dynamic information on my site? I'm not even sure how to actually write the method that lets the scraper in, or where to put it.

I thought about generating a token with SecureRandom.urlsafe_base64 for the scraper and making an exception for a blank page (containing only the meta data) that shouldn't be accessible to regular users, but technically that wouldn't be safe: anyone who looked at the right JS file (for the URL referenced in the Open Graph action POST) and at the meta tags could get protected user data. This idea doesn't seem even close to correct...

Any ideas?


Solution

  • As long as each piece of content has its own unique URL (normally protected by a login filter), you can allow access by checking whether the source IP or user agent matches the Facebook scraper.

    However, like most social sites, you are likely using the same URLs to return content customized for the currently logged-in user. This is inherently unscrapable, because there is a different version of, say, '/profile' for each user.
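A minimal sketch of the user agent / IP check described above, written as plain Ruby so it can run outside Rails. It assumes the crawler identifies itself with the "facebookexternalhit" or "Facebot" user-agent tokens. The module name FacebookCrawlerCheck and its methods are hypothetical, and the IP ranges are placeholders (RFC 5737 documentation blocks), not Facebook's real ranges. Both checks are required together because a User-Agent header on its own is trivial to spoof.

```ruby
require "ipaddr"

# Hypothetical helper for letting Facebook's crawler past a login filter.
module FacebookCrawlerCheck
  # User-agent tokens Facebook's crawler is assumed to send.
  USER_AGENT_PATTERN = /facebookexternalhit|Facebot/i

  # PLACEHOLDER ranges (documentation blocks): replace them with the
  # ranges Facebook actually publishes for its crawler.
  ALLOWED_RANGES = [
    IPAddr.new("192.0.2.0/24"),
    IPAddr.new("198.51.100.0/24")
  ].freeze

  module_function

  # True when the User-Agent header names Facebook's crawler.
  def crawler_user_agent?(user_agent)
    !user_agent.nil? && USER_AGENT_PATTERN.match?(user_agent)
  end

  # True when the remote IP falls inside one of ALLOWED_RANGES.
  def crawler_ip?(remote_ip)
    return false if remote_ip.nil? || remote_ip.empty?

    ip = IPAddr.new(remote_ip)
    ALLOWED_RANGES.any? { |range| range.include?(ip) }
  rescue ArgumentError
    # Malformed addresses (IPAddr::InvalidAddressError) are never trusted.
    false
  end

  # Requires both signals, since the User-Agent alone can be forged.
  def crawler?(user_agent, remote_ip)
    crawler_user_agent?(user_agent) && crawler_ip?(remote_ip)
  end
end

# Sketch of how the question's before filter might use it (Rails, untested):
#
#   def authenticate
#     return if FacebookCrawlerCheck.crawler?(request.user_agent, request.remote_ip)
#     if logged_in?
#       redirect_to authentication_url if current_user.account_disabled
#     else
#       redirect_to root_url
#     end
#   end
```

Letting the crawler through only helps on pages whose URL identifies the object itself (for example a hypothetical '/posts/42'). The crawler has no session, so current_user will be nil on those requests and the action must render its meta tags without relying on it; a shared URL like '/profile' still has nothing meaningful to show it.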