I have a social network that requires authentication and email verification before a user can enter. Once inside, users can only see content from their friends. It's actually really simple, even if it doesn't sound like it. Here is my authenticate before filter:
def authenticate
  if logged_in?
    # signed-in users with a disabled account get sent back to re-authenticate
    redirect_to authentication_url if current_user.account_disabled
  else
    # anyone not signed in goes back to the landing page
    redirect_to root_url
  end
end
The problem I have is letting the Facebook scraper in to get the meta tags from some of the dynamic pages. I read that you can allow Facebook's user agent into non-public pages, but isn't that for pages that are disallowed in robots.txt? I'm not experienced with scrapers, but surely it would need a cookie and an enabled account to scrape the dynamic information on my site? I'm not even sure how to actually write the method to let the scraper in, or where to put it.
I also thought about generating a token with SecureRandom.urlsafe_base64 for the scraper and making an exception on a blank page (with the meta data) that shouldn't be accessible to regular users, but technically that wouldn't be safe: if you looked at the right JS file (for the URL referenced in the Open Graph action POST) and the meta tags, you could get protected user data. This idea doesn't seem anywhere close to correct...
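To make that concrete, this is roughly the kind of thing I was picturing (the controller name, constant, and parameter are all made up):

# Hypothetical: generate one token and keep it in config,
# e.g. SCRAPER_TOKEN = SecureRandom.urlsafe_base64(32),
# then require it on the bare meta-tag page instead of a logged-in session.
class OpenGraphPagesController < ApplicationController
  skip_before_filter :authenticate
  before_filter :check_scraper_token

  private

  def check_scraper_token
    head :forbidden unless params[:token] == SCRAPER_TOKEN
  end
end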
Any ideas?
As long as your content lives at unique URLs for what each user sees (normally protected by a login filter), you can let the scraper through by checking whether the source IP or user agent matches the Facebook scraper.
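A minimal sketch of the user-agent version, assuming your filter lives in ApplicationController (Facebook's crawler identifies itself as "facebookexternalhit", or "Facebot" for some requests):

def authenticate
  # Skip the login checks for Facebook's crawler so it can read the
  # Open Graph meta tags on otherwise protected pages.
  return if request.user_agent =~ /facebookexternalhit|Facebot/i

  if logged_in?
    redirect_to authentication_url if current_user.account_disabled
  else
    redirect_to root_url
  end
end

Bear in mind a user agent is trivial to spoof, so anything you expose this way is effectively public; keep it limited to the meta tags you actually want shared.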
However, like most social sites, you are likely using the same URLs to return customized content rendered for the currently logged-in user. That is inherently unscrapable, because there is a different version of, say, '/profile' for each user.