Nginx config for single page angularjs website to work with prerender.io for Facebook Open Graph


I have a single-page AngularJS web app, and I'm trying to make it crawlable by search engines. To achieve this I'm using prerender.io, a Node.js web server that uses a PhantomJS browser to render AJAX pages.

I am basing my nginx config off the following gist: https://gist.github.com/Stanback/6998085

This works for the most part. I can curl my app and get the correct response: curl -o test.html "domain.com/?_escaped_fragment_=/path"

The request is redirected to the prerender.io proxy, and the proxy makes a single request with the following URL: domain.com/#!/path

All other requests (JS, images, CSS and XHR) pass through nginx as normal. PhantomJS has no trouble rendering the proxied request: the page starts with window.prerenderReady = false; and the proxy waits until that variable is set to true.

This is all great... Google can crawl my website! Enter Facebook.

I'm setting a number of OG metatags so that I can use the Facebook like button (iFrame). The following metatags are set for each page:

<link rel="canonical" href="http://domain.com/#!/asset">
<meta property="og:url" content="http://domain.com/#!/asset">
<meta property="og:type" content="website">
<meta property="og:image" content="http://domain.com/image.jpg">
<meta property="fb:app_id" content="xxx">
<meta property="og:description" content="foo">
<meta property="og:title" content="bar">
<meta property="og:site_name" content="domain.com">

These metatags are updated correctly by AngularJS for each asset, and the PhantomJS proxy correctly waits for them to be updated before returning the response.

However, when I test the URL http://domain.com/#!/asset with the Facebook URL linter, I get some problems.

  1. Facebook claims that the canonical URL and the og:url differ; however, when I click "See exactly what our scraper sees for your URL" they are identical.
  2. When I click "See exactly what our scraper sees for your URL", the canonical and og:url have been replaced with domain.com/?fb_locale=en_GB#!/asset
  3. The proxy receives 3 requests: the first is for the asset, and then it seems to follow the canonical and og:url.
  4. When a user clicks the "Like this page" iFrame, the link back to my website looks like domain.com/?_escaped_fragment_=/asset

Number 4 is the deal breaker. If a user likes a page on my site, it goes into their Facebook activity stream. If that user then clicks on the link back to my site in their stream, it will direct them through the proxy and render the page through PhantomJS!

I'm guessing that I shouldn't be sharing the hash-bang links through Facebook. I think I should share a link, and set the canonical / og:url, to something like domain.com/static/asset. The nginx config should then catch /static URLs: if the user agent is Facebook or the params contain _escaped_fragment_, direct the request to the proxy; otherwise redirect the user to #!/asset.
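
The rule described above could be sketched roughly like this. It is only an illustration under assumptions: the /static/ prefix comes from the idea above, but the $asset_path capture name and a self-hosted Prerender server listening on 127.0.0.1:3000 are placeholders, and the rewrite relies on the Prerender server turning ?_escaped_fragment_=/asset back into #!/asset before loading the page.

```nginx
# Sketch only: 127.0.0.1:3000 and $asset_path are assumed placeholders.
location ~ ^/static/(?<asset_path>.+)$ {
    set $prerender 0;

    # Facebook's scraper announces itself as facebookexternalhit.
    if ($http_user_agent ~* "facebookexternalhit") {
        set $prerender 1;
    }
    # Crawlers using the AJAX crawling scheme send this parameter.
    if ($args ~ "_escaped_fragment_") {
        set $prerender 1;
    }

    if ($prerender = 1) {
        # Hand the hash-bang page to Prerender in escaped-fragment form.
        # "break" stops rewrite processing, so the return below is skipped.
        rewrite ^ /$scheme://$host/?_escaped_fragment_=/$asset_path? break;
        proxy_pass http://127.0.0.1:3000;
    }

    # Everyone else is a real browser: send it to the hash-bang URL.
    return 302 "$scheme://$host/#!/$asset_path";
}
```

Because only /static/... URLs match this location, the JS, CSS and image requests that PhantomJS makes while rendering /#!/asset would still be served by the normal locations rather than by the proxy. A named capture is used because the later if regexes would otherwise overwrite $1.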

I have tried everything I can think of to get a modified nginx config to work with this, but it has beaten me. When I intercept those /static URLs and rewrite them to the proxy, image, CSS and JS assets are randomly requested through the proxy and PhantomJS crashes.

Could someone please help me modify this nginx config so that I can forward web crawler requests to the proxy, allow Facebook to scrape the correct OG tags off my site AND have the correct link-back URL specified when users share my content on Facebook?


Solution

  • Did you figure this out yet? Facebook doesn't do a very good job with #! urls. This StackOverflow answer does a good job explaining it: How to handle facebook sharing/like with hashbang urls?

    When a user is on a page on your site (http://domain.com/#!/asset) and does a sharing action on your website, it should share the canonical url http://domain.com/asset.

    Then if a user visits http://domain.com/asset, you just redirect them to http://domain.com/#!/asset.

    And if Facebook accesses the canonical URL (http://domain.com/asset), then redirect it to your Prerender.io server.

    Or...just switch from #! to HTML5 pushState, and you won't have to do any of the #! redirecting for Facebook. That way the proxy becomes simpler, because you'd always just proxy any request from Facebook to your Prerender.io server.
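
For the pushState option, a minimal nginx sketch might look like the following. It assumes routes such as /asset, an index.html app shell under the server root, and a self-hosted Prerender server on 127.0.0.1:3000; the @app location name and the upstream address are placeholders, not details from the question.

```nginx
# Sketch only: @app and 127.0.0.1:3000 are assumed placeholders.
location / {
    # Files that exist on disk (JS, CSS, images) are served directly,
    # so they never reach the Prerender proxy.
    try_files $uri @app;
}

location @app {
    set $prerender 0;

    # Facebook's scraper announces itself as facebookexternalhit.
    if ($http_user_agent ~* "facebookexternalhit") {
        set $prerender 1;
    }

    if ($prerender = 1) {
        # Pass the full original URL to Prerender as the request path.
        rewrite .* /$scheme://$host$request_uri? break;
        proxy_pass http://127.0.0.1:3000;
    }

    # Normal visitors get the app shell; Angular handles the route client-side.
    rewrite .* /index.html break;
}
```

With this layout the canonical URL, the og:url and the link Facebook shows to users are all the same plain URL, so no redirect step is needed. Other crawlers could be added to the user-agent pattern in the same way.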