
Facebook/Twitter OpenGraph not scraping images on Node.js/Angular.js web application


I recently worked on a MEAN stack application, a blog of sorts where authors post articles. Because Open Graph crawlers do not execute JavaScript, I implemented a static PHP page and added a rule to my Nginx reverse proxy that redirects all requests from certain user agents (Facebook, Google+, Twitter, etc.) to the static page, so that they can scrape the data properly.

Everything works well except for one detail: the Open Graph crawlers cannot seem to scrape the images in the articles, so rich social sharing does not work as expected.

For instance, testing the following link: https://moveramontanha.pt/article/5a21539cfdebb1074ed1436d

which redirects to the static page:

https://www.moveramontanha.pt/static_mam.php?id=5a21539cfdebb1074ed1436d

  • Facebook Sharing Debugger reports one of the following errors, seemingly at random:

Unsupported Image File Extension: Provided og:image URL, https://www.moveramontanha.pt/uploads/authors/1512141975423.jpg does not have a supported extension.

or

The provided 'og:image' properties are not yet available because new images are processed asynchronously. To ensure shares of new URLs include an image, specify the dimensions using 'og:image:width' and 'og:image:height' tags.

  • Twitter Card Validator Log: (No Image)

INFO: Page fetched successfully
INFO: 17 metatags were found
INFO: twitter:card = summary tag found
INFO: Card loaded successfully
WARN: this card is redirected to https://www.moveramontanha.pt/static_mam.php?id=5a21539cfdebb1074ed1436d

I've tried adding extra tags such as image width/height and the secure tags, changing the image format, and so on. Nothing worked.

Has anyone else run into this issue?

Thanks in advance!


Solution

  • For Twitter's card crawler, Twitter's developer documentation includes an in-depth troubleshooting FAQ.

    I just tried the following request to fetch the image referenced in the page:

    curl -L -A Twitterbot -v https://www.moveramontanha.pt/uploads/authors/1501255270817.jpg

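    This returns an HTML page, not a JPEG image, so Twitter's card crawler cannot fetch a valid image. Given the setup described in the question, a likely cause is that the Nginx user-agent rule also matches requests for the image files and sends the crawler to the static page instead of the file.

    You should fix your server so that it returns a valid JPEG image to the Twitterbot user agent.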
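The same check can be repeated for several user agents to see whether crawlers get a different response than a browser does. The following is a minimal Python sketch, not a definitive diagnostic. The helper names `sniff_body` and `fetch_as` are hypothetical, and it only inspects the first bytes of the response, which is enough to tell an image apart from an HTML page.

```python
import urllib.request

# Leading "magic bytes" of common image formats.
IMAGE_SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_body(body: bytes) -> str:
    """Classify the start of a response body as an image format, 'html', or 'unknown'."""
    for signature, name in IMAGE_SIGNATURES.items():
        if body.startswith(signature):
            return name
    head = body[:512].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"


def fetch_as(url: str, user_agent: str, timeout: float = 10.0):
    """Fetch url with the given User-Agent header, following redirects.

    Returns (final_url, content_type_header, sniffed_kind).
    Raises urllib.error.HTTPError on 4xx/5xx responses.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        first_bytes = response.read(512)
        content_type = response.headers.get("Content-Type", "")
        return response.geturl(), content_type, sniff_body(first_bytes)
```

For example, calling `fetch_as(image_url, "Twitterbot")` and `fetch_as(image_url, "Mozilla/5.0")` with the image URL from the curl command and comparing the results shows whether the crawler is being redirected. The user-agent strings here are illustrative assumptions; real crawlers send longer strings. If the crawler request comes back as `html` with a final URL pointing at the static page while the browser request comes back as `jpeg`, the user-agent rule is probably catching image requests as well.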