Tags: angular, facebook, .htaccess, mod-rewrite, meta-tags

.htaccess allow social media crawlers to work (Facebook and Twitter) | Angular 11 SPA


I've created a SPA - Single Page Application with Angular 11 which I'm hosting on a shared hosting server.

The problem is that I cannot share any page except the first route (/) on social media (Facebook and Twitter), because the meta tags do not update for the requested page. I have a service that sets the meta tags for each page, but I know the Facebook and Twitter crawlers do not execute JavaScript, so they never see the updated tags.

To fix this I tried Angular Universal (SSR, server-side rendering) and Scully (which generates static pages). Both fix the issue, but I would prefer to keep the default Angular SPA build.

The approach I am taking:

  • File structure (shared hosting server, /public_html/):
- crawlers/
    - crawlers.php
    - share/
        - 404.json
        - about.json
        - work.json
- .htaccess
- index.html
  • crawlers.php contains the following:
<?php

$page = filter_input(INPUT_GET, 'page');

if (file_exists('./share/'.$page.'.json')) {
    $file = file_get_contents('./share/'.$page.'.json');
} else {
    $file = file_get_contents('./share/404.json');
}

$data = json_decode($file);

makePage($data);

function makePage($data) { 
    $html  = '<!doctype html>'.PHP_EOL;
    $html .= '<html>'.PHP_EOL;

    $html .= '<head>'.PHP_EOL;

    $html .= '<meta property="og:type" content="website" />'.PHP_EOL;
    $html .= '<meta property="og:site_name" content="My Website" />'.PHP_EOL;
    $html .= '<meta property="og:title" content="'.$data->title.'" />'.PHP_EOL;
    $html .= '<meta property="og:description" content="'.$data->description.'" />'.PHP_EOL;
    $html .= '<meta property="og:image" content="'.$data->image.'" />'.PHP_EOL;

    $html .= '<meta name="twitter:card" content="summary_large_image"/>'.PHP_EOL;
    $html .= '<meta name="twitter:title" content="'.$data->title.'" />'.PHP_EOL;
    $html .= '<meta name="twitter:description" content="'.$data->description.'" />'.PHP_EOL;
    $html .= '<meta name="twitter:image" content="'.$data->image.'" />'.PHP_EOL;
    
    $html .= '<meta http-equiv="refresh" content="0;url='.$data->url.'">'.PHP_EOL;

    $html .= '</head>'.PHP_EOL;
    $html .= '<body></body>'.PHP_EOL;

    $html .= '</html>';

    echo $html;
}

?>
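The lookup at the top of the script passes the page parameter straight into a file path, so a value such as ../something would be resolved outside share/. A minimal sketch of a stricter lookup follows, written in TypeScript only to illustrate the logic. The names resolveShareFile, SAFE_PAGE and available are hypothetical, and the pattern assumes page names are single path segments (a nested route such as work/project would need a wider pattern).

```typescript
// Accept only simple page names such as "about" or "my-page".
// Assumption: routes are single segments made of letters, digits, "_" and "-".
const SAFE_PAGE = /^[A-Za-z0-9_-]+$/;

// Return the share file to use for a requested page.
// `available` lists the file names that exist in the share/ directory.
function resolveShareFile(page: string | null, available: string[]): string {
  if (page !== null && SAFE_PAGE.test(page)) {
    const candidate = page + ".json";
    if (available.indexOf(candidate) !== -1) {
      return candidate;
    }
  }
  // Unknown, missing, or unsafe page names fall back to the 404 data.
  return "404.json";
}
```

In the PHP script the same check could be done with preg_match before building the path.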

I did not specify og:url because I assumed that, without it, Facebook would be unaware of the actual content URL and would link its cards to the static file. That shouldn't be a problem, since the http-equiv="refresh" tag redirects normal users to the correct URL.
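The page that makePage produces can be sketched as follows, again in TypeScript for illustration. The sketch mirrors the tags in the script (no og:url, plus the meta refresh) and adds one thing the script does not do: it escapes the values before placing them in attributes, so a title containing a double quote cannot break the markup. ShareData, escapeAttr and renderSharePage are hypothetical names.

```typescript
// Shape of one share file such as about.json (field names taken from the post).
interface ShareData {
  title: string;
  description: string;
  image: string;
  url: string;
}

// Escape the characters that could break out of a double-quoted HTML attribute.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build the crawler-facing page: Open Graph tags, Twitter tags, and a meta refresh.
// og:url is left out on purpose, matching the original script.
function renderSharePage(data: ShareData, siteName: string): string {
  const title = escapeAttr(data.title);
  const description = escapeAttr(data.description);
  const image = escapeAttr(data.image);
  const url = escapeAttr(data.url);

  return [
    "<!doctype html>",
    "<html>",
    "<head>",
    '<meta property="og:type" content="website" />',
    '<meta property="og:site_name" content="' + escapeAttr(siteName) + '" />',
    '<meta property="og:title" content="' + title + '" />',
    '<meta property="og:description" content="' + description + '" />',
    '<meta property="og:image" content="' + image + '" />',
    '<meta name="twitter:card" content="summary_large_image" />',
    '<meta name="twitter:title" content="' + title + '" />',
    '<meta name="twitter:description" content="' + description + '" />',
    '<meta name="twitter:image" content="' + image + '" />',
    '<meta http-equiv="refresh" content="0;url=' + url + '">',
    "</head>",
    "<body></body>",
    "</html>",
  ].join("\n");
}
```

In PHP, htmlspecialchars with the ENT_QUOTES flag plays the role of escapeAttr.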

  • For example, 404.json contains the following:
{
  "title": "404: Not Found | My Website",
  "description": "My awesome description.",
  "image": "https://www.mywebsite.com/assets/images/share/404.jpg",
  "url": "https://www.mywebsite.com"
}
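Share files have to be strict JSON: json_decode returns null for anything else, and a trailing comma after the last property is a common cause. A small sketch of a check that could be run over each file's contents before deploying is shown below. validateShareJson and REQUIRED_FIELDS are hypothetical names, and the required fields are assumed from the example above.

```typescript
// Fields every share file is assumed to need (taken from the example file).
const REQUIRED_FIELDS = ["title", "description", "image", "url"];

// Return a list of problems; an empty list means the file looks usable.
function validateShareJson(text: string): string[] {
  let parsed: unknown;
  try {
    // JSON.parse is strict: trailing commas and comments are rejected.
    parsed = JSON.parse(text);
  } catch (err) {
    return ["not valid JSON: " + (err as Error).message];
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return ["top-level value must be an object"];
  }
  const record = parsed as { [key: string]: unknown };
  const problems: string[] = [];
  for (const field of REQUIRED_FIELDS) {
    const value = record[field];
    if (typeof value !== "string" || value.length === 0) {
      problems.push("missing or empty field: " + field);
    }
  }
  return problems;
}
```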

  • .htaccess contains the following:
RewriteEngine On
RewriteBase /

# Allow robots.txt to pass through
RewriteRule ^robots.txt - [L]

# Allow social media crawlers to work
RewriteCond %{HTTP_USER_AGENT} (facebookexternalhit/[0-9]|Twitterbot)
RewriteRule ^(.+)$ /crawlers/crawlers.php?page=$1 [NC,L]

# If an existing asset or directory is requested go to it as it is
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f [OR]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d
RewriteRule ^ - [L]

# If the requested resource doesn't exist use index.html
RewriteRule ^ /index.html

When I request https://www.mywebsite.com/crawlers/crawlers.php?page=test-page directly, it works perfectly, which is why I believe the problem is in the .htaccess rule under # Allow social media crawlers to work. Sharing on Facebook still shows the meta tags of the first route (/), which means the redirect to crawlers/crawlers.php doesn't work.
Also, in https://developers.facebook.com/tools/debug/sharing/ the URL https://www.mywebsite.com/about is not redirected to https://www.mywebsite.com/crawlers/crawlers.php?page=about.

I want social media crawlers to be redirected to crawlers/crawlers.php only for pages such as https://www.mywebsite.com/about and https://www.mywebsite.com/work, not for https://www.mywebsite.com (the first route, /).

Any help is very much appreciated. Thanks!


Solution

  • Thanks to @CBroe's guidance, I managed to make the social media (Facebook and Twitter) crawlers work (without using Angular Universal, Scully, Prerender.io, etc) for an Angular 11 SPA - Single Page Application, which I'm hosting on a shared hosting server.

    The issue I had in the question above was in .htaccess.

    This is my .htaccess (which works as expected):

    RewriteEngine On
    
    # Force www.
    RewriteCond %{HTTP_HOST} !^www\.
    RewriteRule ^(.*)$ https://www.%{HTTP_HOST}/$1 [R=301,L]
    
    # If an existing asset or directory is requested go to it as it is
    RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f [OR]
    RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d
    RewriteRule ^ - [L]
    
    # Allow robots.txt to pass through
    RewriteRule ^robots.txt - [L]
    
    # Allow social media crawlers to work
    RewriteCond %{HTTP_USER_AGENT} (facebookexternalhit|WhatsApp|LinkedInBot|Twitterbot)
    RewriteRule ^(.+)$ /crawlers/social_media.php?page=$1 [R=301,L]
    
    # If the requested resource doesn't exist use index.html
    RewriteRule ^ /index.html
    

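    PS: I renamed crawlers.php to social_media.php, added the WhatsApp and LinkedIn user agents, and added a redirect from mywebsite.com to www.mywebsite.com. Compared with the .htaccess in the question, the file/directory check now runs before the crawler rule, and the crawler rule issues an external redirect ([R=301]) instead of an internal rewrite.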
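To make the rule order easier to reason about, here is a simplified TypeScript model of the decisions the working .htaccess makes. It is only a sketch: it ignores mod_rewrite details such as query-string handling and per-directory re-processing, and CrawlRequest, Outcome, route and existingPaths are hypothetical names.

```typescript
// One incoming request, reduced to what the rules inspect.
interface CrawlRequest {
  host: string;      // Host header, e.g. "mywebsite.com"
  path: string;      // path without the leading slash, e.g. "about"; "" for the root
  userAgent: string;
}

type Outcome =
  | { kind: "redirect"; location: string }
  | { kind: "pass-through" }
  | { kind: "serve-index" };

// Same user-agent alternatives as the RewriteCond (case-sensitive, no NC flag).
const CRAWLER_PATTERN = /facebookexternalhit|WhatsApp|LinkedInBot|Twitterbot/;

// `existingPaths` stands in for the -f / -d checks against the document root.
function route(req: CrawlRequest, existingPaths: string[]): Outcome {
  // Force www.
  if (!/^www\./.test(req.host)) {
    return { kind: "redirect", location: "https://www." + req.host + "/" + req.path };
  }
  // Existing files and directories are left alone. The root ("") is the
  // document root directory itself, so it is never sent to the crawler page.
  if (req.path === "" || existingPaths.indexOf(req.path) !== -1) {
    return { kind: "pass-through" };
  }
  // robots.txt passes through even if the file does not exist.
  if (/^robots\.txt/.test(req.path)) {
    return { kind: "pass-through" };
  }
  // Social media crawlers asking for any other path get the share page.
  if (CRAWLER_PATTERN.test(req.userAgent)) {
    return { kind: "redirect", location: "/crawlers/social_media.php?page=" + req.path };
  }
  // Everything else is handled by the Angular app.
  return { kind: "serve-index" };
}
```

In this model a crawler asking for about is redirected to the share page, and its follow-up request for crawlers/social_media.php is passed through because that file exists, so the crawler rule is not applied a second time.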