javascript html schema.org json-ld google-search-platform

Implementing paywall: to avoid cloaking issues with paywall notice, should I specify it in the HTML or in the JSON-LD?


Question

The "paywall notice" does not seem to be recognized in Google's documentation. I am trying to make it visible to all, yet excluded from the page topic and content, without causing cloaking issues. Can I do this in the DOM (for example with the role attribute), or do I need to do it in the JSON-LD markup?

Background

I am implementing a website paywall using client-side JS, with a combination of schema.org structured data (JSON-LD) and CSS selectors.

The implementation is based on the programming suggestions by Google at https://developers.google.com/search/docs/advanced/structured-data/paywalled-content

There are 3 types of content on this site, and in this implementation all 3 are rendered by the server for every visitor regardless of paywall status:

  1. Free content, visible to all;
  2. Paywall notice, not part of the page content/topic, visible only when not logged in; and
  3. Paywalled content, visible only to logged in users and search crawlers.

Type 2 is what is causing trouble, and this is not documented by Google.

HTML

<html>
  <head>
  </head>
  <body>
    <div id="div-1" class="non-paywall">
      All visitors can see this sentence, whether or not subscribed.
    </div>
    <div id="div-2" class="paywall-notice" role="dialog">
      <!-- This element is the issue in question -->
      If you are seeing this notice, you are logged out or not subscribed. You cannot see the main content of this page. Please subscribe!
    </div>
    <div id="div-3" class="paywall">
      This section is paid content. 
      If you can see it, you are a logged in subscriber or a verified crawler (e.g. googlebot or bingbot).
    </div>
  </body>
</html>

JSON-LD

{
    "@context": "https://schema.org",
    "@type": "WebPage",
    "@id": "https:\/\/foo\/page\/#webpage",
    "mainEntityOfPage": {
        "@type": "Article",
        "mainEntityOfPage": "https:\/\/bar\/article"
    },
    "isAccessibleForFree": "False",
    "hasPart": [
        {
            "@type": "WebPageElement",
            "isAccessibleForFree": "True",
            "cssSelector": ".non-paywall"
        },
        {
            "@type": "WebPageElement",
            "isAccessibleForFree": "True",
            "cssSelector": ".paywall-notice"
        },
        {
            "@type": "WebPageElement",
            "isAccessibleForFree": "False",
            "cssSelector": ".paywall"
        }
    ]
}

If paywall notices (#2) are treated the same as #1, there seems to be a risk that crawlers will assume they are part of the page content and include them when assessing the page's relevance to search intent.

I cannot find any official recognition of the existence of #2 or guidance on how to treat it, whilst honouring the objective of paywall markup and avoiding cloaking issues.

There is a combination of approaches at "Handling isAccessibleForFree for client side paywalls", and a related issue at https://webmasters.stackexchange.com/questions/117936/isaccessibleforfree-and-paywalled-content-delivered-to-googlebots, but neither of these addresses my question above.

Ideally, I would like to implement this the way Google wants me to... if only I knew what that was!

More background

In order to be able to serve paywalled content to googlebot, the server renders the same HTML to all visitors. After page load, some JS checks whether the visitor is googlebot, and if so:

  1. Remove the .paywall-notice element/s
  2. Show the .paywall element/s

There may also be periodic or interaction-driven checks to remove .paywall element/s for non-googlebot visitors, but that should not affect this question if the markup correctly shows googlebot that those element/s are paywalled.
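
A minimal sketch of the two steps above, for illustration only. The function names `looksLikeCrawler` and `revealPaywalledContent` are hypothetical, and the sketch assumes the `.paywall` elements are hidden via the `hidden` attribute, which the question does not specify. A User-Agent check can be spoofed, so it does not verify that a visitor really is googlebot.

```javascript
// Hypothetical helper: crude User-Agent test. UA strings can be spoofed,
// so this only guesses at a crawler; it does not verify one.
function looksLikeCrawler(userAgent) {
  return /googlebot|bingbot/i.test(userAgent || "");
}

// Applies the two steps to any root that supports querySelectorAll
// (normally `document`).
function revealPaywalledContent(root) {
  // 1. Remove the .paywall-notice element/s
  root.querySelectorAll(".paywall-notice").forEach((el) => el.remove());

  // 2. Show the .paywall element/s
  // Assumption: they were hidden with the `hidden` attribute.
  root.querySelectorAll(".paywall").forEach((el) => {
    el.hidden = false;
  });
}

// Runs only in a browser; skipped where there is no DOM.
if (typeof document !== "undefined" && typeof navigator !== "undefined") {
  document.addEventListener("DOMContentLoaded", () => {
    if (looksLikeCrawler(navigator.userAgent)) {
      revealPaywalledContent(document);
    }
  });
}
```

Keeping the DOM changes in a function that takes the root as a parameter makes it possible to exercise the logic without a browser.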


Solution

  • Is it possible for you to detect the crawlers server side and not render the paywall-notice element at all? The point of this markup is so that you don't show different content to Googlebot vs an average anonymous visitor. I think as long as you wrap the "paid" content of the article in the paywall class you don't have to worry about getting penalized for cloaking.

    On wsj.com we have a server side paywall so when Googlebot comes to the site we don't even render any of those marketing offers like what you have in your paywall-notice element. We just render the full article and wrap the paid content in the paywall class. So if it's possible for you, send Googlebot the page without that paywall notice element.

    By the way, nyt.com has a front end paywall and they aren't doing anything special about marking up the marketing offers. They just mark up the paywalled content same as your example. Just make sure to remove paywall-notice from the hasPart array as it definitely shouldn't be in there.
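
A rough sketch of the two suggestions in this answer, under stated assumptions. The functions `renderBody` and `buildPaywallJsonLd` are hypothetical names, the placeholder text inside the divs is invented, and how the server decides that a request comes from a verified crawler is outside the sketch (it is passed in as a boolean). The class names, IDs, and the "True"/"False" string values are copied from the question as written.

```javascript
// Server-side rendering sketch: omit the paywall notice for verified crawlers.
// `isVerifiedCrawler` is assumed to be computed elsewhere on the server.
function renderBody(isVerifiedCrawler) {
  const parts = [
    '<div id="div-1" class="non-paywall">Free content</div>',
  ];
  if (!isVerifiedCrawler) {
    // Marketing offer: only sent to ordinary visitors.
    parts.push(
      '<div id="div-2" class="paywall-notice" role="dialog">Please subscribe!</div>'
    );
  }
  // Paid content stays wrapped in the class named by the JSON-LD cssSelector.
  parts.push('<div id="div-3" class="paywall">Paid content</div>');
  return parts.join("\n");
}

// JSON-LD from the question, with .paywall-notice removed from hasPart.
function buildPaywallJsonLd(pageId, articleUrl) {
  return {
    "@context": "https://schema.org",
    "@type": "WebPage",
    "@id": pageId,
    mainEntityOfPage: {
      "@type": "Article",
      mainEntityOfPage: articleUrl,
    },
    isAccessibleForFree: "False",
    hasPart: [
      {
        "@type": "WebPageElement",
        isAccessibleForFree: "True",
        cssSelector: ".non-paywall",
      },
      {
        "@type": "WebPageElement",
        isAccessibleForFree: "False",
        cssSelector: ".paywall",
      },
    ],
  };
}

// Serialized form, ready to place in a <script type="application/ld+json"> tag.
const jsonLdText = JSON.stringify(
  buildPaywallJsonLd("https://foo/page/#webpage", "https://bar/article"),
  null,
  4
);
```

With a front-end paywall, only `buildPaywallJsonLd` applies: the notice stays in the HTML for everyone and is simply left out of the `hasPart` array.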