I have been reading about making AJAX-heavy applications more search engine friendly: https://developers.google.com/webmasters/ajax-crawling/docs/getting-started
One of the solutions I delivered recently relies heavily on cross-domain JavaScript widgets. A website that integrates this solution includes a piece of JavaScript like the following:
<script type="text/javascript">
var _lw = _lw || {};
_lw._setAccount = '00000000-0000-0000-0000-000000000000';
_lw._widgetType = '_widgetName';
_lw._options = {};
(function() {
    var scriptsrc = document.createElement('script');
    scriptsrc.type = 'text/javascript';
    scriptsrc.async = true;
    scriptsrc.src = 'http://hostname/Application/js';
    var s = document.getElementsByTagName('script')[0];
    s.parentNode.insertBefore(scriptsrc, s);
})();
</script>
<div id="widget-container"></div>
This request appends a jQuery wrapper to the DOM, along with the relevant application URLs the widget will use. Note: the source of this piece of JavaScript is simply an ASP.NET MVC ContentResult that outputs the relevant content:
[HttpGet]
[ActionName("js")]
public ContentResult RenderJavascript()
{
    // The jQuery wrapper and the application URLs are output here.
    // Illustrative only; the real output is application-specific:
    var script = "var _opts = { _widgetUrl: 'http://hostname/Application/widget/' };";
    return Content(script, "application/javascript");
}
Now that the necessary URLs are available, the jQuery wrapper that was just appended kicks in and fires off a request to the server with the account information, the widget type, and any relevant options. ASP.NET MVC is simply acting as a content generator: it returns a JSONP result, and the generated content is appended to the site's content container.
function loadWidget() {
    // 'callback=?' makes jQuery issue a JSONP request rather than plain JSON,
    // which is what lets the call work cross-domain
    var jsonpUrl = _opts._widgetUrl + _lw._setAccount + '?callback=?';
    jQuery.getJSON(jsonpUrl, _lw._options, function (data) {
        jQuery('#widget-container').html(data.html);
    });
}
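The MVC action behind that request looks roughly like this (heavily simplified; the action name and result shape here are placeholders, and the real markup generation is omitted):

[HttpGet]
[ActionName("widget")]
public ContentResult RenderWidget(string callback)
{
    // Sketch only: generate the widget markup for the given account/options
    // (the real generation logic is application-specific).
    // Requires: using System.Web.Script.Serialization;
    var payload = new JavaScriptSerializer().Serialize(new { html = "<div>widget markup</div>" });

    // Wrap the JSON in the callback name jQuery passes via the 'callback'
    // query-string parameter, producing a JSONP response.
    return Content(callback + "(" + payload + ");", "application/javascript");
}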
The more I read about making this process search engine friendly, the more I'm at a loss as to where to start. Generating the required HTML snapshot is easy enough; however, how would I signal to Google that a link should be crawled?
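For what it's worth, serving the snapshot itself seems straightforward. Assuming Google's scheme of rewriting '#!' URLs into '?_escaped_fragment_=' requests, something like this sketch would do (RenderSnapshot is a hypothetical helper):

[HttpGet]
public ActionResult Page()
{
    // Googlebot rewrites 'http://host/page#!state' to
    // 'http://host/page?_escaped_fragment_=state' before requesting it.
    var fragment = Request.QueryString["_escaped_fragment_"];
    if (fragment != null)
    {
        // Serve the pre-rendered HTML snapshot to the crawler.
        return Content(RenderSnapshot(fragment), "text/html"); // RenderSnapshot is hypothetical
    }
    // Normal visitors get the page that bootstraps the widget via JavaScript.
    return View();
}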
Another option that seemed somewhat promising was the section on how to handle pages without hash fragments. This would be accomplished by adding a meta tag to any page that makes use of the JavaScript widgets:
<meta name="fragment" content="!">
However, the problem is that the widget content is still fetched via a cross-domain request, so again this won't lead anywhere.
This is a tricky one. There's evidence that Google does crawl JavaScript (it doesn't execute it, per se, the way your browser does, but it will at least look for URLs in JavaScript, much as it does in Flash, Word, etc. documents). So there's a chance you don't need to do anything, if Google spots 'http://hostname/Application/js' in your JavaScript, deigns to follow it, and then parses URLs out of the JavaScript returned.
If you want a more concrete solution, perhaps modify your code like this:
<div id="widget-container"><iframe src="http://hostname/Application/iframe"></div>
OR:
<div id="widget-container"><a href="http://hostname/Application/links"></div>
When your widget's JavaScript is executed by a browser, you can replace the contents of #widget-container with what you'd normally put there (you could also add a style="visibility:hidden;" attribute, but I suspect Google punishes "hidden" content wherever possible, for obvious reasons), but when the Googlebot stops by it'll crawl the iframe/link. On the server side you can have those URLs generate the same list of links you had in JavaScript, but in easily-digested HTML.
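For example, that links action could be as simple as this sketch (GetWidgetUrls() stands in for however you look up the widget's URLs):

[HttpGet]
[ActionName("links")]
public ContentResult RenderLinks()
{
    // Emit the same URLs the widget renders client-side, but as plain
    // anchors the crawler can digest without executing any JavaScript.
    // Requires: using System.Text;
    var sb = new StringBuilder("<ul>");
    foreach (var url in GetWidgetUrls()) // hypothetical lookup of the widget's URLs
    {
        sb.AppendFormat("<li><a href=\"{0}\">{0}</a></li>", url);
    }
    sb.Append("</ul>");
    return Content(sb.ToString(), "text/html");
}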
An additional option that might make Googlebot even happier is using a Schema.org object like WebPageElement (the example below uses its SiteNavigationElement subtype), e.g.:
<div id="widget-container" itemscope itemtype="http://schema.org/SiteNavigationElement">
<link itemprop="url" href="http://hostname/Application/links">
</div>
...and then have the target page also contain Schema.org declarations. I'm not sure if you'd reap any real benefit from this over one of the previous forms, but metadata is a love note to the future.