Send only HTML to a crawler/robot without CSS, JS, etc.? Can it cause a negative ranking in Google?


Today a crawler visited my site and tried to access several CSS, JS, image and other files. It was also a smart one: it interprets the JavaScript and tried to request a URL that is assembled (and only known) inside the JavaScript code! That worries me a lot.

My code on the site detected this strange behaviour and sent me three e-mails about it. (This is a blocker class I wrote in the past; it works perfectly and blocks further access, so after three attacks the attacker stops.) Still, I want to keep these bad guys out as much as possible, to reduce network traffic, protect my services and cut the time I have to spend on this b*llsh*t.
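The blocker class itself is not shown. As a rough illustration only, a strike counter that blocks a client after three suspicious requests could look like the Python sketch below. The class name `StrikeBlocker`, the one-hour window and keying clients by IP address are all assumptions, and the e-mail notification is left out.

```python
import time
from collections import defaultdict


class StrikeBlocker:
    """Hypothetical sketch: block a client after N suspicious requests.

    Threshold, time window and client key (e.g. an IP address) are assumptions.
    """

    def __init__(self, max_strikes=3, window_seconds=3600.0, clock=time.monotonic):
        self.max_strikes = max_strikes
        self.window_seconds = window_seconds
        self.clock = clock
        self._strikes = defaultdict(list)  # client -> timestamps of strikes
        self._blocked = set()

    def _prune(self, client):
        # Forget strikes that are older than the time window.
        cutoff = self.clock() - self.window_seconds
        self._strikes[client] = [t for t in self._strikes[client] if t >= cutoff]

    def record_suspicious(self, client):
        """Record one suspicious request; return True if the client is now blocked."""
        if client in self._blocked:
            return True
        self._prune(client)
        self._strikes[client].append(self.clock())
        # A real implementation would also send the notification e-mail here.
        if len(self._strikes[client]) >= self.max_strikes:
            self._blocked.add(client)
        return client in self._blocked

    def is_blocked(self, client):
        return client in self._blocked
```

Passing `clock` as a parameter is only there so the time window can be tested without waiting.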

So I was thinking: send stripped HTML content (without forms, CSS and JS declarations) to the robot/crawler, as long as it does not affect the good ones. The idea behind this is that a crawler does not need the presentation markup; only the content is important, right? Only the bad ones try to access files that are not important to ranking, and when those files are not referenced, there is nothing to worry about. So my question is:

Is it safe to do this? I mean, won't Google 'think' you are faking the content? Removing parts specifically for a crawler makes the page slightly different from the original content.

Does anyone have experience with this?
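The stripping idea can also be sketched as a post-processing filter over the finished HTML. This Python version uses the standard-library `html.parser` and drops `<script>`, `<style>` and `<form>` elements (with everything inside them) plus `<link rel="stylesheet">` tags. The names `RobotStripper` and `strip_for_robot` are hypothetical, and inline `style` or `on*` attributes are left untouched.

```python
from html.parser import HTMLParser

# Elements whose whole content is dropped for robots (assumed list).
SKIP_CONTAINERS = {"script", "style", "form"}


class RobotStripper(HTMLParser):
    """Re-emit HTML while dropping scripts, styles, forms and stylesheet links."""

    def __init__(self):
        # Keep entity/character references as written so the output stays valid HTML.
        super().__init__(convert_charrefs=False)
        self._out = []
        self._skip_depth = 0  # > 0 while inside an element that is being dropped

    @staticmethod
    def _is_stylesheet_link(tag, attrs):
        if tag != "link":
            return False
        rel = dict(attrs).get("rel") or ""
        return "stylesheet" in rel.lower().split()

    def _emit(self, text):
        if not self._skip_depth:
            self._out.append(text)

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTAINERS:
            self._skip_depth += 1
        elif not self._is_stylesheet_link(tag, attrs):
            self._emit(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <link ... />: there is no end tag to track.
        if tag not in SKIP_CONTAINERS and not self._is_stylesheet_link(tag, attrs):
            self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in SKIP_CONTAINERS:
            if self._skip_depth:
                self._skip_depth -= 1
        else:
            self._emit(f"</{tag}>")

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")

    def handle_pi(self, data):
        self._emit(f"<?{data}>")

    def result(self):
        return "".join(self._out)


def strip_for_robot(html):
    """Return a copy of `html` without CSS, JavaScript and forms (hypothetical helper)."""
    parser = RobotStripper()
    parser.feed(html)
    parser.close()
    return parser.result()
```

Filtering the output keeps the templates untouched; the trade-off is an extra parse of every page served to a robot.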

PS: For me it is easy to implement, because my template system can use user-agent/device-specific conditions to serve, for example, different markup or settings. To give you an idea of how it works, here is an example for a handheld device:

<!-- #IF $is_handheld -->
<!-- iPhone, iPad mobile/handheld devices -->
<meta name="viewport" content="width=device-width,initial-scale=0.9,maximum-scale=2.0,user-scalable=1" />
<meta name="apple-mobile-web-app-capable" content="yes" />
<!-- #ENDIF -->
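How flags such as `$is_handheld` and `$is_robot` get their values is not shown. A minimal sketch, assuming they are derived from the `User-Agent` header with crude substring matches; the patterns and the `template_flags` name are hypothetical. Note that the `User-Agent` header is chosen by the client, so a bad crawler can simply claim to be Googlebot; a check like this only separates clients that identify themselves honestly.

```python
import re
from typing import Dict, Optional

# Crude, assumed patterns: substring matches on the User-Agent header.
# "bot" also matches names like "Googlebot" and "bingbot", but may give false positives.
ROBOT_PATTERN = re.compile(r"bot|crawler|spider|slurp", re.IGNORECASE)
HANDHELD_PATTERN = re.compile(r"iphone|ipad|ipod|android|mobile", re.IGNORECASE)


def template_flags(user_agent: Optional[str]) -> Dict[str, bool]:
    """Derive hypothetical template flags (is_robot, is_handheld) from a User-Agent."""
    ua = user_agent or ""
    is_robot = bool(ROBOT_PATTERN.search(ua))
    return {
        "is_robot": is_robot,
        # Some crawlers send mobile-looking User-Agents; robot detection wins here.
        "is_handheld": (not is_robot) and bool(HANDHELD_PATTERN.search(ua)),
    }
```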

This is the example I am thinking of implementing:

<!-- #IF !$is_robot -->
<link rel="stylesheet" type="text/css" charset="utf-8" href="css/dialog.css" />
<link rel="stylesheet" type="text/css" charset="utf-8" href="css/general.css" />
<script rel="combine,minify" type="text/javascript" charset="utf-8" src="js/general.js"></script>
<!-- #ENDIF -->        
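The template engine itself is the author's own and is not shown. To illustrate the `#IF` / `#ENDIF` behaviour, here is a hypothetical Python re-implementation that handles single, non-nested conditions on a boolean flag, with an optional `!` for negation; unknown flags are treated as false.

```python
import re
from typing import Dict

# Matches one non-nested <!-- #IF [!]$name --> ... <!-- #ENDIF --> block,
# plus trailing spaces and one newline after the #ENDIF comment.
IF_BLOCK = re.compile(
    r"<!--\s*#IF\s+(!?)\$(\w+)\s*-->(.*?)<!--\s*#ENDIF\s*-->[ \t]*\n?",
    re.DOTALL,
)


def render_conditionals(template: str, flags: Dict[str, bool]) -> str:
    """Keep or drop each #IF block depending on its flag (hypothetical helper)."""

    def replace(match):
        negate, name, body = match.groups()
        value = bool(flags.get(name, False))
        if negate:
            value = not value
        return body if value else ""

    return IF_BLOCK.sub(replace, template)
```

With `is_robot` set to true, the `<link>` and `<script>` lines inside `#IF !$is_robot` are never emitted, so the robot never sees the file names.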

Solution

  • I have tested this for a period of time, and no, there are no side effects: it does not cause a negative ranking.

    Another positive thing is that bad robots cannot find any CSS and JS files to mess with, because they are not referenced in the page. So, in some ways, it is also a safe way to protect other content.

    Also, if you use only CSS background images, you can keep those images from being indexed by a robot, because the stylesheets that reference them are never sent to it.