Tags: php, http-headers, spam-prevention, bots

Are there HTTP header fields I could use to spot spam bots?


It stands to reason that scrapers and spambots wouldn't be built as well as normal web browsers. With this in mind, it seems like there should be some way to spot blatant spambots by just looking at the way they make requests.

Are there any methods for analyzing HTTP headers, or is this just a pipe dream? For comparison, these are the request headers sent by a normal browser (Chrome 7 on Windows XP):

Array
(
    [Host] => example.com
    [Connection] => keep-alive
    [Referer] => http://example.com/headers/
    [Cache-Control] => max-age=0
    [Accept] => application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
    [User-Agent] => Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.44 Safari/534.7
    [Accept-Encoding] => gzip,deflate,sdch
    [Accept-Language] => en-US,en;q=0.8
    [Accept-Charset] => ISO-8859-1,utf-8;q=0.7,*;q=0.3
)
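As an illustration of the idea only (not a reliable filter), here is a minimal PHP sketch. It assumes the headers arrive as an associative array shaped like the one above, for example from `getallheaders()`, and flags requests that lack headers mainstream browsers routinely send or that carry an HTTP library's default User-Agent. The function name `suspiciousHeaderReasons` and both lists (expected headers, User-Agent fragments) are hypothetical choices for illustration.

```php
<?php
// Hypothetical heuristic: list the ways a header set differs from a
// typical browser request. This is a weak signal at best, since a bot
// can copy a real browser's headers verbatim.

/**
 * Returns human-readable reasons the request looks unlike a browser.
 * An empty array means nothing obvious was found.
 */
function suspiciousHeaderReasons(array $headers): array
{
    // HTTP header names are case-insensitive, so normalise the keys.
    $h = array_change_key_case($headers, CASE_LOWER);
    $reasons = [];

    // Headers that mainstream browsers normally send on page requests
    // (an assumption for this sketch, not a standard requirement).
    foreach (['user-agent', 'accept', 'accept-language', 'accept-encoding'] as $name) {
        if (!isset($h[$name]) || trim($h[$name]) === '') {
            $reasons[] = "missing $name";
        }
    }

    // Default User-Agent fragments of some common HTTP clients
    // (illustrative, far from exhaustive).
    $ua = $h['user-agent'] ?? '';
    foreach (['curl/', 'python-requests', 'libwww-perl', 'Wget/'] as $needle) {
        if (stripos($ua, $needle) !== false) {
            $reasons[] = "library user-agent: $needle";
        }
    }

    return $reasons;
}
```

In a live request handler you might call it as `suspiciousHeaderReasons(getallheaders())`; `getallheaders()` is available under Apache and, since PHP 7.3, under PHP-FPM.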

Solution

  • If I were writing a spam bot, I would fake the headers of a normal browser, so I doubt this is a viable approach. Instead, here are some other suggestions that might help:

    • use a captcha
    • if that's too annoying, a simple but effective trick is to include a text input that is hidden by a CSS rule. Users won't see it, but spam bots don't normally bother to parse and apply all the CSS rules, so they won't realise the field is invisible and will put something in it. On form submission, check that the field is empty, and discard the submission if it isn't.
    • use a nonce on your forms: store the nonce you generate when rendering the form (for example, in the session) and check that the same value comes back when the form is submitted. This won't catch everything, but it does ensure that the post was made by something that actually received the form first. Ideally, generate a new nonce every time the form is rendered.
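The hidden-field (honeypot) trick above might look something like this in PHP. This is a minimal sketch: the field name `website`, the CSS class `hp-field`, and the function names are hypothetical, and the hiding rule is assumed to live in your stylesheet rather than inline, so a bot has to process the CSS to notice the field is invisible.

```php
<?php
// Honeypot sketch. "website" and "hp-field" are hypothetical names;
// pick a field name a bot is likely to want to fill in.
const HONEYPOT_FIELD = 'website';

// Markup for the trap field. Hide it from your stylesheet, e.g.
//   .hp-field { display: none; }
// tabindex="-1" keeps keyboard users from tabbing into it, and
// autocomplete="off" discourages browser autofill from populating it.
function honeypotFieldHtml(): string
{
    return '<div class="hp-field"><label>Website <input type="text" name="'
        . HONEYPOT_FIELD . '" value="" autocomplete="off" tabindex="-1"></label></div>';
}

// Returns true if the submission should be discarded: a human never sees
// the field, so any non-empty value suggests an automated submitter.
function isHoneypotTriggered(array $post): bool
{
    if (!isset($post[HONEYPOT_FIELD])) {
        return false; // field absent: nothing was filled in
    }
    $value = $post[HONEYPOT_FIELD];
    // Anything other than an empty (or whitespace-only) string counts as filled in.
    return !is_string($value) || trim($value) !== '';
}
```

In the form handler you would then check `isHoneypotTriggered($_POST)` before doing anything else and silently drop the request if it returns true.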
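The nonce suggestion above can be sketched in PHP as follows, assuming the nonce is kept in the session. The session key `form_nonce`, the form field name `nonce`, and both function names are hypothetical. The functions take the session array by reference so that you can pass `$_SESSION` after `session_start()`, or a plain array when testing.

```php
<?php
// Per-render form nonce sketch. "form_nonce" (session key) and "nonce"
// (form field) are hypothetical names chosen for illustration.

// Generate a fresh random nonce each time the form is rendered and
// remember it in the session.
function issueFormNonce(array &$session): string
{
    $nonce = bin2hex(random_bytes(16)); // 32 hex characters
    $session['form_nonce'] = $nonce;
    return $nonce;
}

// Compare the submitted nonce with the stored one. The stored value is
// removed whether or not it matches, so each nonce works only once.
function checkFormNonce(array &$session, array $post): bool
{
    $expected = $session['form_nonce'] ?? null;
    unset($session['form_nonce']);
    $given = $post['nonce'] ?? null;
    if (!is_string($expected) || !is_string($given)) {
        return false;
    }
    // hash_equals() compares the strings in constant time.
    return hash_equals($expected, $given);
}
```

When rendering, you would echo the value of `issueFormNonce($_SESSION)` (escaped with `htmlspecialchars()`) into a hidden input named `nonce`, and call `checkFormNonce($_SESSION, $_POST)` in the handler. Storing only one outstanding nonce per session means a user with the form open in two tabs can submit only one of them; keeping a small set of outstanding nonces is a common workaround.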