After a few weeks of working fine, our Socket.io setup started spewing errors in some browsers. I've tried updating to the latest Socket.io version, and I've tried our setup on all sorts of different machines. It seems to work in most browsers, with no clear pattern as to which ones work.
These errors appear at one-second intervals:
OPTIONS https://website.com/socket.io/?EIO=2&transport=polling&t=1409760272713-52&sid=Dkp1cq0lpKV75IO8AdA3 socket.io-1.0.6.js:2
XMLHttpRequest cannot load https://website.com/socket.io/?EIO=2&transport=polling&t=1409760272713-52&sid=Dkp1cq0lpKV75IO8AdA3. Invalid HTTP status code 400
We're behind Amazon's ELB, with Socket.io on the polling transport because the ELB router doesn't support WebSockets.
I found the problem that has been causing this, and it's really unexpected...
This problem comes from using load-balanced services like AWS ELB (an independent EC2 instance should be fine, though) and Heroku. Their infrastructure doesn't fully support Socket.io's features. AWS ELB flat out won't support WebSockets, and Heroku's router is trash for Socket.io, even in conjunction with socket.io-redis.
The problem is hidden when you use a single server, but as soon as you start clustering, you will get issues. A single Heroku dyno worked fine for my application in development, and the problems only started appearing in production, when we were using more than one server. We tried ELB with sticky load balancing, and even then we still had the same issues.
When Socket.io returns a 400 error in this case, it is effectively saying "This session doesn't exist and you never completed the handshake", because you completed the handshake on a different server in your cluster.
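To make the failure mode concrete, here is a minimal toy model of it. It is not Socket.io code: the Backend and RoundRobin classes and the simulate function are hypothetical names made up for illustration, and the balancer is assumed to do plain round-robin with no session affinity. Each backend keeps its own in-memory session table, so a sid issued by one backend is unknown to the others.

```typescript
// Toy model only: these classes are hypothetical and do not use Socket.io.
// Each backend stores the sessions it created in its own memory.
class Backend {
  readonly name: string;
  private sessions = new Set<string>();

  constructor(name: string) {
    this.name = name;
  }

  // Handshake: create a session on this backend and return its id.
  handshake(): string {
    const sid = `${this.name}-${this.sessions.size + 1}`;
    this.sessions.add(sid);
    return sid;
  }

  // Polling request: 200 if this backend issued the sid, 400 otherwise.
  poll(sid: string): number {
    return this.sessions.has(sid) ? 200 : 400;
  }
}

// Assumed balancer behaviour: round-robin, no stickiness.
class RoundRobin {
  private backends: Backend[];
  private next = 0;

  constructor(backends: Backend[]) {
    this.backends = backends;
  }

  pick(): Backend {
    const backend = this.backends[this.next];
    this.next = (this.next + 1) % this.backends.length;
    return backend;
  }
}

// Handshake once, then send `polls` polling requests through the balancer
// and collect the status code of each one.
function simulate(backendCount: number, polls: number): number[] {
  const backends = Array.from(
    { length: backendCount },
    (_, i) => new Backend(`srv${i}`)
  );
  const lb = new RoundRobin(backends);
  const sid = lb.pick().handshake();
  const statuses: number[] = [];
  for (let i = 0; i < polls; i++) {
    statuses.push(lb.pick().poll(sid));
  }
  return statuses;
}

console.log(simulate(1, 4)); // [ 200, 200, 200, 200 ]
console.log(simulate(2, 4)); // [ 400, 200, 400, 200 ]
```

With one backend every poll finds its session. With two, every poll that lands on the backend that did not perform the handshake is rejected with a 400, which matches the intermittent errors described above.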
The solution for me was simply to dedicate a single EC2 instance to handling Socket.io for my web app.