I noticed that many websites, even Google and some banking sites, have poorly-written HTML with no quotes around the values of attributes, or using characters such as ampersands not escaped correctly in links. In other words, many use markup that would not validate.
I am curious about their reasons. HTML has simple rules and it is just mind-boggling that they don't seem to follow those rules. Or do they use programs that just spit out the code?
Most people have gotten the answer basically right — that the rules are different when you serve a page a billion times a day. Bytes begin to matter, and the current level of compression clearly shows that Google is concerned with saving bandwidth.
A few points:
One, people are implying that Google's reasons for saving bandwidth are financial. Unlikely. Even a few terabytes a day saved on the Google search results page is a drop in the bucket compared to the sum of all their properties: Youtube, Blogger, Maps, Gmail, etc. Much more likely is that Google wants its search results page, in particular, to load as quickly as possible on as many devices as possible. Yes, bytes matter when the page is loaded a billion times a day, but bytes also matter when your user is using a satellite phone in the Sahara and struggling to get 1kbps.
Two, there is a difference between the codified standards of XHTML and the like, and the de-facto standard of what actually works in every browser ever made since 1994. Here, Google’s scale matters because, where most web developers are happy to ignore any troublesome browser that accounts for less than 0.1% of their users, for Google, that 0.1% is perhaps a half million people. They matter. So their search-results page ought to work on IE 5.5. This is the reason they still use tables for layout on many high-value pages – it’s still the layout that “just works” on the greatest number of browsers.
As an exercise, while an intern at Google, I wrote a perfectly compliant XHTML/CSS version of Google’s search result page and showed it around. Eventually the question came up – why are we serving such hodge-podge HTML? Shouldn’t we be leading the web dev community towards standards? The answer I got was pretty much the second point above. Google DOES follow a standard – not the wouldn’t-it-be-nice standards of web utopia, but the this-has-to-work-absolutely-everywhere standard of reality.