How does Google perform search so quickly?
After spending some time thinking about search, I've realized how complicated it is.
On the query side: if the set of possible queries were limited (e.g. to 1-2 words), the results could be precomputed over all websites and then simply looked up.
This could work well for queries that are only 1-2 words long. Real queries can contain many words, though, so the number of unique queries is effectively unbounded. This means that whilst some queries may be precomputed, the rest almost certainly are not. It's also interesting to note that Google doesn't take significantly longer to return results for a long query than for a short one.
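As a toy illustration of the precomputation idea (the queries, URLs, and fallback function here are all invented, not how Google actually stores results), serving a cached short query can be a single hash-table lookup:

```python
# Minimal sketch: precomputed results for popular short queries.
# All data and names below are hypothetical.

precomputed_results = {
    "weather": ["https://weather.example", "https://forecast.example"],
    "weather today": ["https://forecast.example", "https://weather.example"],
    # ... one entry per popular short query
}

def full_ranking(query: str) -> list:
    # Placeholder for the expensive path discussed below.
    return []

def serve(query: str) -> list:
    """Return cached results if the query was precomputed, else fall back."""
    hits = precomputed_results.get(query.lower().strip())
    if hits is not None:
        return hits              # O(1) dictionary lookup
    return full_ranking(query)   # slow path for everything else

print(serve("Weather today"))
```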
On the document side: from my understanding, Google constructs features from each document (such as word frequency, links, etc.) and runs them through an algorithm (most likely a machine learning algorithm?).
Is it correct to assume that some features are the same for all queries (such as the number of links to highly ranked websites) whilst other features are more specific to the query (e.g. how many times the query words appear in the document)?
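To make that distinction concrete, here is a minimal sketch (the feature names are assumptions for illustration, not Google's actual feature set) separating features that can be computed once per document from features that depend on the query:

```python
from collections import Counter

def query_independent_features(doc_text: str, inbound_links: int) -> dict:
    # Computable offline, once per document, regardless of any query.
    return {
        "doc_length": len(doc_text.split()),
        "inbound_links": inbound_links,
    }

def query_dependent_features(doc_text: str, query: str) -> dict:
    # Must be computed (or looked up) per (query, document) pair.
    words = Counter(doc_text.lower().split())
    return {
        "query_term_matches": sum(words[t] for t in query.lower().split()),
    }

doc = "best heavy metal bands of the 90s"
print(query_independent_features(doc, inbound_links=42))
print(query_dependent_features(doc, "metal bands"))
```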
Given the above, does this mean that each query must generate features for every document in Google's index? If not, how does Google narrow this down? How does so much computation happen so quickly? Is a significant portion of Google's infrastructure used on each query to compute these features and predict relevance in parallel, or do they approach it in another way?
Please note that whilst only the top 10-20 results may ever be clicked, Google still has to work out which 10-20 documents out of its whole library those are - which means that all the documents need to be evaluated to some degree.
On the user side: further to the above, Google customizes the results based on your previous search history/habits. Does this mean that documents have features based on each individual user as well (for example, the number of similar documents you've looked at in the past)? Or does Google use clustering methods to cluster its users and keep document features for the users most similar to you (e.g. 90% of the users who visited the sites you have visited clicked on the link to this document)?
If it's none of the above, then how do they achieve customization of search results across so many documents on the web? And how do they do it in a fraction of a second?
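As a sketch of the clustering idea described in the question (the data model, the 50% similarity threshold, and the function name are all invented for illustration), one could derive a per-document feature from the behaviour of users with similar histories:

```python
def similar_user_click_rate(user_history: set, doc_url: str,
                            other_users: list) -> float:
    """Fraction of 'similar' users who clicked doc_url.
    other_users is a list of (visited_urls, clicked_urls) set pairs;
    'similar' means >= 50% overlap with this user's history (arbitrary)."""
    similar = [
        clicked for visited, clicked in other_users
        if user_history
        and len(visited & user_history) / len(user_history) >= 0.5
    ]
    if not similar:
        return 0.0
    return sum(doc_url in clicked for clicked in similar) / len(similar)

# Toy example: two other users share most of this user's history.
me = {"a.example", "b.example"}
others = [({"a.example", "b.example"}, {"doc.example"}),
          ({"a.example"}, set())]
print(similar_user_click_rate(me, "doc.example", others))  # 0.5
```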
String search, considered algorithmically, is fast. The general case is linear, O(n) ("big O of n"), but optimizations, such as the Boyer-Moore-Horspool algorithm, can significantly improve on typical string search for well-behaved inputs. In addition, search is trivially parallelizable - if you have N computers and want to search a large document faster, just split the document into N ranges and let each computer search one range. So, search admits a linear speedup with parallelization.
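A minimal sketch of that split-and-search idea, assuming we only care whether the pattern occurs at all (chunks overlap by the pattern length so a match straddling a boundary isn't missed):

```python
from concurrent.futures import ProcessPoolExecutor

def search_chunk(args) -> bool:
    chunk, pattern = args
    return pattern in chunk          # linear-time scan of one range

def parallel_contains(document: str, pattern: str, n_workers: int = 4) -> bool:
    """Split the document into n_workers ranges and search them in parallel."""
    size = max(1, len(document) // n_workers)
    overlap = max(0, len(pattern) - 1)
    chunks = [
        (document[i : i + size + overlap], pattern)
        for i in range(0, len(document), size)
    ]
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return any(pool.map(search_chunk, chunks))

if __name__ == "__main__":
    print(parallel_contains("the quick brown fox " * 1000, "brown fox"))
```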
Of course, a web search engine cannot afford to string-search every document at the moment you submit a query - such a search engine would be painfully slow. The key is to realize that it is not necessary to search the documents at the time the search query is submitted, because the documents being searched are static (a webpage does not generally change very frequently). In addition, people are not usually searching for "exact words in a webpage" but, rather, for a location on the Internet that correlates, in their mind, to some search query. So, the search engine is really returning web locations based on search queries.
Wikipedia has a good explanation of how search engines work. Crawling and indexing are done "in the background", so they do not affect the speed with which your search query is serviced. The goal of the index is to arrange the information gleaned during crawling into a form that can be searched quickly in response to user queries.
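A minimal sketch of the core structure such an index is built around, an inverted index (the corpus here is invented; real indexes also store term positions, ranking signals, compression, and so on):

```python
from collections import defaultdict

# Toy corpus - in reality this comes from the crawler.
documents = {
    "page1": "metal bands for shipping crates",
    "page2": "best heavy metal bands of the 90s",
}

# Built offline, "in the background": term -> set of documents containing it.
inverted_index = defaultdict(set)
for doc_id, text in documents.items():
    for term in text.lower().split():
        inverted_index[term].add(doc_id)

def candidate_docs(query: str) -> set:
    """At query time, intersect posting lists instead of scanning documents."""
    postings = [inverted_index.get(t, set()) for t in query.lower().split()]
    return set.intersection(*postings) if postings else set()

print(candidate_docs("metal bands"))   # {'page1', 'page2'}
```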
As you noted, both queries and documents have statistical patterns - any kind of pattern can be leveraged to speed up search. Yahoo recently open-sourced its search engine, called Vespa. If you want to really dig into the details of how a modern search engine works, this would be a great place to start. Notice that the high-level architecture of Vespa is not really about searching; it's about enabling parallelism across a set of content-management tasks, one of which is search. In other words, fast web search is really a byproduct of parallelism applied to general-purpose content management.
Ranking is the key to success for web search. Users want results that are relevant to what they have searched for, and relevance is based on ranking. Ranking is performed during indexing and is a function of content, content location, and user feedback (click-through). As you noted, machine learning is doubtless utilized for this nowadays, although early systems, such as Google's PageRank, were not ML-based.
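For reference, a minimal power-iteration sketch of the original, non-ML PageRank idea, run on a made-up three-page link graph:

```python
def pagerank(links: dict, damping: float = 0.85, iters: int = 50) -> dict:
    """links maps each page to the pages it links to (a toy graph).
    Returns an approximate PageRank score per page."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / len(pages) for p in pages}
        for page, outgoing in links.items():
            if not outgoing:             # dangling page: spread rank evenly
                for p in pages:
                    new[p] += damping * rank[page] / len(pages)
            else:
                for target in outgoing:
                    new[target] += damping * rank[page] / len(outgoing)
        rank = new
    return rank

print(pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}))
```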
User-specific optimization of search results is more complex. They might be using ML, but non-ML algorithms may be as fast or faster. The complexity is softened by two considerations. First, user-specific preferences are really just a delta applied to the default ranking system. In the worst-case scenario (you have no idea what the user is asking), you just use the default ranking. Since that is already fast, user-specific optimizations should never slow down a search. Second, user-specific optimizations are fuzzy - the optimization is never wrong so long as it's not drastically wrong. For example, suppose a heavy-metal fan searches "metal bands" and your search engine confidently returns a list of heavy-metal bands - but, in this case, they really happened to be searching for steel bands for shipping. Your optimization was wrong, but not drastically wrong - it's OK to be wrong as long as you're not drastically wrong (returning results about rabbits, for example). There are very fast classical (non-ML) algorithms for these kinds of fuzzy problems.
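A sketch of the "delta on top of the default ranking" idea (the scores, weight, and preference model are all invented for illustration):

```python
def personalize(default_ranking: list, user_prefs: dict,
                weight: float = 0.1) -> list:
    """default_ranking: list of (doc_id, base_score) from the generic ranker.
    user_prefs: doc_id -> affinity in [0, 1] learned from the user's history.
    The personal signal only nudges scores; with no preferences the default
    order comes back unchanged, so personalization never slows the fallback."""
    rescored = [
        (doc_id, base + weight * user_prefs.get(doc_id, 0.0))
        for doc_id, base in default_ranking
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

# Example: a metalhead's history slightly boosts the heavy-metal page.
print(personalize([("steel-bands.example", 0.80),
                   ("heavy-metal.example", 0.78)],
                  {"heavy-metal.example": 0.9}))
```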
tl;dr: The heavy lifting (crawling, indexing, and most of the ranking) is done ahead of time, in the background. At query time the engine mostly performs fast lookups against that precomputed index, massively parallelized, with user-specific adjustments layered on as a delta over the default ranking.