Search code examples
urlhyperlinkseourl-routingsearch-engine

From a development perspective, how does does the indeed.com URL structure and site work?


On the webmaster's Q and A site, I asked the following:

https://webmasters.stackexchange.com/questions/42730/how-does-indeed-com-make-it-to-the-top-of-every-single-search-for-every-single-c

But, I would like a little more information about this from a development perspective.

If you search Google for anything job related, for example, Gastonia Jobs (City + jobs), then, in addition to their search results dominating the first page of Google, you get a URL structure back that looks like this:

indeed.com/l-Gastonia,-NC-jobs.html

I am assumming that the L stands for location in the URL structure. If you do a search for an industry related job, or a job with a specific company name, you will get back something like the following (Microsoft jobs):

indeed.com/q-Microsoft-jobs.html

With just over 40,000 cities in the USA I thought, ok, maybe it's possible they looped through them and created a page for every single one. That would not be hard for a computer. But then obviously the site is dynamic as each of those pages has 10000s of results and paginated by 10. The q above obviously stands for query. The locations I can understand, but they cannot possibly have created a web page for every single query combination, could they?

Ok, it gets a tad weirder. I wanted to see if they had a sitemap, so I typed into Google "indeed.com sitemap.xml" I got the response:

indeed.com/q-Sitemap-xml-jobs.html

.. again, I searched for "indeed.com url structure" and, as I mentioned in the other post on webmasters, I got back:

indeed.com/q-change-url-structure-l-Arkansas.html

Is indeed.com somehow using programming to create a webpage on the fly based on my search input into google? If they are not, how are they able to have a static page for millions and millions and millions possible query combinations, have them dynamically paginate, and then have all of those dominate google's first page of results (albeit that very last question may be best for the webmasters QA)?

Does the javascript in the page somehow interact with the URL


Solution

  • It's most likely not a bunch of pages. The "actual" page might be http://indeed.com/?referrer=google&searchterm=jobs%20in%20washington. The site then cleverly produces a human readable URL using URL rewrite, fetches jobs in the database that matches the query, and voíla...

    I could be dead wrong of course. Truth be told, the technical aspect of it can probably be solved in a multitude of ways. Every time a job is added to the site, all pages that need to be done to match that job, might be created, thus producing an enormous amount of pages for Google to crawl.