search, indexing, search-engine, google-custom-search

Create a search engine on specific sites and gather specific info


I need to create a search engine that crawls through a list of websites and runs a query against each of them. These websites return data in various formats and structures, and I need to collect specific information from all of them into a single, uniform structure.

Is there a way to do that with an existing engine such as Google Custom Search Engine, or am I better off building my own? If the latter, what is the first step I should take towards learning how to index and search these websites efficiently, without filling up my servers with useless data?

To sum up: besides running a query through each website's search box, I need to handle each site's results appropriately and combine them into one unified structure in one place. All the results are to be parsed and extracted into 4-6 fields (unless, of course, there is a way to do this with Google CSE).
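To make the "one unified structure" idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the site names (`site_a`, `site_b`), the shapes of their raw results, and the four fields of `Record` are hypothetical and not taken from any real site.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Record:
    """Unified result structure; these four fields are hypothetical."""
    source: str
    title: str
    url: str
    summary: str


def parse_site_a(item: Dict[str, Any]) -> Record:
    # Assumed raw shape for the hypothetical "site_a": name / link / desc
    return Record("site_a", item["name"], item["link"], item.get("desc", ""))


def parse_site_b(item: Dict[str, Any]) -> Record:
    # Assumed raw shape for the hypothetical "site_b": headline / href / snippet
    return Record("site_b", item["headline"], item["href"], item.get("snippet", ""))


# One parser per site; adding a site means adding one function and one entry.
PARSERS: Dict[str, Callable[[Dict[str, Any]], Record]] = {
    "site_a": parse_site_a,
    "site_b": parse_site_b,
}


def unify(raw_results: Dict[str, List[Dict[str, Any]]]) -> List[Record]:
    """Run each site's raw items through its parser and merge into one list."""
    merged: List[Record] = []
    for site, items in raw_results.items():
        parser = PARSERS[site]
        merged.extend(parser(item) for item in items)
    return merged


# Made-up sample data standing in for what each site's search might return.
raw = {
    "site_a": [{"name": "Widget", "link": "https://a.example/1", "desc": "A widget"}],
    "site_b": [{"headline": "Gadget", "href": "https://b.example/2"}],
}
records = unify(raw)
```

Fetching and parsing each site's actual response (HTML, JSON, XML) would happen before this step; the sketch only covers mapping already-extracted items into the common record.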


Solution

  • Google CSE provides some interfaces to the standard Google web search. You can control the user interface and the search parameters, but you have no control over the indexing, nor any direct access to the index data.

    You might be more interested in the Google Search APIs that are available with GAE (Google App Engine). These are quite different: they are search services in which you provide the data and control the indexes.
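To illustrate what "you provide the data and control the indexes" means in general, here is a small in-memory inverted index in Python. This is not the GAE Search API and does not mirror its interface; `TinyIndex` and its methods are hypothetical names used only to show the idea of adding your own documents and querying them.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


class TinyIndex:
    """Toy inverted index: maps each token to the IDs of documents containing it."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    @staticmethod
    def _tokens(text: str) -> List[str]:
        # Lowercase and split on anything that is not a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id: str, text: str) -> None:
        """Index only the text you choose to store, under your own document ID."""
        for token in self._tokens(text):
            self._postings[token].add(doc_id)

    def search(self, query: str) -> List[str]:
        """Return sorted IDs of documents containing every query term."""
        terms = self._tokens(query)
        if not terms:
            return []
        hits = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            hits &= self._postings.get(term, set())
        return sorted(hits)


index = TinyIndex()
index.add("doc1", "Blue widget, large")
index.add("doc2", "Red widget, small")
matches = index.search("widget blue")
```

Because you decide what goes into `add`, you can index just the 4-6 extracted fields rather than whole pages, which keeps storage small.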