I want to scrape the sitelinks that are shown in Google search results (such as "About us", "Home Page", etc.). Is there any way I can retrieve them?
I recently implemented the Google Search JSON API, and from my understanding, the only way to get a website's links is through the JSON response, where each result item contains link, formattedUrl, and htmlFormattedUrl fields. The query would be the site in question, and hopefully the first results would give you relevant links for that site.
However, if I understood your question properly, you want to scrape the sub-links of a given website, which is what a web crawler does. If you are the owner of the website, you can create a sitemap using one of the many tools available on the web, but if your intentions can be classified as "other", then I believe you are barking up the wrong tree. See this question, which will point you toward creating a simple web crawler.
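As a rough illustration of the crawler idea, here is a minimal sketch of its core step: collecting the links on one page that stay on the same host. It uses only the Python standard library and works on an HTML string, so fetching the page, following the links, and respecting robots.txt are left out. The names `LinkCollector` and `same_site_links` and the sample page are made up for this example, and note that this returns the site's own links, not the sitelinks Google chooses to display.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links such as "/about" into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def same_site_links(html, base_url):
    """Return unique links in html that share base_url's host, in page order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    unique = []
    for link in collector.links:
        if urlparse(link).netloc == host and link not in unique:
            unique.append(link)
    return unique


# Hypothetical page content, used only for illustration.
page = (
    '<a href="/about">About us</a> '
    '<a href="https://other.example/x">External</a> '
    '<a href="/about">About (again)</a>'
)
print(same_site_links(page, "https://www.example.com/"))
# prints ['https://www.example.com/about']
```

A real crawler would repeat this step for each collected link while keeping a set of already visited URLs.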
// Example customsearch#result item in which the query was Deovandski.
"items": [
  {
    "kind": "customsearch#result",
    "title": "Student Experience - College of Science and Mathematics (NDSU)",
    "htmlTitle": "Student Experience - College of Science and Mathematics (NDSU)",
    "link": "https://www.ndsu.edu/scimath/currentstudents/student_experience/",
    "displayLink": "www.ndsu.edu",
    "snippet": "Sep 16, 2015 ... Association for Computing Machinery Student Chapter Chair: Jordan Goetze \nAdvisor: Brian Slator. Upsilon Pi Epsilon President: Deovandski ...",
    "htmlSnippet": "Sep 16, 2015 \u003cb\u003e...\u003c/b\u003e Association for Computing Machinery Student Chapter Chair: Jordan Goetze \u003cbr\u003e\nAdvisor: Brian Slator. Upsilon Pi Epsilon President: \u003cb\u003eDeovandski\u003c/b\u003e ...",
    "cacheId": "pyzF9XJwrXsJ",
    "formattedUrl": "https://www.ndsu.edu/scimath/currentstudents/student_experience/",
    "htmlFormattedUrl": "https://www.ndsu.edu/scimath/currentstudents/student_experience/",
    "pagemap": {
      "cse_image": [
        {
          "src": "https://www.ndsu.edu/fileadmin/_processed_/csm_080117_anatomy_03med_9dbc3c8cce.jpg"
        }
      ],
      "cse_thumbnail": [
        {
          "width": "184",
          "height": "275",
          "src": "https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcTTL-GZRfSv30cyESsCnd_65BFoLMDdo8fqNS58mHfRbGiOTjSq-e-o28FE"
        }
      ]
    }
  },
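Given a response shaped like the item above, pulling out the links could look like the following sketch. It only parses a response body that has already been fetched; the request itself (with your own API key and search engine ID) is omitted. The function name `extract_links`, the optional `site_host` filter, and the trimmed-down sample are assumptions made for this example, not part of the API.

```python
import json


def extract_links(response_text, site_host=None):
    """Return (title, link) pairs from a search response body.

    If site_host is given, keep only items whose displayLink matches it.
    """
    data = json.loads(response_text)
    results = []
    # "items" is absent when the search returns nothing, so default to [].
    for item in data.get("items", []):
        if site_host and item.get("displayLink") != site_host:
            continue
        results.append((item.get("title"), item.get("link")))
    return results


# Trimmed-down copy of the example item, used only for illustration.
sample = json.dumps({
    "items": [
        {
            "kind": "customsearch#result",
            "title": "Student Experience - College of Science and Mathematics (NDSU)",
            "link": "https://www.ndsu.edu/scimath/currentstudents/student_experience/",
            "displayLink": "www.ndsu.edu",
        }
    ]
})

for title, link in extract_links(sample, site_host="www.ndsu.edu"):
    print(title, "->", link)
```

Filtering on displayLink is one way to drop results from other hosts when the query is a site name rather than a URL.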