I'm trying to use the googlesearch api in Python to get the top 10 results for several queries, and I'm encountering two issues:
If anyone knows how to do this with googlesearch or any other free API that would be great.
Thanks!
# coding: utf-8
from googlesearch import search
from urlparse import urlparse
import csv
import datetime

keywords = [
    "best website builder"
]
countries = [
    "us",
    "il"
]
filename = 'google_results.csv'

with open(filename, 'w') as f:
    writer = csv.writer(f, delimiter=',')
    for country in countries:
        for keyword in keywords:
            print "Showing results for: '" + keyword + "'"
            writer.writerow([])
            writer.writerow([keyword])
            for url in search(keyword, lang='en', stop=10, country=country):
                print(urlparse(url).netloc)
                print(url)
                writer.writerow([urlparse(url).netloc, url])
Answer 1: Your country format is incorrect.
The module builds the URL for the request from the following format string:
url_search = "https://www.google.%(tld)s/search?hl=%(lang)s&q=%(query)s&btnG=Google+Search&tbs=%(tbs)s&safe=%(safe)s&cr=%(country)s"
When you give it a country, simply passing in us or il is not enough. The country parameter needs to be in the format countryXX, where XX is the two-letter country abbreviation. For example, France is FR, so country would be countryFR.
Even the source code says that this parameter is not always reliable:
:param str country: Country or region to focus the search on. Similar to
changing the TLD, but does not yield exactly the same results.
Only Google knows why...
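To make the country format from Answer 1 concrete, here is a minimal sketch that fills in the URL template quoted above. The helper names (to_country_param, build_search_url) are made up for illustration and are not part of googlesearch; the default values for tld, tbs and safe, and the use of quote_plus on the query, are assumptions. It is written for Python 3 (on Python 2, quote_plus lives in urllib instead of urllib.parse).

```python
from urllib.parse import quote_plus

# URL template as quoted in the answer (taken from the googlesearch source).
URL_SEARCH = (
    "https://www.google.%(tld)s/search?hl=%(lang)s&q=%(query)s"
    "&btnG=Google+Search&tbs=%(tbs)s&safe=%(safe)s&cr=%(country)s"
)


def to_country_param(code):
    """Hypothetical helper: turn a two-letter code like 'us' into 'countryUS'."""
    return "country" + code.strip().upper()


def build_search_url(query, country_code, tld="com", lang="en", tbs="0", safe="off"):
    """Fill in the template; the defaults here are assumptions for illustration."""
    return URL_SEARCH % {
        "tld": tld,
        "lang": lang,
        "query": quote_plus(query),  # assumption: query is URL-encoded this way
        "tbs": tbs,
        "safe": safe,
        "country": to_country_param(country_code),
    }


if __name__ == "__main__":
    for code in ["us", "il"]:
        print(build_search_url("best website builder", code))

# In the question's loop, the same helper would be used like this:
#   search(keyword, lang='en', stop=10, country=to_country_param(country))
```

The only change that matters for the question is the value passed as country; the rest just shows where it ends up in the request (the cr= parameter).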
Answer 2: Ads are loaded dynamically using JavaScript. This library, on the other hand, only does static parsing of the returned HTML; it does not execute any JavaScript. To get the ads, you will need something like Selenium or pyppeteer, so that a real browser executes the JavaScript.
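As a rough sketch of the Selenium route from Answer 2, assuming Selenium 4 with Chrome and a matching driver installed: the function names below are hypothetical, and the sketch only returns the HTML after the browser has run the page's JavaScript. Locating the ads inside that HTML is left out, since the markup Google uses for them is not something this answer can vouch for.

```python
def make_headless_chrome():
    """Create a headless Chrome driver (assumes Selenium 4 and Chrome are installed)."""
    from selenium import webdriver  # imported lazily so the module loads without Selenium

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)


def fetch_rendered_html(url, driver_factory=make_headless_chrome):
    """Load the URL in a browser and return the HTML after JavaScript has run."""
    driver = driver_factory()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()
```

You would call fetch_rendered_html with a search URL and then parse the returned HTML yourself. The driver_factory argument is only there so that a different browser (or a stub in tests) can be swapped in.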