Hi, I am trying to get the first URL of a Google search for each query in a list. For the sake of simplicity, I am going to use the same code as a similar question asked 2 years ago.
from googlesearch import search
list_of_queries = ["Geeksforgeeks", "stackoverflow", "GitHub"]
results = []
for query in list_of_queries:
    results.append(search(query, tld="co.in", num=1, stop=1, pause=2))
print(results)
Now this returns a list of generator objects. A solution was found to print out the list of results by adding
for result in results:
    print(list(result))
However I want the results list to be in the form of a list of strings in order to web scrape the urls for data. One solution I found was to add
results_str = []
for result in results:
    results_str.append(list(result))
When I print results_str I get this as an output:
[['https://www.geeksforgeeks.org/'], ['https://stackoverflow.com/'], ['https://github.com/']]
As you can see, I cannot use results_str directly as a list of URLs to scrape, because each URL is wrapped in its own list. I thought I could work around this by removing the brackets, following this answer, and thus adding
results_str_new = [s.replace('[' and ']', '') for s in results_str]
But this simply results in an AttributeError
AttributeError: 'list' object has no attribute 'replace'
Even if I did get it to work, it seems unnecessarily complicated to do all this just to convert a list of generator objects into strings to use as URLs for web scraping, so I was wondering whether there are any alternatives. I know that one option is to use Selenium, but I would rather avoid it because I don't want the hassle of a Chrome instance opening whenever I run my script.
Thanks in advance
You are getting back a list of lists of strings. To flatten it, you can use a list comprehension like this:
results_str = [url for result in results for url in result]
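To see the nested comprehension work without hitting Google, here is a minimal sketch. fake_search is a hypothetical stand-in that I made up for illustration; it only mimics the fact that search returns a generator of URL strings, and the URLs it yields are invented.

```python
def fake_search(query):
    """Hypothetical stand-in for googlesearch.search: yields one made-up URL."""
    yield f"https://www.{query.lower()}.com/"


list_of_queries = ["Geeksforgeeks", "stackoverflow", "GitHub"]

# One generator per query, as in the question.
results = [fake_search(query) for query in list_of_queries]

# The outer loop walks the generators, the inner loop walks the URLs of each one.
results_str = [url for result in results for url in result]
print(results_str)
```

Keep in mind that a generator can only be consumed once, so if you have already printed list(result) for each entry, the comprehension will come back empty.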
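or you can change from append
    results_str.append(list(result))
to extend
    results_str.extend(list(result))
if you don't want to go with a list comprehension. extend adds each item of the given list to results_str, whereas append inserts the whole list as a single element.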
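The difference between the two methods is easiest to see side by side. This is only an illustrative sketch; the example.com URLs and the names nested and flat are placeholders of mine, not part of the question.

```python
# Each inner list plays the role of list(result) for one query.
url_lists = [["https://a.example.com/"], ["https://b.example.com/"]]

nested = []
flat = []
for urls in url_lists:
    nested.append(urls)  # adds the whole list as a single element
    flat.extend(urls)    # adds each string from the list individually

print(nested)
print(flat)
```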
results_str = []
for result in results:
    results_str.extend(list(result))
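Since you only want the first URL per query, another option (a different technique from the two above) might be to pull a single item out of each generator with the built-in next. Again a sketch: fake_search is a hypothetical stand-in for search with invented URLs, and in the real script you would call search(query, ...) in its place.

```python
def fake_search(query):
    """Hypothetical stand-in for googlesearch.search: yields made-up URLs lazily."""
    yield f"https://{query.lower()}.example.com/"
    yield f"https://{query.lower()}.example.com/second"


list_of_queries = ["Geeksforgeeks", "stackoverflow", "GitHub"]

# next(gen, None) returns the first yielded item, or None if the generator is empty.
first_urls = [next(fake_search(query), None) for query in list_of_queries]
print(first_urls)
```

This gives a flat list of strings directly, with None marking any query that produced no result.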