python string list web-scraping generator

How do I get a normal list of strings instead of generator objects when I perform a googlesearch?


Hi, I am trying to get the first URL of a Google search for each query in a list. For the sake of simplicity, I am going to use the same code as a similar question from 2 years ago.

from googlesearch import search

list_of_queries = ["Geeksforgeeks", "stackoverflow", "GitHub"]
results = []

for query in list_of_queries:
    results.append(search(query, tld="co.in", num=1, stop=1, pause=2))

print (results)

Now this returns a list of generator objects. A solution was found to print out the list of results by adding

for result in results:
    print(list(result))

However, I want the results to be a list of strings so that I can scrape the URLs for data. One solution I found was to add

results_str = []
for result in results:
    results_str.append(list(result))

When I print results_str I get this as an output:

[['https://www.geeksforgeeks.org/'], ['https://stackoverflow.com/'], ['https://github.com/']]

As one can see, I cannot use results_str directly as a list of URLs to scrape because of the additional brackets around each URL. I thought I could work around this by removing the brackets, following this answer, and thus adding

results_str_new = [s.replace('[' and ']', '') for s in results_str]

But this simply results in an AttributeError:

AttributeError: 'list' object has no attribute 'replace'

Either way, even if I did get it to work, it seems unnecessary to do all this work just to convert a list of generator objects into strings to use as URLs to scrape, so I was wondering if there are any alternatives. I know that one option is to use Selenium, but I would rather not, because I don't want the hassle of a Chrome instance opening whenever I run my script.

Thanks in advance


Solution

  • You are getting back a list of lists of strings, because each generator is converted to its own list. To change that, you can use a list comprehension like this:

    results_str = [url for result in results for url in result]
    

    Or, if you don't want to use a list comprehension, you can change append to extend. extend adds each item of an iterable to the list, whereas append inserts the iterable itself as a single element.

    results_str = []
    for result in results:
        results_str.extend(result)
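
    If it helps, the same idea can also be applied one step earlier, so that the intermediate list of generators is never built. The sketch below is only an illustration: fake_search is a hypothetical stand-in for googlesearch.search (it just yields made-up example.com URLs so the snippet runs without network access), and the names urls, urls_chained and first_urls are mine, not from the question. It shows three roughly equivalent ways to end up with a flat list of strings.

```python
from itertools import chain


def fake_search(query, stop=1):
    """Hypothetical stand-in for googlesearch.search: lazily yields URL strings."""
    for i in range(stop):
        yield f"https://example.com/{query.lower()}/{i}"


list_of_queries = ["Geeksforgeeks", "stackoverflow", "GitHub"]

# Option 1: extend the flat list directly from each generator.
urls = []
for query in list_of_queries:
    urls.extend(fake_search(query, stop=1))

# Option 2: chain all generators together and materialise them once.
urls_chained = list(
    chain.from_iterable(fake_search(q, stop=1) for q in list_of_queries)
)

# Option 3: take only the first hit per query; next() falls back to None
# if a generator yields nothing.
first_urls = [next(fake_search(q, stop=1), None) for q in list_of_queries]

print(urls)
```

    With the real library you would presumably swap fake_search for search and keep the arguments from the question. Option 3 may be the closest fit when only the first URL per query is wanted, since it never builds a list per query at all.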