Tags: python, python-3.x, web-scraping, concurrent.futures

Unable to print results from a function while using concurrent.futures in some customized way


I've created a script using the concurrent.futures library to print the results from the fetch_links function. When I use a print statement inside the function, I get the results as expected. What I wish to do now is produce the results from that function using a yield statement instead.

Is there any way I can modify the code under main in order to print the results from fetch_links while keeping the function as is, i.e. keeping the yield statement?

import requests
from bs4 import BeautifulSoup
import concurrent.futures as cf

links = [
    "https://stackoverflow.com/questions/tagged/web-scraping?tab=newest&page=2&pagesize=50",
    "https://stackoverflow.com/questions/tagged/web-scraping?tab=newest&page=3&pagesize=50",
    "https://stackoverflow.com/questions/tagged/web-scraping?tab=newest&page=4&pagesize=50"
]

base = 'https://stackoverflow.com{}'

def fetch_links(s,link):
    r = s.get(link)
    soup = BeautifulSoup(r.text,"lxml")
    for item in soup.select(".summary .question-hyperlink"):
        # print(base.format(item.get("href")))
        yield base.format(item.get("href"))

if __name__ == '__main__':
    with requests.Session() as s:
        with cf.ThreadPoolExecutor(max_workers=5) as exe:
            future_to_url = {exe.submit(fetch_links,s,url): url for url in links}
            cf.as_completed(future_to_url)

Solution

  • Your fetch_links is a generator function, so each future's result is a generator object; you have to iterate over that as well to get the actual results:

    import requests
    from bs4 import BeautifulSoup
    import concurrent.futures as cf
    
    links = [
        "https://stackoverflow.com/questions/tagged/web-scraping?tab=newest&page=2&pagesize=50",
        "https://stackoverflow.com/questions/tagged/web-scraping?tab=newest&page=3&pagesize=50",
        "https://stackoverflow.com/questions/tagged/web-scraping?tab=newest&page=4&pagesize=50"
    ]
    
    base = 'https://stackoverflow.com{}'
    
    
    def fetch_links(s, link):
        r = s.get(link)
        soup = BeautifulSoup(r.text, "lxml")
        for item in soup.select(".summary .question-hyperlink"):
            yield base.format(item.get("href"))
    
    
    if __name__ == '__main__':
        with requests.Session() as s:
            with cf.ThreadPoolExecutor(max_workers=5) as exe:
                future_to_url = {exe.submit(fetch_links, s, url): url for url in links}
                for future in cf.as_completed(future_to_url):
                    for result in future.result():
                        print(result)
    

    Output:

    https://stackoverflow.com/questions/64298886/rvest-webscraping-in-r-with-form-inputs
    https://stackoverflow.com/questions/64298879/is-this-site-not-suited-for-web-scraping-using-beautifulsoup
    https://stackoverflow.com/questions/64297907/python-3-extract-html-data-from-sports-site
    https://stackoverflow.com/questions/64297728/cant-get-the-fully-loaded-html-for-a-page-using-puppeteer
    https://stackoverflow.com/questions/64296859/scrape-text-from-a-span-tag-containing-nested-span-tag-in-beautifulsoup
    https://stackoverflow.com/questions/64296656/scrapy-nameerror-name-items-is-not-defined
    https://stackoverflow.com/questions/64296201/missing-values-while-scraping-using-beautifulsoup-in-python
    https://stackoverflow.com/questions/64296130/how-can-i-identify-the-element-containing-the-link-to-my-linkedin-profile-after
    https://stackoverflow.com/questions/64295959/why-use-scrapy-or-beautifulsoup-vs-just-parsing-html-with-regex-v2
    https://stackoverflow.com/questions/64295842/how-to-retreive-scrapping-data-from-web-to-json-like-format
    https://stackoverflow.com/questions/64295559/how-to-iterate-through-a-supermarket-website-and-getting-the-product-name-and-pr
    https://stackoverflow.com/questions/64295509/cant-stop-asyncio-request-for-some-delay
    https://stackoverflow.com/questions/64295244/paginate-with-network-requests-scraper
    and so on ...
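
    One caveat worth noting (not part of the original answer): because fetch_links is a generator function, exe.submit(fetch_links, s, url) only creates the generator object in the worker thread. The request and parsing run lazily in the main thread when future.result() is iterated, so the fetches are not actually concurrent. If you want the work itself to happen in the pool, drain the generator inside the worker. A minimal sketch, using threading.current_thread() as a hypothetical stand-in for the real network call to show where the work executes:

    ```python
    import concurrent.futures as cf
    import threading

    def fetch_links(link):
        # Hypothetical stand-in for the real s.get() + BeautifulSoup work:
        # a generator's body only runs when the generator is iterated.
        yield (link, threading.current_thread().name)

    def fetch_links_eager(link):
        # Drain the generator inside the worker so the work above actually
        # runs in the pool thread, not lazily in the main thread.
        return list(fetch_links(link))

    links = ["page2", "page3", "page4"]

    if __name__ == '__main__':
        with cf.ThreadPoolExecutor(max_workers=5) as exe:
            future_to_url = {exe.submit(fetch_links_eager, url): url for url in links}
            for future in cf.as_completed(future_to_url):
                for link, worker in future.result():
                    print(link, worker)  # worker is a pool thread, not MainThread
    ```

    With the original submit(fetch_links, s, url), the printed thread name would be MainThread; with the eager wrapper it is a ThreadPoolExecutor worker, confirming the scraping would run concurrently.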