Tags: python, beautifulsoup, python-requests, urllib

DuckDuckGo results scraping


I had a problem running my code and found what looked like a perfect solution for it on Stack Overflow. However, after making the necessary changes and running it, I get no output.

Code:

from bs4 import BeautifulSoup
import urllib.parse
import requests

r = requests.get('https://duckduckgo.com/html/?q=test')
soup = BeautifulSoup(r.text, 'html.parser')
results = soup.find_all('a', attrs={'class':'result__url'}, href=True)

for link in results:
    url = link['href']
    o = urllib.parse.urlparse(url)
    d = urllib.parse.parse_qs(o.query)
    print(d['uddg'][0])

The answer I followed suggested using urlparse() for the path components: "From this take the query string and pass it to parse_qs() to further process it. You can then extract the link using the uddg name." These are supposed to be the first few results:

http://www.speedtest.net/
https://www.merriam-webster.com/dictionary/test
https://en.wikipedia.org/wiki/Test
https://www.thefreedictionary.com/test
https://www.dictionary.com/browse/test

Instead, I get no output at all:

In [14]:
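The urlparse()/parse_qs() step quoted above can be checked on its own, without any network request, which helps separate "the parsing is wrong" from "the page has no results". Below is a minimal sketch; the helper name resolve_result_url and the sample link are hypothetical, and returning links that have no uddg parameter unchanged is an assumption, not documented DuckDuckGo behaviour.

```python
from urllib.parse import parse_qs, urlparse


def resolve_result_url(href: str) -> str:
    """Hypothetical helper: return the target URL of a DuckDuckGo redirect link.

    Assumes the redirect format the question relies on: the real URL sits,
    percent-encoded, in the 'uddg' query parameter. Links without that
    parameter are returned unchanged (an assumption for illustration).
    """
    query = urlparse(href).query   # e.g. "uddg=https%3A%2F%2F..."
    params = parse_qs(query)       # values come back already percent-decoded
    targets = params.get("uddg")   # .get() avoids the KeyError d['uddg'] can raise
    return targets[0] if targets else href


# Hypothetical redirect link, made up for illustration
link = "//duckduckgo.com/l/?uddg=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FTest"
print(resolve_result_url(link))  # https://en.wikipedia.org/wiki/Test
```

If this prints the expected URL but the loop in the question still prints nothing, the problem is that soup.find_all() found no matching links, not the URL parsing.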

Solution

  • You're getting a 403 Forbidden response, so the page you parse contains no results. To fix this, send a browser-like User-Agent header with the request.

    Here's how:

    import requests
    from bs4 import BeautifulSoup
    
    # Without a browser-like User-Agent, DuckDuckGo answers with 403 Forbidden
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:84.0) Gecko/20100101 Firefox/84.0",
    }
    
    page = requests.get('https://duckduckgo.com/html/?q=test', headers=headers).text
    # Each search result's URL is in an <a class="result__url" href="..."> tag
    results = BeautifulSoup(page, 'html.parser').find_all("a", class_="result__url", href=True)
    
    for link in results:
        print(link['href'])
    Output:

    https://www.merriam-webster.com/dictionary/test
    https://www.speedtest.net/
    https://www.dictionary.com/browse/test
    https://www.thefreedictionary.com/test
    https://www.thesaurus.com/browse/test
    https://en.wikipedia.org/wiki/Test
    https://www.tests.com/
    http://speedtest.xfinity.com/
    https://fast.com/
    https://www.spectrum.com/internet/speed-test
    https://projectstream.google.com/speedtest
    https://dictionary.cambridge.org/dictionary/english/test
    http://www.act.org/content/act/en/products-and-services/the-act.html
    ...
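The 403 is easy to miss because requests does not raise on error status codes by default: the error page is returned as ordinary text, BeautifulSoup parses it, and find_all() typically just comes back empty. A minimal sketch that makes the failure explicit, assuming the answer's User-Agent string is still accepted (DuckDuckGo may change this at any time); the helper names build_search_request and fetch_search_page are hypothetical.

```python
import requests

# Assumption: this desktop-Firefox User-Agent (copied from the answer) is still
# accepted by DuckDuckGo's /html/ endpoint; that can change without notice.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:84.0) Gecko/20100101 Firefox/84.0",
}


def build_search_request(query: str) -> requests.PreparedRequest:
    """Hypothetical helper: prepare, but do not send, the search request."""
    # params= lets requests URL-encode the query (spaces, &, etc.)
    return requests.Request(
        "GET", "https://duckduckgo.com/html/", params={"q": query}, headers=HEADERS
    ).prepare()


def fetch_search_page(query: str, timeout: float = 10.0) -> str:
    """Hypothetical helper: send the request and fail loudly on 4xx/5xx."""
    with requests.Session() as session:
        response = session.send(build_search_request(query), timeout=timeout)
        # Without this check, a 403 error page would be handed to BeautifulSoup
        # and the scraping loop would silently print nothing.
        response.raise_for_status()
        return response.text
```

The string returned by fetch_search_page("test") can be passed to BeautifulSoup exactly as page is in the answer. Splitting out the prepared request also lets you inspect the final URL and headers without touching the network.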