Search code examples
pythonbingbing-apibing-search

How to exclude particular websites in Bing Web Search API result?


I'm using Bing Web Search API to search for some info in Bing.com. However, for some search queries, I'm getting websites from search, which are not relevant for my purposes.

For example, if the search query is books to read this summer sometimes Bing returns youtube.com or amazon.com as a result. However, I don't want to have these kinds of websites in my search results. So, I want to block these kinds of websites before the search starts.

Here is my sample code:

    params = {
        "mkt": "en-US",
        "setLang": "en-US",
        "textDecorations": False,
        "textFormat": "raw",
        "responseFilter": "Webpages",
        "q": "books to read this summer",
        "count": 20
    }

    headers = {
        "Ocp-Apim-Subscription-Key": MY_APY_KEY_HERE,
        "Accept": "application/json",
        "Retry-After": "1",
    }


    response = requests.get(WEB_SEARCH, headers=headers, params=params, timeout=15)

    response.raise_for_status()

    search_data = response.json()

search_data variable contains links I want to block without iteration over the response object. Particularly, I want Bing not to include youtube and amazon in the result.

Any help will be appreciated.

Edited: WEB_SEARCH is the web search endpoint


Solution

  • After digging Bing Web Search documentation I found an answer. Here it is.

    LINKS_TO_EXCLUDE = ["-site:youtube.com", "-site:amazon.com"]
    
    
    def bing_data(user_query):
    
        params = {
            "mkt": "en-US",
            "setLang": "en-US",
            "textDecorations": False,
            "textFormat": "raw",
            "responseFilter": "Webpages",
            "count": 30
        }
    
        headers = {
            "Ocp-Apim-Subscription-Key": MY_APY_KEY_HERE,
            "Accept": "application/json",
            "Retry-After": "1",
        }
        
        # This is the line what I needed to have
        params["q"] = user_query + " " + " ".join(LINKS_TO_EXCLUDE)
    
        response = requests.get(WEB_SEARCH_ENDPOINT, headers=headers, params=params)
        response.raise_for_status()
        search_data = response.json()
    
        return search_data