python stackexchange-api

How do I submit a Stack Exchange API query that returns the same results as the basic Stack Overflow search?


I am working on a project to determine the popularity of various topics on gis.stackexchange, using Python to interface with the Stack Exchange API. My problem is that I can't configure the API query to match what a basic search from the site's search bar returns (posts containing the term x). I am currently using the /search/advanced endpoint with q="term", but I get empty results for search terms that should match around 100-200 posts. I have read a lot of the API documentation, but I can't work out how to make the API query return what a site search would.

Edit: For example, if I search for "Bayesian" on gis.stackexchange, I get 42 results, but when I set q=Bayesian in the API request I get an empty response.

I have included my program below if it helps. Thanks!

#Interfacing_with_SO_API
import requests as rq
import json
import time

keywordinput = input('Enter your search term. If two words, separate them with - : ')


baseurl = ('https://api.stackexchange.com/2.3/search/advanced?page=')

endurl = ('&pagesize=100&order=desc&sort=votes&q=' + keywordinput + '&site=gis.stackexchange&filter=!-nt6H9O0imT9xRAnV1gwrp1ZOq7FBaU7CRaGpVkODaQgDIfSY8tJXb')



urltot = ('https://api.stackexchange.com/2.3/search/advanced?page=1&pagesize=100&order=desc&sort=votes&q=' + keywordinput + '&site=gis.stackexchange&filter=!-nt6H9O0imT9xRAnV1gwrp1ZOq7FBaU7CRaGpVkODaQgDIfSY8tJXb')
response = rq.get(urltot)

page = range(1,400)

if response.status_code == 400:
    print('Initial Response Code 400: Stopping')
    exit()
elif response.status_code == 200:
    print('Initial Response Code 200: Continuing')

datarr = []
for n in page:
    response = rq.get(baseurl + str(n) + endurl)
    print(baseurl + str(n) + endurl)
    time.sleep(2)
    if response.status_code == 400 or response.json()['has_more'] == False or n >400:
        print('No more pages')
        break
    elif response.json()['has_more'] == True:
        for data in response.json()['items']:
            if data['view_count'] >= 0:
                datarr.append(data)
                print(data['view_count'])
                print(data['answer_count'])
                print(data['score'])

#convert datarr to csv and save to file
with open(input('Search Term Name (filename): ') + '.csv', 'w') as f:
    for data in datarr:
        f.write(str(data['view_count']) + ',' + str(data['answer_count']) + ','+ str(data['score']) + '\n')
exit()

Solution

  • If you search for bayesian on the GIS Stack Exchange site, you get 42 results because the site search returns both questions and answers that contain the term.

    However, the standard /search and /search/advanced API endpoints only search questions, per the documentation (note the word "questions"):

    "Searches a site for any questions which fit the given criteria."

    Instead, what you want to use is the /search/excerpts endpoint, which will return both questions and answers.

    Quick demo in the shell to show that it returns the same number of items:

    curl -s --compressed "https://api.stackexchange.com/2.3/search/excerpts?page=1&pagesize=100&site=gis&q=bayesian" | jq '.["items"] | length'
    
    42
    

    And a minimal Python program to do the same:

    #!/usr/bin/env python3
    
    # file: test_so_search.py
    
    import requests
    
    if __name__ == "__main__":
        api_url = "https://api.stackexchange.com/2.3/search/excerpts"
    
        search_term = "bayesian"
        qs = {
            "page": 1,
            "pagesize": 100,
            "order": "desc",
            "sort": "votes",
            "site": "gis",
            "q": search_term
        }
    
        rsp = requests.get(api_url, params=qs)
    
        data = rsp.json()
    
        print(f"Got {len(data['items'])} results for '{search_term}'")
    

    And output:

    > python test_so_search.py
    Got 42 results for 'bayesian'
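
For terms that match more than one page of results, the same endpoint can be paged through. Below is a minimal sketch, not a definitive implementation. It assumes the response wrapper fields items, has_more and backoff, and the item_type field on each excerpt ("question" or "answer"). The helper names fetch_page, collect_items and count_by_type, and the max_pages cap, are hypothetical names introduced only for this illustration. Note that each page's items are stored before has_more is checked, so the last (or only) page is not discarded.

```python
import time
from collections import Counter

API_URL = "https://api.stackexchange.com/2.3/search/excerpts"


def fetch_page(term, page, site="gis", pagesize=100):
    """Fetch one page of /search/excerpts results as a dict (needs network)."""
    import requests  # imported lazily so the helpers below also work offline

    params = {
        "page": page,
        "pagesize": pagesize,
        "order": "desc",
        "sort": "votes",
        "site": site,
        "q": term,
    }
    rsp = requests.get(API_URL, params=params, timeout=30)
    rsp.raise_for_status()
    return rsp.json()


def collect_items(get_page, max_pages=25):
    """Gather items from successive pages until the API reports no more.

    get_page is any callable mapping a page number to a response dict;
    max_pages is an arbitrary safety cap chosen for this sketch.
    """
    items = []
    for page in range(1, max_pages + 1):
        data = get_page(page)
        # Keep this page's items before looking at has_more, so that the
        # final (or only) page of results is not thrown away.
        items.extend(data.get("items", []))
        if not data.get("has_more", False):
            break
        # If the API sends a "backoff" value (seconds), wait that long.
        time.sleep(data.get("backoff", 0))
    return items


def count_by_type(items):
    """Count how many results are questions and how many are answers."""
    return Counter(item.get("item_type", "unknown") for item in items)


# Example (needs network access and the requests package):
# items = collect_items(lambda page: fetch_page("bayesian", page))
# print(len(items), dict(count_by_type(items)))
```

Passing the page fetcher in as a callable keeps the paging logic separate from the HTTP call, so it can be checked against canned responses without spending API quota.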