Tags: python, selenium, beautifulsoup, get, python-requests

Changing the '?' in get requests using Python on an ASP.NET site


I am trying to build a web scraper for the following site, using requests and BeautifulSoup to pull the information and xlsxwriter to append it to an Excel sheet:

https://www.calcareers.ca.gov/CalHRPublic/Search/JobSearchResults.aspx#jcid=1&kw=a&classid=540&depid=274&locid=4&postdays=1&tenid=1&timid=1&minsal=2000&appmethid=1&socmajorcode=17-0000

The above link shows each search parameter from the previous page set to an arbitrary default value.

Using requests, I can send a payload with the same parameters to that link. I plan to fill them in from user input once I can get past the '#' symbol.

This is the code that I'm currently using:

from bs4 import BeautifulSoup
import requests
import xlsxwriter

# Argument variables
payload = {
    'jcid': 'value1',
    'kw': 'value2',
    'classid': 'value3',
    'depid': 'value4',
    'locid': 'value5',
    'postdays': 'value6',
    'tenid': 'value7',
    'timid': 'value8',
    'minsal': 'value9',
    'appmethid': 'value10',
    'socmajorcode': 'value11'
}

# Request
r = requests.get(
    'https://www.calcareers.ca.gov/CalHRPublic/Search/JobSearchResults.aspx#', params = payload)

print(r.url)

The output of print(r.url) is:

https://www.calcareers.ca.gov/CalHRPublic/Search/JobSearchResults.aspx?jcid=value1&kw=value2&classid=value3&depid=value4&locid=value5&postdays=value6&tenid=value7&timid=value8&minsal=value9&appmethid=value10&socmajorcode=value11

The issue is that the site won't load the results when the parameters follow a '?'; they need to follow a '#' instead.

Any thoughts on how I could accomplish this with requests? It seems like this could be circumvented with selenium, but I wanted to give this a go because I've hit a brick wall with this one.


Solution

  • You can wrap requests.get() in a small helper class that builds the URL with '#' instead of '?', so that print(r.url) shows the parameters after the '#'.

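    import requests
    from urllib.parse import urlencode
    
    class ASPRequest:
        """Small helper that puts the parameters after '#' instead of '?'."""
    
        def get(self, url, params, **kwargs):
            # urlencode escapes each value and joins the pairs with '&',
            # so no trailing '&' is left on the URL.
            qstring = '#' + urlencode(params)
            # Pass any extra keyword arguments (headers, timeout, ...) through.
            return requests.get(url + qstring, **kwargs)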
    
    payload = {
        'jcid': 'value1',
        'kw': 'value2',
        'classid': 'value3',
        'depid': 'value4',
        'locid': 'value5',
        'postdays': 'value6',
        'tenid': 'value7',
        'timid': 'value8',
        'minsal': 'value9',
        'appmethid': 'value10',
        'socmajorcode': 'value11'
    }
    
    r = ASPRequest().get(
        url='https://www.calcareers.ca.gov/CalHRPublic/Search/JobSearchResults.aspx',
        params=payload)
    
    print(r.url)
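
One caveat about the approach above: everything after '#' in a URL is a fragment, and HTTP clients (requests included) keep the fragment on the client side rather than sending it to the server. So r.url will show the '#' parameters, but the server is still asked for the bare JobSearchResults.aspx page. The sketch below uses only the standard library to show which part of the URL would reach the server. The helper names build_fragment_url and request_target are made up for this illustration, and the payload values are just the first three from the link in the question.

```python
from urllib.parse import urlencode, urlsplit

BASE_URL = "https://www.calcareers.ca.gov/CalHRPublic/Search/JobSearchResults.aspx"


def build_fragment_url(base_url, params):
    """Return base_url with params encoded after '#' (hypothetical helper)."""
    return base_url + "#" + urlencode(params)


def request_target(url):
    """Return the path and query an HTTP client puts on the request line.

    The fragment is deliberately left out, because clients do not transmit it.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


payload = {"jcid": "1", "kw": "a", "classid": "540"}
url = build_fragment_url(BASE_URL, payload)

print(url)                     # full URL, parameters after the '#'
print(urlsplit(url).fragment)  # the part that never leaves the client
print(request_target(url))     # what the server is actually asked for
```

If the search results only appear in a browser, the page is probably reading the fragment with JavaScript and fetching the results in a separate request. In that case it may be worth looking at the browser's network tab to find that request and reproduce it with requests, or falling back to selenium as mentioned in the question.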