
No response for XHR request in python with requests.get()


I want to scrape German polling-place data from a server. As an example, I search for the street (Straße) "Judengasse".

I have been trying to reproduce the approach from another SE post. Unfortunately, the link in that post no longer works, so I couldn't compare its request directly with mine. Since I am fairly inexperienced, I do not know exactly what is needed to reproduce the request that the web interface submits.

I don't know which request headers are needed for my request to work and which might be redundant. In Chrome's inspect mode I see that my request has more headers than the referenced example.

My code so far (which does not work), based on the SE post:

import requests

url = 'https://online-service2.nuernberg.de/Finder/action/getItems'
data = {
    "finder":"Wahlraumfinder",
    "strasse":"Judengasse",
    "hausnummer":"0"
    }

headers = {
           'Host': 'online-service2.nuernberg.de', 
           'Referer': 'https://online-service2.nuernberg.de/Finder/?Wahlraumfinder', 
           'Accept': '*/*', 
           'Accept-Encoding': 'gzip, deflate, br', 
           'Accept-Language': 'de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7', 
           'Connection': 'keep-alive', 
           'Content-Length': '312', 
           'Content-Type': 'multipart/form-data; boundary=----WebKitFormBoundaryeJZfrnZATOw6B5By', 
           'DNT': '1', 
           'Sec-Fetch-Mode': 'cors', 
           'Sec-Fetch-Site': 'same-origin', 
           'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36',
           'X-Requested-With': 'XMLHttpRequest'
           }

response = requests.get(url, data=data, headers=headers)

I don't get a response, even though I added all the request headers to headers.

I am not sure whether more headers are needed.

I am also not sure whether the URL is correct.

I am looking to generate output of the following form, for this specific request "Judengasse":

Nr 0652
Wahllokal Willstätt.-Gym., Innerer Laufer Platz 11

This corresponds to entering "Judengasse" in the search bar, clicking the search button ("Suche"), and extracting parts of the first output box, "Wahl-/Stimmbezirk".

When I look at the XHR in Chrome's dev mode:

General

Request URL: https://online-service2.nuernberg.de/Finder/action/getItems
Request Method: POST
Status Code: 200 OK
Remote Address: 193.22.166.102:443
Referrer Policy: no-referrer-when-downgrade

Response Header

Connection: Keep-Alive
Content-Length: 1149
Content-Type: application/json;charset=UTF-8
Date: Wed, 04 Dec 2019 00:21:30 GMT
Keep-Alive: timeout=5, max=100
Server: Apache

Request Header

Accept: */*
Accept-Encoding: gzip, deflate, br
Accept-Language: de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7
Connection: keep-alive
Content-Length: 312
Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryx2jHYJHo3ejnKw0l
DNT: 1
Host: online-service2.nuernberg.de
Origin: https://online-service2.nuernberg.de
Referer: https://online-service2.nuernberg.de/Finder/?Wahlraumfinder
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36
X-Requested-With: XMLHttpRequest

Form Data

------WebKitFormBoundaryx2jHYJHo3ejnKw0l
Content-Disposition: form-data; name="action"

"action/getItems"
------WebKitFormBoundaryx2jHYJHo3ejnKw0l
Content-Disposition: form-data; name="data"

{"finder":"Wahlraumfinder","strasse":"Judengasse","hausnummer":"0"}
------WebKitFormBoundaryx2jHYJHo3ejnKw0l--

Thank you for reading.


Solution

  • After some research I finally managed to get a 200 response from this server.

    Firstly, requests.get should be replaced by requests.post here, because the "General" section in Chrome's dev mode shows that the browser sends an HTTP POST request.

    Secondly, the request headers show that the data is sent as "multipart/form-data". This encoding splits the body into parts separated by a boundary string. It is typically used for file uploads, but it can carry ordinary form fields as well, as it does here.

    So, I wrote the request body as a bytes literal (hence the b prefix) and passed it to the files parameter of the request. As far as I can tell, files accepts a dict or any iterable of (name, value) pairs, and a set containing a single tuple works as such an iterable, hence the {(None, data)}.

    I also substitute the street name into data through a %s placeholder, so it is easier to change.
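
As a sketch of how ordinary fields map onto multipart/form-data, requests can also build the body itself: passing (None, value) tuples through files produces plain form fields, and the boundary, Content-Type and Content-Length are generated to match. The helper name build_request and the decision to send only the X-Requested-With header are assumptions for illustration; whether the server accepts this leaner request has not been verified here. The request is only prepared, not sent, so it can be inspected offline.

```python
import json

import requests

URL = "https://online-service2.nuernberg.de/Finder/action/getItems"


def build_request(street, house_number="0"):
    """Prepare (but do not send) the multipart POST for one street.

    Hypothetical helper; the field names are taken from the browser's
    form data shown in the question.
    """
    payload = {
        "finder": "Wahlraumfinder",
        "strasse": street,
        "hausnummer": house_number,
    }
    # A (None, value) tuple means "no filename", so each entry is encoded
    # as a plain form field rather than as a file upload.
    fields = {
        "action": (None, '"action/getItems"'),
        # Compact separators reproduce the JSON layout the browser sent.
        "data": (None, json.dumps(payload, separators=(",", ":"))),
    }
    request = requests.Request(
        "POST",
        URL,
        files=fields,
        # Assumption: the remaining browser headers are not required.
        headers={"X-Requested-With": "XMLHttpRequest"},
    )
    # prepare() encodes the body and sets Content-Type (including a fresh
    # boundary) and Content-Length so that they match it.
    return request.prepare()


prepared = build_request("Judengasse")
print(prepared.headers["Content-Type"])
print(prepared.body.decode("utf-8"))
```

To send it, something like requests.Session().send(prepared) should work; if the server answers as it does for the browser, response.json() then returns the parsed result.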

    This is the working code (the headers and the boundary are copied from my browser's request):

    import requests
    
    url = 'https://online-service2.nuernberg.de/Finder/action/getItems'
    
    street = b'Judengasse'
    
    data = b'-----------------------------15242581323522\r\n' \
           b'Content-Disposition: form-data; name=\"action\"\r\n\r\n' \
           b'\"action/getItems\"\r\n-----------------------------15242581323522\r\n' \
           b'Content-Disposition: form-data; name="data"\r\n\r\n' \
           b'{\"finder\":\"Wahlraumfinder\",\"strasse\":\"%s\",\"hausnummer\":\"0\"}\r\n' \
           b'-----------------------------15242581323522--' % street
    
    headers = {"Host": "online-service2.nuernberg.de",
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0",
                "Accept": "*/*",
                "Accept-Language": "en-US,en;q=0.5",
                "Accept-Encoding": "gzip, deflate, br",
                "X-Requested-With": "XMLHttpRequest",
                "Content-Type": "multipart/form-data; boundary=---------------------------15242581323522",
                "Content-Length": "321",
                "Origin": "https://online-service2.nuernberg.de",
                "DNT": "1",
                "Connection": "keep-alive",
                "Referer": "https://online-service2.nuernberg.de/Finder/?Wahlraumfinder",
               }
    
    
    multipart_data = {(None, data,)}
    response = requests.post(url, files=multipart_data, headers=headers)
    
    print(response.text)
    

    I got this raw response:

    {"id":"8c4f7a57-1bd6-423a-8ab8-e1e40e1e3852","items":[{"zeilenbeschriftung":"Wahl-/Stimmbezirk","linkAdr":null,"mapUrl":"http://online-service.nuernberg.de/Themenstadtplan/sta_gebietsgliederungen.aspx?p_urlvislayer=Stimmbezirke&XKoord=4433503.05&YKoord=5480253.301&Zaehler=1&Textzusatz=Judengasse+0&z_XKoord=4433670.0&z_YKoord=5480347.0&z_Zaehler=1&z_Textzusatz=Wahllokal%20Willst%E4tt.-Gym.%2C+Innerer+Laufer+Platz+11","items":["0652","Judengasse, Neue Gasse","Willstätt.-Gym., Innerer Laufer Platz 11","Zi. 101 ,1. OG",null]},{"zeilenbeschriftung":"Stimmkreis Landtagswahl","linkAdr":null,"mapUrl":"http://online-service.nuernberg.de/Themenstadtplan/sta_gebietsgliederungen.aspx?p_urlvislayer=Stimmkreis_LTW&XKoord=4433503.05&YKoord=5480253.301&Zaehler=1&Textzusatz=Judengasse+0&p_scale=100000","items":["501","Nürnberg-Nord"]},{"zeilenbeschriftung":"Wahlkreis Bundestagswahl","linkAdr":null,"mapUrl":"http://online-service.nuernberg.de/Themenstadtplan/sta_gebietsgliederungen.aspx?p_urlvislayer=Wahlkreis_BTW&XKoord=4433503.05&YKoord=5480253.301&Zaehler=1&Textzusatz=Judengasse+0&p_scale=150000","items":["244","Nürnberg-Nord"]}],"status":200}
    
    

    which you can easily parse to get the result you expect:

    print(response.json()["items"][0]["items"])
    

    yielding:

    ['0652', 'Judengasse, Neue Gasse', 'Willstätt.-Gym., Innerer Laufer Platz 11', 'Zi. 101 ,1. OG', None]
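
To get from that list to the two-line output asked for in the question, a small sketch along these lines may help. The function name polling_place is hypothetical, and the positions used (index 0 for the district number, index 2 for the polling place) are inferred from this single sample response, so they are an assumption rather than a documented format. The sample is an abbreviated copy of the response above so that the snippet runs without network access.

```python
import json

# Abbreviated copy of the raw response shown above; the id, linkAdr and
# mapUrl entries are omitted for brevity.
sample = json.loads("""
{"items": [
  {"zeilenbeschriftung": "Wahl-/Stimmbezirk",
   "items": ["0652", "Judengasse, Neue Gasse",
             "Willstätt.-Gym., Innerer Laufer Platz 11",
             "Zi. 101 ,1. OG", null]},
  {"zeilenbeschriftung": "Stimmkreis Landtagswahl",
   "items": ["501", "Nürnberg-Nord"]}
 ],
 "status": 200}
""")


def polling_place(result):
    """Return (district number, polling place) from the 'Wahl-/Stimmbezirk' row."""
    for row in result["items"]:
        # Select the row by its label instead of relying on it being first.
        if row["zeilenbeschriftung"] == "Wahl-/Stimmbezirk":
            values = row["items"]
            # Assumed layout: [number, streets, polling place, room, ...]
            return values[0], values[2]
    raise LookupError("no 'Wahl-/Stimmbezirk' row in response")


number, place = polling_place(sample)
print(f"Nr {number}")
print(f"Wahllokal {place}")
```

With a live response, polling_place(response.json()) would be called in the same way.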
    
    

    Hope it helps.

    Regards