A website I am scraping sometimes redirects to a page with a form, which I would like to handle in a downloader middleware. The idea is that every time this redirect occurs, the middleware automatically submits the form and retrieves the results. My middleware looks something like:
from scrapy import FormRequest

class SubmitFormMiddleware:
    def process_response(self, request, response, spider):
        # Detect the form page the site redirects to.
        if response.css('form.loginbox').getall():
            post_form_url = response.css('form.loginbox::attr(action)').get()
            # Returning a Request from process_response makes Scrapy
            # schedule it instead of passing this response to the spider.
            return FormRequest(url=response.urljoin(post_form_url),
                               formdata={'username': 'my_username',
                                         'password': 'my_password',
                                         'data_selection': 'all'},
                               method='POST',
                               dont_filter=True)
        else:
            return response
This doesn't work, since I don't have any callback defined (and I shouldn't define one, because I am in a middleware):
NotImplementedError: DefaultSpider.parse callback is not defined
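Scrapy routes the response for a request returned from process_response to request.callback, which falls back to the spider's parse method when no callback is set; my spider defines no parse, hence the error. One workaround might be to carry the original request's callback over onto the FormRequest. A minimal sketch, assuming the original request was created with an explicit callback whose meta is safe to reuse:

from scrapy import FormRequest

class SubmitFormMiddleware:
    def process_response(self, request, response, spider):
        if response.css('form.loginbox').getall():
            post_form_url = response.css('form.loginbox::attr(action)').get()
            # Reuse the original request's callback and meta so the form
            # response is routed back into the same parsing logic.
            return FormRequest(url=response.urljoin(post_form_url),
                               formdata={'username': 'my_username',
                                         'password': 'my_password',
                                         'data_selection': 'all'},
                               method='POST',
                               callback=request.callback,
                               meta=request.meta,
                               dont_filter=True)
        return response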
Alternatively, if I just wanted to re-issue the request at a new URL, I could follow the pattern of Scrapy's built-in RedirectMiddleware (assuming my middleware subclasses it, since that is where _redirect is defined):

redirected = request.replace(url=response.urljoin(post_form_url))
return self._redirect(redirected, request, spider, response.status)

but this does not work for submitting a form. Does anybody know what the 'Scrapy-thonic' way is to use a FormRequest in a downloader middleware?
I managed to solve this problem in the following way:
from scrapy import FormRequest

class SubmitFormMiddleware:
    def process_response(self, request, response, spider):
        if response.css('form.loginbox').getall():
            post_form_url = response.css('form.loginbox::attr(action)').get()
            # Build a throwaway FormRequest only to let Scrapy encode the
            # form data into a request body and matching headers.
            form_request_handle = FormRequest(url=response.urljoin(post_form_url),
                                              formdata={'username': 'my_username',
                                                        'password': 'my_password',
                                                        'data_selection': 'all'},
                                              method='POST',
                                              dont_filter=True)
            # Replace the original request with the form submission while
            # keeping its callback, meta and other attributes intact.
            return request.replace(url=form_request_handle.url,
                                   method='POST',
                                   body=form_request_handle.body,
                                   headers=form_request_handle.headers,
                                   dont_filter=True)
        else:
            return response
Although this works, I am still curious about the 'Scrapy-thonic' way to submit a FormRequest from within a downloader middleware.
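For completeness, a possibly more idiomatic variant (an untested sketch) would let FormRequest.from_response read the form's action URL, method and any hidden fields straight from the response, instead of extracting the action manually, and then transplant the resulting body and headers onto the original request as above:

from scrapy import FormRequest

class SubmitFormMiddleware:
    def process_response(self, request, response, spider):
        if response.css('form.loginbox').getall():
            # from_response fills in the action URL, method and hidden
            # fields from the page; formcss picks out the right form.
            form_request = FormRequest.from_response(
                response,
                formcss='form.loginbox',
                formdata={'username': 'my_username',
                          'password': 'my_password',
                          'data_selection': 'all'})
            # Keep the original request's callback and meta by replacing
            # only the parts that describe the form submission.
            return request.replace(url=form_request.url,
                                   method=form_request.method,
                                   body=form_request.body,
                                   headers=form_request.headers,
                                   dont_filter=True)
        return response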