How to simulate mouse click in Scrapy-Splash


I am scraping the page http://www.starcitygames.com/buylist/ and need to click a button to access some data, so I am trying to simulate a mouse click, but I am confused about exactly how to do that. It has been suggested that I scrape the JSON instead because it would be much easier, but I would rather scrape the regular website. Here is my best attempt so far; I do not know what to do to make it click the display button.

Python Code

import scrapy
from scrapy.spiders import Spider
from scrapy_splash import SplashRequest
from ..items import NameItem

class LoginSpider(scrapy.Spider):
    name = "LoginSpider"
    start_urls = ["http://www.starcitygames.com/buylist/"]

    def parse(self, response):
        return scrapy.FormRequest.from_response(
            response,
            formcss='#existing_users form',
            formdata={'ex_usr_email': '[email protected]', 'ex_usr_pass': 'password'},
            callback=self.after_login,
        )

    def after_login(self, response):
        item = NameItem()
        element = splash:select('#bl-search-category') #CSS selector
        splash:mouse_click(x, y)# Confused about how to find x and y
        item["Name"] = response.css("div.bl-result-title::text").get()
        return item

Solution

  • Splash is a lightweight option for rendering JavaScript. If you have extensive clicking and menu navigation to do that can't be reverse engineered, then you probably don't want Splash, unless you don't mind writing a Lua script. You may want to see this answer in regard to that.

    With Splash, you write a Lua script and pass it to Splash's execute endpoint. Depending on how complex your task is, Selenium may be a better choice for your project. First, however, examine the target site thoroughly and be sure that you actually need to render JavaScript: rendering is slower and uses more resources than plain requests, so it should be avoided whenever it isn't necessary.

    PS: We can't access this site without the login credentials. I suspect that you don't need to render the JavaScript; that is the case 90%+ of the time.
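
To make the "Lua script passed to the execute endpoint" idea concrete, here is a minimal sketch rather than a tested solution for this site. It assumes a running Splash instance (3.0 or later, for the element API) and a project already configured for scrapy-splash. The helper names `build_splash_kwargs` and `make_click_request` and the one-second wait are made up for illustration. Only the `#bl-search-category` selector comes from the question, and we could not check it against the page. The sketch does not handle the login step, since cookies from a `FormRequest` login are not automatically shared with Splash.

```python
# Lua script run by Splash. element:mouse_click() clicks the middle of the
# element when no x/y is given, so no coordinates have to be worked out.
CLICK_SCRIPT = """
function main(splash, args)
    assert(splash:go(args.url))
    assert(splash:wait(args.wait))

    local element = splash:select(args.selector)
    if not element then
        error("no element matches " .. args.selector)
    end
    assert(element:mouse_click())

    -- Clicks are processed asynchronously, so wait before reading the page.
    assert(splash:wait(args.wait))
    return {html = splash:html()}
end
"""


def build_splash_kwargs(url, selector, wait=1.0):
    """Hypothetical helper: keyword arguments for a SplashRequest that
    loads `url`, clicks the first element matching `selector`, and
    returns the rendered HTML."""
    return {
        "url": url,
        "endpoint": "execute",  # the endpoint that runs Lua scripts
        "args": {
            "lua_source": CLICK_SCRIPT,
            "selector": selector,  # read in Lua as args.selector
            "wait": wait,          # seconds; an arbitrary assumption
        },
    }


def make_click_request(url, selector, callback):
    """Hypothetical helper: build the actual request (needs scrapy-splash)."""
    from scrapy_splash import SplashRequest  # imported lazily on purpose
    return SplashRequest(callback=callback, **build_splash_kwargs(url, selector))
```

Inside a spider, something like `yield make_click_request(response.url, '#bl-search-category', self.parse_result)` would send the request. Because the script returns a table with an `html` key, the callback can then use `response.css("div.bl-result-title::text")` as in the question.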