
Scrapy Selector CSS not returning child nodes


I am using Scrapy and Splash to crawl an AJAX web page.

Here is a simplified version of the page HTML:

<html>
    <head>
        <title>Title here</title>
    </head>
    <body>
        <select class="Gy(t)" data-reactid="5">
            <option selected="" value="1506038400" data-reactid="6">Item 0</option>
            <option value="200" data-reactid="7">Item 1</option>
            <option value="123" data-reactid="8">Item 2</option>
            <option value="800" data-reactid="9">Item 3</option>
            <option value="600" data-reactid="10">Item 4</option>
            <option value="240" data-reactid="11">Item 5</option>
            <option value="768" data-reactid="12">Item 6</option>
            <option value="132" data-reactid="13">Item 7</option>
            <option value="632" data-reactid="14">Item 8</option>
            <option value="418" data-reactid="15">Item 9</option>
            <option value="290" data-reactid="16">Item 10</option>
            <option value="748" data-reactid="17">Item 11, 2018</option>
            <option value="154" data-reactid="18">Item 12</option>
            <option value="579" data-reactid="19">Item 13</option>
        </select>
    </body>
</html>

JavaScript runs in the browser when an option is clicked/selected, and this causes a new page to be loaded.

I want to mimic a user clicking an option, to load a new page.

So this is what I want to do using Scrapy and Splash:

  1. Select the select HTML element (and its option child nodes)
  2. Iterate through each of the options and 'click' them.

This is my code for selecting the select element:

My Code

>>> response.css('select.Gy\(t\)')
[<Selector xpath="descendant-or-self::select[@class and contains(concat(' ', normalize-space(@class), ' '), ' Gy(t) ')]" data='<select class="Gy(t)" data-reactid="5">\n'>]
>>> 

As can be seen, the element appears to be empty: it contains no child elements!

What am I doing wrong? How do I select the select element and its children?

Once I have selected the select element I want to iterate over all of its child elements and click them. How do I click (select) an option?


Solution

  • Have you tried it like this?

    response.css('select option[data-reactid]')
    response.css(r"select[class=Gy\(t\)] option[data-reactid]")
    

    Either of the two should work.