java, html, web-scraping, htmlunit

Adding and removing elements with HtmlUnit


I want to scrape a table from my school's website that contains the schedule for the day. The table changes according to the class you choose, so the page uses JavaScript that checks which class is selected. I found out that it is based on this list in the HTML code of the website:

<option selected="selected" value="52">1</option>
<option value="53">2</option>
<option value="54">3</option>
<option value="1">ז - 1</option>
<option value="2">ז - 2</option>
<option value="3">ז - 3</option>
<option value="4">ז - 4</option>
<option value="5">ז - 5</option>
<option value="6">ז - 6</option>
<option value="57">ז - 7</option>
<option value="9">ח - 1</option>
<option value="10">ח - 2</option>
<option value="11">ח - 3</option>
<option value="12">ח - 4</option>
<option value="13">ח - 5</option>
<option value="14">ח - 6</option>
<option value="15">ח - 7</option>
<option value="17">ט - 1</option>
<option value="18">ט - 2</option>
<option value="19">ט - 3</option>
<option value="20">ט - 4</option>
<option value="21">ט - 5</option>
<option value="22">ט - 6</option>
<option value="23">ט - 7</option>
<option value="26">י - 1</option>
<option value="27">י - 2</option>
<option value="28">י - 3</option>
<option value="29">י - 4</option>
<option value="30">י - 5</option>
<option value="31">י - 6</option>
<option value="32">יא - 1</option>
<option value="33">יא - 2</option>
<option value="34">יא - 3</option>
<option value="35">יא - 4</option>
<option value="36">יא - 5</option>
<option value="37">יא - 6</option>
<option value="38">יב - 1</option>
<option value="39">יב - 2</option>
<option value="40">יב - 3</option>
<option value="41">יב - 4</option>
<option value="42">יב - 5</option>
<option value="43">יב - 6</option>
<option value="56">יב - 7</option>
<option value="49">שכבה ז'</option>
<option value="50">שכבה ח'</option>
<option value="51">שכבה ט'</option>
<option value="48">שכבה י'</option>
<option value="46">שכבה י&quot;א</option>
<option value="47">שכבה י&quot;ב</option>

As you can see, the chosen option carries an additional attribute called selected: <option selected="selected" value="52">1</option>. Basically, I just want to remove that selected attribute from one option element and add it to another option element, namely the one for the class I want to choose.


Solution

  • The 'selected' attribute is nothing special; it is simply how HTML marks the chosen option of a select element (https://developer.mozilla.org/en-US/docs/Web/HTML/Element/select). And by the way, if you want to scrape web pages, some basic knowledge of HTML is helpful :-).

    Regarding your select:

    • First, find the select control inside the page. Usually you have an HtmlPage object, and you use one of the lookup methods offered by HtmlUnit (http://htmlunit.sourceforge.net/gettingStarted.html) to find the select element (without seeing your code I can't be more specific).
    • Then find the option (HtmlOption) inside the select that you want to choose.
    • Finally, call HtmlOption#setSelected(true) for that option. You don't need to remove the attribute from the old option yourself; in a single-choice select, selecting one option deselects the previous one.
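
The three steps above can be sketched roughly as follows. This is only a minimal sketch under a few assumptions: the URL is a placeholder, the class list is assumed to be the first select element on the page (adjust the XPath to your page), and the value "26" is taken from the option list in the question. The class and method names (ScheduleScraper, selectClass) are made up for illustration. The imports use the com.gargoylesoftware.htmlunit package of HtmlUnit 2.x; HtmlUnit 3.x moved the same classes to org.htmlunit.

```java
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlOption;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlSelect;

public class ScheduleScraper {

    // Selects the option with the given value attribute.
    // Assumption: the class list is the first <select> element on the page.
    static void selectClass(HtmlPage page, String optionValue) {
        // Step 1: find the select control inside the page.
        HtmlSelect select = page.getFirstByXPath("//select");
        if (select == null) {
            throw new IllegalStateException("No <select> element found on the page");
        }

        // Step 2: find the option to choose
        // (throws ElementNotFoundException if no option has this value).
        HtmlOption option = select.getOptionByValue(optionValue);

        // Step 3: select it. HtmlUnit updates the selection state and
        // triggers the page's change handling, as a browser would.
        option.setSelected(true);
    }

    public static void main(String[] args) throws Exception {
        try (WebClient webClient = new WebClient()) {
            // Hypothetical URL; replace with the real schedule page.
            HtmlPage page = webClient.getPage("http://example.com/schedule");

            selectClass(page, "26");

            // Give scripts started by the change some time to finish.
            webClient.waitForBackgroundJavaScript(5_000);

            // Dump the resulting markup, which should now include the updated table.
            System.out.println(page.asXml());
        }
    }
}
```

If the script on the page replaces the whole document instead of updating the table in place, the page returned by setSelected(true) is the one to read from, rather than the original page object.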