
How to extract the list of values from a specific drop-down box in a web form using Java/JSP


I want to extract all the values for a specific drop down list box in a web form.

The relevant markup for this drop-down, taken from the source of the web form, is given below:

<div align="left"><select name="CATEGORY_ID">
<option label="[Top]" value="0" selected="selected">[Top]</option>
<option label="|___Arts &amp; Humanities" value="1">|___Arts &amp; Humanities</option>
<option label="|&nbsp;&nbsp;&nbsp;|___Art History" value="2">|&nbsp;&nbsp;&nbsp;|___Art History</option>
----many more values----
<option label="|&nbsp;&nbsp;&nbsp;|___Work" value="453">|&nbsp;&nbsp;&nbsp;|___Work</option>

</select>
</div>

I want to extract both the actual values (i.e. the value="" attribute of each option) and the text shown on screen (i.e. the label="" attribute). Can this be done in JSP/Java, ideally using only classes supported by Google App Engine? (If you know a way to do this but are not sure whether Google App Engine for Java supports it, please suggest it anyway.)


Solution

  • The easiest way is to use an HTML parser. I don't think GAE ships with one, but you should be able to drop one into your /WEB-INF/lib. I'd suggest Jsoup for this job. You should then be able to obtain all options of the <select name="CATEGORY_ID"> on the external website as follows in a servlet:

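    // Inside the servlet's request-handling method. Needs org.jsoup.Jsoup,
    // org.jsoup.nodes.Document and org.jsoup.nodes.Element on the classpath.
    // LinkedHashMap keeps the options in document order.
    Map<String, String> options = new LinkedHashMap<String, String>();
    Document document = Jsoup.connect("http://other.com/some.html").get();
    
    // One entry per <option> inside the target <select>: value -> visible text.
    for (Element option : document.select("select[name=CATEGORY_ID] option")) {
        options.put(option.attr("value"), option.text());
    }
    
    // Expose the map to the JSP and forward to it.
    request.setAttribute("options", options);
    request.getRequestDispatcher("/WEB-INF/some.jsp").forward(request, response);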
    

    Then redisplay it in the JSP as follows (this needs the JSTL core taglib declared with the prefix c):

    <select name="category">
        <c:forEach items="${options}" var="option">
            <option value="${option.key}"><c:out value="${option.value}" /></option>
        </c:forEach>
    </select>
    

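    The label attribute is, by the way, unnecessary here: it merely repeats the body of the <option> element, which is what gets shown as the visible label. It is part of standard HTML, but browsers have historically supported it unevenly. If you need it anyway, option.attr("label") returns it.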
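
If adding a third-party jar is not an option, the same value-to-text map can be sketched with java.util.regex alone. This swaps the HTML parser for regular expressions, so it is only a rough fallback for the exact markup shape shown in the question: it assumes double-quoted attributes, no nested tags inside <option>, and only the &nbsp; and &amp; entities. The class name OptionExtractor and its extract method are hypothetical names chosen for illustration, and the HTML string is assumed to have been fetched already.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jsoup or any library.
public class OptionExtractor {

    private static final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;

    // One <option ...>body</option>; group 1 = attributes, group 2 = body.
    private static final Pattern OPTION =
            Pattern.compile("<option\\b([^>]*)>(.*?)</option>", FLAGS);

    // The double-quoted value attribute inside an option's attribute list.
    private static final Pattern VALUE =
            Pattern.compile("\\bvalue=\"([^\"]*)\"", FLAGS);

    /** Returns value -> visible text for the named select, in document order. */
    public static Map<String, String> extract(String html, String selectName) {
        Map<String, String> options = new LinkedHashMap<String, String>();

        // Restrict the search to the body of the <select> with the given name.
        Pattern select = Pattern.compile(
                "<select\\b[^>]*\\bname=\"" + Pattern.quote(selectName)
                        + "\"[^>]*>(.*?)</select>", FLAGS);
        Matcher selectMatcher = select.matcher(html);
        if (!selectMatcher.find()) {
            return options; // no such select: empty map
        }

        Matcher option = OPTION.matcher(selectMatcher.group(1));
        while (option.find()) {
            Matcher value = VALUE.matcher(option.group(1));
            if (value.find()) {
                options.put(value.group(1), decode(option.group(2)));
            }
        }
        return options;
    }

    // Decodes only the two entities that occur in the question's markup.
    // &amp; is decoded last so that "&amp;nbsp;" is not decoded twice.
    private static String decode(String text) {
        return text.replace("&nbsp;", "\u00a0").replace("&amp;", "&").trim();
    }
}
```

The resulting map can be set as a request attribute in the same way as the Jsoup version. For anything beyond this fixed markup, a real HTML parser remains the safer choice.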