
Dynamic lookup in VTD-XML using only XPath


I'm trying to use an XPath expression to find elements referring to the current element in VTD-XML. So say my XML contains books and ratings and looks like this:

<root>
  <book id="1" name="Book1"/>
  <book id="2" name="Book1"/>
  <rating book-id="1" value="5"/>
  <rating book-id="2" value="3"/>
</root>

First I'm iterating over all book elements. Then, for each book, I want to execute an XPath expression that fetches the rating for that book. For example:

/root/rating[@book-id=current()/@id]/@value

This doesn't work because the current() function is exclusive to XSLT. So I tried declaring a variable expression named "current" with the value "." to mean "the current book", but that doesn't work either: as the name implies, a variable expression stores the expression itself rather than the result of evaluating it.

Is there a way to achieve this effect in VTD-XML using just an XPath expression? (I realize there are various ways of doing it in code, but I want to use pure XPath so users can easily create a configuration file describing their data format.)

EDIT: The upshot of the accepted answer is that what I want can't be done using a single XPath expression. I ended up adding an option so users can specify how the unique identifier for the current book may be found (e.g. "./@id", or maybe "./isbn"). My code then executes this expression and substitutes the result for some placeholder (e.g. "$$") in the rating-search XPath.
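
The substitution step described in this edit could be sketched roughly as follows. This is a minimal sketch, not the code actually used: the class name XPathPlaceholder, the method names, and the decision to quote the substituted value as an XPath string literal are all assumptions made for illustration. It relies only on plain Java string handling, so it is independent of VTD-XML.

```java
final class XPathPlaceholder {
    /** Placeholder token; "$$" is the example mentioned in the edit above. */
    static final String PLACEHOLDER = "$$";

    /** Quotes a value as an XPath 1.0 string literal (hypothetical helper). */
    static String toXPathLiteral(String value) {
        if (!value.contains("'")) {
            return "'" + value + "'";
        }
        if (!value.contains("\"")) {
            return "\"" + value + "\"";
        }
        // XPath 1.0 has no escape syntax, so a value containing both kinds
        // of quote has to be assembled with concat().
        StringBuilder sb = new StringBuilder("concat(");
        String[] parts = value.split("'", -1);
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(", \"'\", ");
            }
            sb.append("'").append(parts[i]).append("'");
        }
        return sb.append(")").toString();
    }

    /** Replaces every placeholder in the template with the quoted value. */
    static String substitute(String template, String value) {
        // String.replace works on literal text, so "$$" needs no regex escaping.
        return template.replace(PLACEHOLDER, toXPathLiteral(value));
    }
}
```

For example, XPathPlaceholder.substitute("/root/rating[@book-id=$$]/@value", "1") yields /root/rating[@book-id='1']/@value, which can then be passed to AutoPilot.selectXPath. Quoting the value keeps the lookup working when the identifier is not numeric, such as an ISBN.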


Solution

  • An XPath expression like //*/rating[./@book-id=//book/@id]/@value retrieves only the values of ratings whose book-id matches the id of an existing book.

    If you add <rating book-id="3" value="4"/> to the XML document, the XPath still returns only the values 5 and 3 (for books 1 and 2), because there is no book with ID 3.
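
To check this behaviour without any VTD-XML setup, here is a small sketch that evaluates the same expression with the JDK's built-in javax.xml.xpath API. That API is swapped in for VTD-XML purely so the example has no external dependency, and the class name RatingJoinDemo is made up. In XPath 1.0, "=" between two node-sets is true if any pair of nodes compares equal, which is why the rating for the non-existent book 3 is filtered out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RatingJoinDemo {
    static final String XML =
            "<root>"
            + "<book id=\"1\" name=\"Book1\"/>"
            + "<book id=\"2\" name=\"Book1\"/>"
            + "<rating book-id=\"1\" value=\"5\"/>"
            + "<rating book-id=\"2\" value=\"3\"/>"
            + "<rating book-id=\"3\" value=\"4\"/>"
            + "</root>";

    /** Returns the rating values whose book-id matches some existing book id. */
    static List<String> matchedRatings() {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//*/rating[./@book-id=//book/@id]/@value",
                    new InputSource(new StringReader(XML)),
                    XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // Each node is a "value" attribute; its node value is the rating.
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that this expression answers "which ratings belong to any book", not "which rating belongs to this particular book", which is the distinction the question is about.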

    A simple test method with VTD could look like this:

    @Test
    public void xpathReference() throws Exception {
        byte[] bytes = ("<root>\n"
                     + "  <book id=\"1\" name=\"Book1\"/>\n"
                     + "  <book id=\"2\" name=\"Book1\"/>\n"
                     + "  <rating book-id=\"1\" value=\"5\"/>\n"
                     + "  <rating book-id=\"2\" value=\"3\"/>\n"
                     + "  <rating book-id=\"3\" value=\"4\"/>\n"
                     + "</root>").getBytes();
    
        VTDGen vtdGenerator = new VTDGen();
        vtdGenerator.setDoc(bytes);
        vtdGenerator.parse(true);
        VTDNav vtdNavigator = vtdGenerator.getNav();
    
        AutoPilot autoPilot = new AutoPilot(vtdNavigator);
        autoPilot.selectXPath("//*/rating[./@book-id=//book/@id]/@value");
        int id;
        int count = 0;
        // For an attribute match, evalXPath() returns the token index of the attribute name
        while ((id = autoPilot.evalXPath()) != -1) {
            String elementName = vtdNavigator.toString(id);
            // Look up the value token of that attribute on the current element
            int text = vtdNavigator.getAttrVal(elementName);
            String txt = text != -1 ? vtdNavigator.toNormalizedString(text) : "";
            System.out.println("Found match at ID " + id + " in field name '" + elementName + "' with value '" + txt + "'");
            count++;
        }
        System.out.println("Total number of matches: " + count);
        assertThat(count, is(equalTo(2)));
    }
    

    On executing this test method you should see an output similar to this one:

    Found match at ID 15 in field name 'value' with value '5'
    Found match at ID 20 in field name 'value' with value '3'
    Total number of matches: 2
    

    As pointed out in the comments, the code above does not extract the data for the book currently being processed in an iteration. The code below attempts to do that:

    @Test
    public void xpathReference() throws Exception {
        byte[] bytes = ("<root>\n"
                        + "  <book id=\"1\" name=\"Book1\"/>\n"
                        + "  <book id=\"2\" name=\"Book2\"/>\n"
                        + "  <book id=\"4\" name=\"Book3\"/>\n"
                        + "  <rating book-id=\"1\" value=\"5\"/>\n"
                        + "  <rating book-id=\"2\" value=\"3\"/>\n"
                        + "  <rating book-id=\"3\" value=\"4\"/>\n"
                        + "</root>").getBytes();
    
        VTDGen vtdGenerator = new VTDGen();
        vtdGenerator.setDoc(bytes);
        vtdGenerator.parse(true);
        VTDNav vtdNavigator = vtdGenerator.getNav();
    
        AutoPilot autoPilot = new AutoPilot(vtdNavigator);
        autoPilot.selectXPath("//book/@id");
        int id;
        int count = 0;
        while ((id = autoPilot.evalXPath()) != -1) {
            String elementName = vtdNavigator.toString(id);
            int bookId_id = vtdNavigator.getAttrVal(elementName);
            String bookId = bookId_id != -1 ? vtdNavigator.toNormalizedString(bookId_id) : "";
    
            // Quote the id so that non-numeric identifiers (e.g. an ISBN) are compared as strings
            AutoPilot xpathBookName = new AutoPilot(vtdNavigator);
            xpathBookName.selectXPath("//book[@id='" + bookId + "']/@name");
            String bookName = xpathBookName.evalXPathToString();
    
            AutoPilot xpathRating = new AutoPilot(vtdNavigator);
            xpathRating.selectXPath("//rating[@book-id='" + bookId + "']/@value");
            String bookRating = xpathRating.evalXPathToString();
    
            if ("".equals(bookRating)) {
                System.out.println("Book " + bookName + " with id " + bookId + " has no rating yet");
            } else {
                System.out.println("Book " + bookName + " with id " + bookId + " has a rating of " + bookRating);
            }
            count++;
        }
        System.out.println("Total number of matches: " + count);
        assertThat(count, is(equalTo(3)));
    }
    

    If you execute the latter code you should see an output like:

    Book Book1 with id 1 has a rating of 5
    Book Book2 with id 2 has a rating of 3
    Book Book3 with id 4 has no rating yet
    Total number of matches: 3
    

    Note that I changed the name of your second book slightly so you can see the differences more easily.


    ... and yes, it is easy to just get the id of the current book in Java code and then construct an XPath expression with that, but as I explained, I want the user to be able to use XPath to define their document format, so I don't want any format-specific stuff in the code.

    VTD-XML supports XPath 1.0 only. If you (or your clients) can come up with an XPath 1.0 query, you should be able to extract the respective values with VTD-XML as well. I suspect, however, that plain XPath 1.0 queries are not expressive enough to deliver what you need directly.

    As the example may be too simple for your actual use case, it is hard to recommend how to design your application to handle such scenarios. Consider updating your question with more detailed examples. One simple approach is to introduce placeholder variables that are defined separately: before an XPath expression containing such a placeholder is executed, replace the placeholder with the concrete value that was extracted earlier.
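
The placeholder approach could look roughly like the sketch below. It is an illustration under stated assumptions rather than a definitive implementation: it uses the JDK's javax.xml.xpath and DOM APIs instead of VTD-XML so that it runs without extra dependencies, and the class name PlaceholderLookup, the method lookupPerItem, its three configuration parameters and the "$$" token are all hypothetical. The three XPath strings stand in for what a user would put in a configuration file.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class PlaceholderLookup {
    static final String SAMPLE =
            "<root>"
            + "<book id=\"1\" name=\"Book1\"/>"
            + "<book id=\"2\" name=\"Book2\"/>"
            + "<book id=\"4\" name=\"Book3\"/>"
            + "<rating book-id=\"1\" value=\"5\"/>"
            + "<rating book-id=\"2\" value=\"3\"/>"
            + "<rating book-id=\"3\" value=\"4\"/>"
            + "</root>";

    /**
     * For every node selected by itemsXPath, evaluates keyXPath relative to
     * that node, substitutes the result for "$$" in lookupTemplate, and
     * evaluates the resulting expression against the whole document.
     */
    static Map<String, String> lookupPerItem(
            String xml, String itemsXPath, String keyXPath, String lookupTemplate) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList items = (NodeList) xpath.evaluate(itemsXPath, doc, XPathConstants.NODESET);

            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < items.getLength(); i++) {
                Node item = items.item(i);
                // The key expression is evaluated with the current item as context node.
                String key = xpath.evaluate(keyXPath, item);
                // Simplification: assumes the key contains no single quote.
                String lookup = lookupTemplate.replace("$$", "'" + key + "'");
                // An empty node-set converts to the empty string.
                result.put(key, xpath.evaluate(lookup, doc));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling PlaceholderLookup.lookupPerItem(PlaceholderLookup.SAMPLE, "/root/book", "./@id", "/root/rating[@book-id=$$]/@value") maps book 1 to "5", book 2 to "3" and book 4 to the empty string. The code itself contains nothing format-specific; everything that depends on the document layout lives in the three XPath strings.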