
Sequence input to <p:filter> in XProc 1.0?


Is <p:filter> in XProc able to accept a sequence of documents as input? When I feed Calabash the following:

<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" xmlns:c="http://www.w3.org/ns/xproc-step"
    version="1.0">
    <p:input port="source" sequence="true">
        <p:inline>
            <doc>
                <content>Hello world!</content>
            </doc>
        </p:inline>
        <p:inline>
            <doc>
                <content>Goodbye world!</content>
            </doc>
        </p:inline>
    </p:input>
    <p:output port="result" sequence="true"/>
    <p:filter select="//content">
        <p:input port="source" sequence="true"/>
    </p:filter>
</p:declare-step>

it raises the following error:

err:XD0006 : 2 documents appear on the 'source' port. If sequence is not specified, or has the value false, then it is a dynamic error unless exactly one document appears on the declared port.

@sequence is specified, and with the value "true". If I remove the second inline document from the input, the processing runs to completion successfully. And if I leave the two inputs but replace <p:filter> with something else that accepts a sequence, like <p:count>, it also runs to completion successfully.

I’m confused because the error message doesn’t say that <p:filter> cannot accept a sequence; it tells me to specify a sequence, and I’ve done that. And since XPath filtering can be applied to an XPath collection() function, it isn't clear (well, to me) why it shouldn’t be possible, at least in principle, to filter a sequence of documents in XProc.

I’m also not sure how to read the spec, which says about <p:filter> that:

This step behaves just like an p:input with a select expression except that the select expression is computed dynamically.

Since <p:input> can accept a sequence, if <p:filter> is said to behave the same way except for filtering, that would seem to imply that <p:filter> should also be able to accept a sequence.

I think the options are:

  1. <p:filter> accepts multiple inputs but I haven’t specified that correctly.
  2. <p:filter> does not accept multiple inputs and either the error message and spec are misleading or I’ve failed to understand them correctly.

I’m happy (well, willing) to plead guilty to user error in either case, but I’d be grateful for clarification.

And yes, I can work around the problem by using <p:wrap-sequence> to form the multiple inputs into a single XML tree, but my question is about how <p:filter> works, not about how to get a specific result. In my actual code it takes 1.5 seconds to read and pass along my real input documents and 4.5 seconds if I add the step of wrapping them. I’d like to save the 3 seconds, especially because the wrapping would be an ephemeral workaround: I’m just going to extract content and wind up with multiple documents after the filtering step anyway.


Solution

  • The XProc 1.0 Recommendation gives the following step declaration in section 7.1.9, p:filter:

    <p:declare-step type="p:filter">
         <p:input port="source"/>
         <p:output port="result" sequence="true"/>
         <p:option name="select" required="true"/>                     <!-- XPathExpression -->
    </p:declare-step>
    

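    Notice that the source port is not declared with sequence="true", so it accepts exactly one document. The error message refers to that declaration, not to the attribute on your <p:input> binding. The second option you mentioned is therefore the right one: <p:filter> does not accept multiple inputs.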
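
    For comparison, and as far as I recall from section 7.1.4 of the same Recommendation (please check the spec for the exact text), <p:count> does declare its source port with sequence="true", which would explain why swapping it in for <p:filter> makes your pipeline run:

    ```xml
    <!-- Declaration of p:count as I remember it from the XProc 1.0 spec -->
    <p:declare-step type="p:count">
         <p:input port="source" sequence="true"/>  <!-- a sequence is allowed here -->
         <p:output port="result"/>
         <p:option name="limit" select="0"/>        <!-- integer -->
    </p:declare-step>
    ```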

    As a workaround, you can indeed use <p:wrap-sequence>.
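
    Another way to avoid the wrapping is to put the <p:filter> inside a <p:for-each>, so that the step only ever receives one document at a time. The following is a minimal sketch that reuses the two inline documents from your question. I have not timed it against your real data, so whether it is faster than wrapping is an assumption you would need to check:

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
        <p:input port="source" sequence="true">
            <p:inline><doc><content>Hello world!</content></doc></p:inline>
            <p:inline><doc><content>Goodbye world!</content></doc></p:inline>
        </p:input>
        <p:output port="result" sequence="true"/>
        <!-- p:for-each runs its subpipeline once per document arriving on "source" -->
        <p:for-each>
            <!-- Inside the loop the default readable port is "current",
                 which holds exactly one document, so p:filter is satisfied -->
            <p:filter select="//content"/>
        </p:for-each>
    </p:declare-step>
    ```

    The output of each iteration is collected into a single sequence on the result port, so every matched content element comes out as its own document.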