
What is the role of "empty greatest" and "empty least" in order by?


While reading the MarkLogic Query Performance and Tuning Guide, I came across empty greatest and empty least and how they can be used with order by. However, the guide gives little detail and no examples beyond this passage:

You can specify either empty greatest or empty least, but empties always need to be at the end for the order by optimizations to work. For example, empty greatest is optimized with ascending; empty least is optimized with descending. If neither is specified, MarkLogic chooses the order that is optimized. The following example goes against the default. It orders the list by $doc/document/modified_date, ascending order, with empty least:

xquery version "1.0-ml";
for $doc in fn:doc()
order by $doc/document/modified_date ascending empty least
return $doc

Could anyone help me understand the actual use case for empty greatest and empty least?


Solution

  • The XQuery specification (https://www.w3.org/TR/xquery-31/#id-order-by-clause) explains it as follows:

    For the purpose of determining their relative position in the ordering sequence, the greater-than relationship between two orderspec values W and V is defined as follows:

    When the orderspec specifies empty least, the following rules are applied in order:

    If V is an empty sequence and W is not an empty sequence, then W greater-than V is true.

    If V is NaN and W is neither NaN nor an empty sequence, then W greater-than V is true.

    If a specific collation C is specified, and V and W are both of type xs:string or are convertible to xs:string by subtype substitution and/or type promotion, then:

    If fn:compare(V, W, C) is less than zero, then W greater-than V is true; otherwise W greater-than V is false.

    If none of the above rules apply, then:

    If W gt V is true, then W greater-than V is true; otherwise W greater-than V is false.

    When the orderspec specifies empty greatest, the following rules are applied in order:

    If W is an empty sequence and V is not an empty sequence, then W greater-than V is true.

    If W is NaN and V is neither NaN nor an empty sequence, then W greater-than V is true.

    If a specific collation C is specified, and V and W are both of type xs:string or are convertible to xs:string by subtype substitution and/or type promotion, then:

    If fn:compare(V, W, C) is less than zero, then W greater-than V is true; otherwise W greater-than V is false.

    If none of the above rules apply, then:

    If W gt V is true, then W greater-than V is true; otherwise W greater-than V is false.

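    In short, when an expression used in an order by clause yields an empty sequence for an item, empty least treats that item's sort key as smaller than any non-empty key, while empty greatest treats it as larger. In ascending order, empty least therefore places such items before the items that have a non-empty sort key and empty greatest places them after; descending order reverses both.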
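
    The quoted rules also cover NaN, which is easy to overlook: it sorts next to the empty sequence, between it and the ordinary values. The following is a minimal sketch of that behaviour, not taken from the specification; the $items data and the $key variable are made up for illustration, and <n/> stands in for an item with no sort key.

    ```xquery
    (: Hypothetical sample data: <n/> has no value, so its sort key is the empty sequence. :)
    let $items := (<n>3</n>, <n/>, <n>NaN</n>, <n>1</n>)
    return (
      (: empty least: the empty key is smallest, NaN comes next, then the numbers :)
      string-join(
        for $e in $items
        let $key := if ($e/text()) then xs:double($e) else ()
        order by $key ascending empty least
        return if (empty($key)) then "empty" else string($key),
        " "),
      (: empty greatest: the numbers come first, then NaN, and the empty key is largest :)
      string-join(
        for $e in $items
        let $key := if ($e/text()) then xs:double($e) else ()
        order by $key ascending empty greatest
        return if (empty($key)) then "empty" else string($key),
        " ")
    )
    ```

    Following the rules above, a conforming processor should return the two strings "empty NaN 1 3" and "1 3 NaN empty".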

    A simple example:

    The XQuery

    declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
    
    declare option output:method 'xml';
    declare option output:indent 'yes';
    
    for $item in root/item
    order by $item/name, $item/cat empty greatest
    return $item
    

    sorts

    <root>
        <item>
            <name>a</name>
            <cat>z</cat>
        </item>
        <item>
            <name>a</name>
        </item>
        <item>
            <name>a</name>
            <cat>d</cat>
        </item>
    </root>
    

    into

    <item>
       <name>a</name>
       <cat>d</cat>
    </item>
    <item>
       <name>a</name>
       <cat>z</cat>
    </item>
    <item>
       <name>a</name>
    </item>
    

    https://xqueryfiddle.liberty-development.net/eiZQFp9/1
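
    For comparison, here is a sketch of the same sort with empty least. It assumes the same three items, bound to a hypothetical $root variable so that the query is self-contained and does not depend on a context item.

    ```xquery
    (: Same data as in the example above, inlined as a hypothetical $root variable. :)
    let $root :=
      <root>
        <item><name>a</name><cat>z</cat></item>
        <item><name>a</name></item>
        <item><name>a</name><cat>d</cat></item>
      </root>
    for $item in $root/item
    (: All names are equal, so the second key decides; an empty cat now counts as the smallest value. :)
    order by $item/name, $item/cat empty least
    return $item
    ```

    With empty least the item without a cat element should come first, followed by the items with cat d and cat z. When neither modifier is given, standard XQuery leaves the default to the implementation unless the query prolog sets it with declare default order empty greatest; or declare default order empty least;.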