In XQuery 3.1 (in eXist 4.7) I have 40 XML files, and I need to select four of them at random. However, I would like the four files to be different.
My files are all in the same collection ($data). I currently count the files, then use a randomising function (util:random($max as xs:integer)) to generate a position() in the sequence of files, repeating this four times:
let $filecount := count($data)
for $cnt in 1 to 4
let $pos := util:random($filecount)
return $data[position()=$pos]
But this often results in the same file being selected more than once by chance.
Each file has a distinct @xml:id (in the root node of each file), which I could use as some sort of predicate in a recursion, if that is possible. But I'm unable to identify a method for accumulating the @xml:id values into a cumulative sequence across recursive calls.
Thanks for any help.
I think the standardized random-number-generator function and its permute function (https://www.w3.org/TR/xpath-functions/#func-random-number-generator) should give you better "randomness" and more diverse results, e.g.
let $file-count := count($data)
return $data[position() = random-number-generator(current-dateTime())?permute(1 to $file-count)[position() le 4]]
I haven't tried that with your db/XQuery implementation, and there might also be ways to do it with the functions you currently use.
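To show the permute idea in isolation, here is a minimal sketch that uses only standard XQuery 3.1 functions. The $data sequence of 40 strings is a hypothetical stand-in for the real collection, and whether random-number-generator is available depends on the processor and version, so treat this as untested against eXist 4.7.

```xquery
xquery version "3.1";

(: Hypothetical stand-in for the collection: 40 strings instead of real documents. :)
let $data := (1 to 40) ! ("file-" || .)
let $file-count := count($data)
(: permute returns its argument in random order, so the first four
   entries are four distinct positions between 1 and $file-count. :)
let $positions := subsequence(
    random-number-generator(current-dateTime())?permute(1 to $file-count),
    1, 4)
(: The predicate keeps the matching items in their original order. :)
return $data[position() = $positions]
```

Because current-dateTime() is stable within one query execution, the same four positions come back if the expression is evaluated several times in a single query; a different selection needs a new execution or a different seed.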
For eXist-db I guess one strategy is to call the util:random function until you have a sequence of distinct values of the wanted length. The following returned (at least in some tests with eXide) four distinct numbers between 1 and 40 on each call:
declare function local:random-sequence($max as xs:integer, $length as xs:integer) as xs:integer+ {
    local:random-sequence((), $max, $length)
};
declare function local:random-sequence($seq as xs:integer*, $max as xs:integer, $length as xs:integer) as xs:integer+ {
    (: count the distinct values: "$seq = distinct-values($seq)" is a general
       comparison and is true for any non-empty $seq, even with duplicates :)
    let $distinct := distinct-values($seq)
    return
        if (count($distinct) = $length)
        then $distinct
        else local:random-sequence(($distinct, util:random($max)), $max, $length)
};
let $file-count := 40
return local:random-sequence($file-count, 4)
Integrating that into the previous attempt would result in
let $file-count := count($data)
return $data[position() = local:random-sequence($file-count, 4)]
As for your comment: I didn't notice that eXist's util:random function can return 0 and excludes the max value. So, based on your comment and a further test, I guess you rather want the three-argument function I posted above to be implemented as
declare function local:random-sequence($seq as xs:integer*, $max as xs:integer, $length as xs:integer) as xs:integer+ {
    if (count($seq) = $length)
    then $seq
    else
        let $new-number := util:random($max + 1)
        return
            if ($seq = $new-number or $new-number = 0)
            then local:random-sequence($seq, $max, $length)
            else local:random-sequence(($seq, $new-number), $max, $length)
};
That way it hopefully now returns $length distinct values between 1 and the $max argument.
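A possible variant, as a sketch only: if util:random($max) really returns values from 0 up to but excluding $max, as described in the comment, then adding 1 shifts the result into the range 1 to $max and the separate check for 0 is no longer needed. The function name local:pick-distinct is hypothetical, and the extra guard on $max is an addition of mine so that the recursion stops when more values are requested than exist. I have not run this against eXist 4.7.

```xquery
xquery version "3.1";

(: Hypothetical variant of local:random-sequence; assumes util:random($max)
   yields an integer in the range 0 to $max - 1. :)
declare function local:pick-distinct($seq as xs:integer*, $max as xs:integer, $length as xs:integer) as xs:integer* {
    (: stop when enough values are collected, or when all of 1 to $max are used up :)
    if (count($seq) ge $length or count($seq) ge $max)
    then $seq
    else
        (: shift 0 .. $max - 1 to 1 .. $max :)
        let $candidate := util:random($max) + 1
        return
            if ($candidate = $seq)
            (: already picked: try again with the sequence unchanged :)
            then local:pick-distinct($seq, $max, $length)
            else local:pick-distinct(($seq, $candidate), $max, $length)
};

local:pick-distinct((), 40, 4)
```

It could then be used in the same way as above, for example $data[position() = local:pick-distinct((), count($data), 4)].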