I am trying to write an XQuery result to a CSV file, see the attached code (resulting in at least 1.6 million lines, and it will probably become a lot more). However, several minutes into execution the program fails with an 'out of main memory' error. I am using a laptop with 4 GB of memory. I would have thought that writing to a file would prevent memory bottlenecks. Also, I am already using the db:copynode false pragma.

I might have gone about the code the wrong way, since this is my first XQuery/BaseX program. Or this might be unsolvable without extra hardware (current database SIZE: 3092 MB; NODES: 142477344). Any assistance would be much appreciated!
let $params :=
  <output:serialization-parameters xmlns:output="http://www.w3.org/2010/xslt-xquery-serialization">
    <output:method value="csv"/>
    <output:csv value="header=yes, separator=semicolon"/>
  </output:serialization-parameters>
return file:write(
  '/tmp/output.csv',
  (# db:copynode false #) {
    <csv>{
      for $stand in //*:stand
      return <record>{$stand//*:kenmerk}</record>
      (: {$stand//*:identificatieVanVerblijfsobject}
         {$stand//*:inOnderzoek}
         {$stand//*:documentdatum}
         {$stand//*:documentnummer} :)
    }</csv>
  },
  $params
)
It’s a good idea to use the copynode pragma to save memory. In the given case, though, it’s probably the sheer number of newly created element nodes that consumes too much memory before the data can be written to disk.
If you have large data sets, the xquery serialization format may be the better choice: maps and arrays consume less memory than XML nodes.
let $params := map {
  'format': 'xquery',
  'header': true(),
  'separator': 'semicolon'
}
let $data := map {
  'names': [
    'kenmerk', 'inOnderzoek'
  ],
  'records': (
    for $stand in //*:stand
    return [
      string($stand//*:kenmerk),
      string($stand//*:inOnderzoek)
    ]
  )
}
return file:write-text(
  '/tmp/output.csv',
  csv:serialize($data, $params)
)
Another approach is to use the window clause and write the results in chunks:
for tumbling window $stands in //*:stand
  start at $s when true()
  end at $e when $e - $s eq 100000
let $first := $s = 1
let $path := '/tmp/output.csv'
let $csv := <csv>{
  for $stand in $stands
  return <record>{
    $stand//*:kenmerk,
    $stand//*:inOnderzoek
  }</record>
}</csv>
let $params := map {
  'method': 'csv',
  'csv': map {
    'separator': 'semicolon',
    'header': $first
  }
}
return if ($first) then (
  file:write($path, $csv, $params)
) else (
  file:append($path, $csv, $params)
)
After the first write operation, subsequent table rows are appended to the original file. The chunk size (here: 100000 rows per loop) can be freely adjusted. As in your original code, the serialization parameters could also be specified as XML, and it’s of course also possible to use the xquery serialization format in the second example; a combined sketch follows below.
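For illustration, here’s a minimal sketch that combines the two suggestions: chunked writes via the window clause, with each chunk serialized through the xquery format instead of element constructors. It reuses the element names, path and chunk size from the examples above; adjust them to your data:

for tumbling window $stands in //*:stand
  start at $s when true()
  end at $e when $e - $s eq 100000
let $first := $s = 1
let $path := '/tmp/output.csv'
(: build each chunk from maps and arrays instead of element nodes :)
let $text := csv:serialize(
  map {
    'names': [ 'kenmerk', 'inOnderzoek' ],
    'records': (
      for $stand in $stands
      return [
        string($stand//*:kenmerk),
        string($stand//*:inOnderzoek)
      ]
    )
  },
  map {
    'format': 'xquery',
    'separator': 'semicolon',
    (: emit the header line only with the first chunk :)
    'header': $first
  }
)
return if ($first) then (
  file:write-text($path, $text)
) else (
  file:append-text($path, $text)
)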