I am trying to parse an input file and generate a .csv file as output.

So far it is working fine, but the writing takes forever. My question is: how can I speed up the writing of the file? It produces about 1,000,000 lines of output.
```
// writeCSVObjects comes from the community csv module on deno.land/x.
import { writeCSVObjects } from "https://deno.land/x/csv/mod.ts";

// Generate a really long array to simulate the input.
const data = [];
for (let i = 0; i < 1000000; i++) {
  data.push({ plz: '12345', strasse: 'Teststrasse' });
}

// Wrap the array in an async generator so it can be consumed as an
// async iterable by the CSV writer.
async function* asyncIterator(data: any): AsyncGenerator<any> {
  for (const entry of data) {
    yield entry;
  }
}

async function writeCSV(data: any[]): Promise<void> {
  const file = await Deno.open('./test.csv', { write: true, create: true });
  await writeCSVObjects(file, asyncIterator(data), { header: ['plz', 'strasse'] });
  file.close();
}
```
I already had a hard time wrapping my head around this asyncIterator thing. My guess is that what makes this slow is the constant yielding of the generator function. Is there a way of batch processing that would make it faster? Something like the sketch below is roughly what I have in mind:
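(The `batchedIterator` name and the `batchSize` parameter here are made up for illustration; I don't know whether `writeCSVObjects` would accept arrays of rows at all.)

```
// Hypothetical sketch: yield rows in fixed-size chunks instead of one at a time.
async function* batchedIterator(data: any[], batchSize = 10000): AsyncGenerator<any[]> {
  for (let i = 0; i < data.length; i += batchSize) {
    // Each yield hands over a whole slice of rows at once.
    yield data.slice(i, i + batchSize);
  }
}
```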
MISC: I am using the standard csv library, and this method from the library.
The links you provided don't point to Deno's standard library (`/std`); they point to a community module (`/x` hosts community/third-party code). The standard library includes very performant CSV tools. Here's a link to a usage example for streaming serialization (I'm inlining it below in case the link ever breaks or points to a different resource in the future):
https://deno.land/std@0.192.0/csv/mod.ts?s=CsvStringifyStream#example_0
```
import { CsvStringifyStream } from "https://deno.land/std@0.192.0/csv/csv_stringify_stream.ts";
import { readableStreamFromIterable } from "https://deno.land/std@0.192.0/streams/readable_stream_from_iterable.ts";

const file = await Deno.open("data.csv", { create: true, write: true });

// Build a ReadableStream from an iterable of row objects.
const readable = readableStreamFromIterable([
  { id: 1, name: "one" },
  { id: 2, name: "two" },
  { id: 3, name: "three" },
]);

// Serialize rows to CSV strings, encode them to bytes, and stream
// the bytes into the file.
await readable
  .pipeThrough(new CsvStringifyStream({ columns: ["id", "name"] }))
  .pipeThrough(new TextEncoderStream())
  .pipeTo(file.writable);
```
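Adapted to the data from your question, a minimal sketch (this assumes the `data` array built above; `readableStreamFromIterable` accepts plain iterables as well as async ones, so the hand-written generator isn't needed):

```
import { CsvStringifyStream } from "https://deno.land/std@0.192.0/csv/csv_stringify_stream.ts";
import { readableStreamFromIterable } from "https://deno.land/std@0.192.0/streams/readable_stream_from_iterable.ts";

// Same simulated input as in the question: ~1,000,000 rows.
const data: { plz: string; strasse: string }[] = [];
for (let i = 0; i < 1000000; i++) {
  data.push({ plz: "12345", strasse: "Teststrasse" });
}

const file = await Deno.open("./test.csv", { write: true, create: true });

// The array is a plain iterable, so it can feed the stream directly.
await readableStreamFromIterable(data)
  .pipeThrough(new CsvStringifyStream({ columns: ["plz", "strasse"] }))
  .pipeThrough(new TextEncoderStream())
  .pipeTo(file.writable);

// pipeTo closes the destination stream when the source is exhausted,
// so no explicit file.close() is needed here.
```

Because every stage is a stream, rows are serialized and flushed incrementally instead of the whole CSV being built up in memory first.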