

Fast way of writing a large CSV file in Deno?

I am trying to parse an input file and generate a .csv file as output.

So far it works fine, but the writing takes forever. My question is: how can I speed up writing the file? The output has about 1,000,000 lines.

Minimal example code of the current (slow) implementation:


// writeCSVObjects comes from the community csv module used in the question
import { writeCSVObjects } from "https://deno.land/x/csv/mod.ts";

// generate a really long array to simulate input
const data: { plz: string; strasse: string }[] = [];

for (let i = 0; i < 1000000; i++) {
  data.push({ plz: "12345", strasse: "Teststrasse" });
}

async function* asyncIterator(data: any[]): AsyncGenerator<any> {
  for (const entry of data) {
    yield entry;
  }
}

async function writeCSV(data: any[]): Promise<void> {
  const file = await Deno.open("./test.csv", { write: true, create: true });

  await writeCSVObjects(file, asyncIterator(data), { header: ["plz", "strasse"] });
  file.close();
}

await writeCSV(data);

I already had a hard time wrapping my head around this asyncIterator thing.

I would guess that what makes this slow is the constant yielding of the generator function. Is there a way of batch processing that would make this faster?

MISC: I am using the standard csv library and this method from the library.


Solution

  • The links you provided don’t point to Deno’s standard library (/std); they point to a community module (deno.land/x hosts community/third-party code). The standard library includes very performant CSV tools. Here’s a link to a usage example for streaming serialization (I'm inlining it below in case the link ever breaks or points to a different resource in the future):

    https://deno.land/std@0.192.0/csv/mod.ts?s=CsvStringifyStream#example_0

    import { CsvStringifyStream } from "https://deno.land/std@0.192.0/csv/csv_stringify_stream.ts";
    import { readableStreamFromIterable } from "https://deno.land/std@0.192.0/streams/readable_stream_from_iterable.ts";
    
    const file = await Deno.open("data.csv", { create: true, write: true });
    const readable = readableStreamFromIterable([
      { id: 1, name: "one" },
      { id: 2, name: "two" },
      { id: 3, name: "three" },
    ]);
    
    await readable
      .pipeThrough(new CsvStringifyStream({ columns: ["id", "name"] }))
      .pipeThrough(new TextEncoderStream())
      .pipeTo(file.writable);
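
    Since the question generates one million rows, the same pipeline can be fed from an async generator instead of an in-memory array, so the dataset is never fully materialized. This is a sketch adapting the std example above; the column names plz and strasse and the 1,000,000-row count come from the question:

    ```typescript
    import { CsvStringifyStream } from "https://deno.land/std@0.192.0/csv/csv_stringify_stream.ts";
    import { readableStreamFromIterable } from "https://deno.land/std@0.192.0/streams/readable_stream_from_iterable.ts";

    // Async generator yielding rows lazily, one at a time.
    async function* rows(): AsyncGenerator<{ plz: string; strasse: string }> {
      for (let i = 0; i < 1_000_000; i++) {
        yield { plz: "12345", strasse: "Teststrasse" };
      }
    }

    const file = await Deno.open("./test.csv", { create: true, write: true });

    // readableStreamFromIterable accepts async iterables, so the generator
    // pipes straight into the CSV serializer without intermediate buffering.
    await readableStreamFromIterable(rows())
      .pipeThrough(new CsvStringifyStream({ columns: ["plz", "strasse"] }))
      .pipeThrough(new TextEncoderStream())
      .pipeTo(file.writable);
    // file.writable is closed automatically when the pipe completes.
    ```

    Because the rows flow through Web Streams with backpressure, memory use stays flat regardless of how many rows the generator produces.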