Tags: node.js, typescript, parquet

How to read several parquet files with TypeScript?


I have a folder with parquet files.

How can I read them all and combine them into one big text file?

I am using the parquetjs library to read a single file:

(
    async () => {
        // create a new ParquetReader that reads from 'fruits.parquet'
        let reader = await parquet.ParquetReader.openFile('fruits.parquet');

        // create a new cursor
        let cursor = reader.getCursor();

        // read all records from the file and print them
        let record = null;
        while (record = await cursor.next()) {
            console.log(record);
        }

    }

) ();

I need help reading several files at once and combining them.


Solution

    1. Convert the async function to take a filename parameter and return the file's records as a string
    2. Create an array of filenames
    3. Use Array.map to transform the filename array into an array of Promises
    4. Use Promise.all to wait for all files to be read
    5. Use Array.join to combine all the records into one string

    Convert the async function to take a filename parameter

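    const parquet = require('parquetjs');

    // Read one parquet file and return its records as text, one JSON object per line
    const readFile = async (filename) => {
      const reader = await parquet.ParquetReader.openFile(filename);
      const cursor = reader.getCursor();

      const lines = [];
      let record = null;
      while ((record = await cursor.next())) {
        // cursor.next() yields plain objects, so serialize each one explicitly
        lines.push(JSON.stringify(record));
      }

      // release the file handle once every record has been read
      await reader.close();
      return lines.join('\n');
    };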
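
The records yielded by cursor.next() are objects (the question prints them with console.log), so how they are turned into text matters. A minimal sketch of the difference, using a made-up record rather than a real parquet row:

```typescript
// Hypothetical record, standing in for one row read from a parquet file
const sampleRecord = { name: 'apple', quantity: 10 };

// String concatenation falls back to Object.prototype.toString
const concatenated: string = '' + sampleRecord;

// JSON.stringify keeps the field names and values
const serialized: string = JSON.stringify(sampleRecord);

console.log(concatenated); // [object Object]
console.log(serialized);   // {"name":"apple","quantity":10}
```

JSON is only one possible text format; a tab- or comma-separated line built from the record's fields would work just as well if that suits the target file better.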
    

    Read and combine all files

    const filenames = ['f1.parquet', 'f2.parquet', 'f3.parquet'];
    const readPromises = filenames.map(f => readFile(f));
    const allPromises = Promise.all(readPromises);
    
    // Read and combine
    allPromises.then(contentsArray => contentsArray.join('\n'))
      .then(joinedContent => console.log(joinedContent))
      .catch(console.error);
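
The question also asks about reading a whole folder and ending up with one text file, which the snippet above leaves out. A rough sketch of those two steps, using only Node's built-in fs and path modules. The names listParquetFiles, combineToTextFile and FileReader are made up for this example, and the per-file reader is passed in as a parameter so that any function with the same shape as readFile can be used:

```typescript
import { promises as fs } from 'fs';
import * as path from 'path';

// Shape of the per-file reader (hypothetical type name); readFile above fits it
type FileReader = (filename: string) => Promise<string>;

// List every *.parquet file in a folder, sorted so the output order is stable
async function listParquetFiles(folder: string): Promise<string[]> {
  const entries = await fs.readdir(folder);
  return entries
    .filter(name => path.extname(name).toLowerCase() === '.parquet')
    .sort()
    .map(name => path.join(folder, name));
}

// Read all files with the supplied reader and write the joined text to outputPath
async function combineToTextFile(
  folder: string,
  outputPath: string,
  readOne: FileReader
): Promise<number> {
  const filenames = await listParquetFiles(folder);
  const contents = await Promise.all(filenames.map(f => readOne(f)));
  await fs.writeFile(outputPath, contents.join('\n'), 'utf8');
  return filenames.length; // number of files that were combined
}
```

It could then be called as combineToTextFile('./data', 'combined.txt', readFile), where './data' and 'combined.txt' are placeholder paths. Promise.all opens every file at once, so for a very large folder a plain for...of loop that awaits one file at a time may be the safer choice.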