
Google Buckets / Read by line


I know that it is currently possible to download objects by byte range from Google Cloud Storage buckets.

const options = {
  destination: destFileName, 
  start: startByte,
  end: endByte,
};

await storage.bucket(bucketName).file(fileName).download(options);

However, I need to read by line rather than by byte, as the files I deal with are *.csv. Ideally, something like:

await storage
  .bucket(bucketName)
  .file(fileName)
  .download({ destination: '', lineStart: number, lineEnd: number });

I couldn't find any API for this. Could anyone advise on how to achieve the desired behaviour?


Solution

  • You cannot read a file line by line directly from Cloud Storage, because it stores files as objects, as explained in this answer:

    The string you read from Google Storage is a string representation of a multipart form. It contains not only the uploaded file contents but also some metadata.

    To read the file line by line as desired, I suggest loading it into a variable and then parsing that variable as needed. You could use the sample code provided in this answer:

    const { Storage } = require("@google-cloud/storage");
    const Papa = require("papaparse");
    const storage = new Storage();

    // Read the file from Cloud Storage as a stream
    // (bucketName and fileName are assumed to be defined elsewhere)
    const downloadedFile = storage
      .bucket(bucketName)
      .file(fileName)
      .createReadStream();

    // Concatenate the chunks into a single string
    let fileBuffer = "";
    downloadedFile
      .on("error", function (err) {
        console.error("Download failed:", err);
      })
      .on("data", function (data) {
        fileBuffer += data;
      })
      .on("end", function () {
        // fileBuffer now holds the whole CSV file

        // Parse the data. PapaParse splits rows on line breaks by default,
        // so "delimiter" (the field separator) is left to auto-detection.
        let rows;
        Papa.parse(fileBuffer, {
          header: false,
          complete: function (results) {
            // Shows the parsed data on console
            console.log("Finished:", results.data);
            rows = results.data;
          },
        });
      });
    

    To parse the data, you could use a library like PapaParse, as shown in this tutorial.
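
    If only a range of lines is needed, one possible alternative to buffering the whole object is to pipe the read stream through Node's built-in readline module and stop once the range has been read. The following is a minimal sketch and not part of the original answer: readLineRange is a hypothetical helper name, and line numbers are assumed to be 1-based and inclusive. It counts physical lines, so a quoted CSV field containing a line break would be counted as more than one line.

```javascript
const readline = require("readline");

// Hypothetical helper (not part of @google-cloud/storage):
// collects lines lineStart..lineEnd (1-based, inclusive) from any readable stream.
async function readLineRange(input, lineStart, lineEnd) {
  // crlfDelay: Infinity makes "\r\n" count as a single line break
  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  const lines = [];
  let lineNumber = 0;

  for await (const line of rl) {
    lineNumber += 1;
    if (lineNumber < lineStart) continue; // not in range yet
    if (lineNumber > lineEnd) break;      // past the range, stop reading
    lines.push(line);
  }

  // Release the readline interface and stop the underlying stream,
  // so the rest of the object is not read once the range is complete.
  rl.close();
  input.destroy();
  return lines;
}
```

    With the stream from the answer above, this could be called as readLineRange(storage.bucket(bucketName).file(fileName).createReadStream(), 10, 20). The object is still read from its first byte, since Cloud Storage has no notion of lines, but the read stops after lineEnd and the lines outside the range are never kept in memory. The returned lines can then be joined with "\n" and passed to a CSV parser.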