google-cloud-storage

List of objects as events from GCS


We have a bucket that contains more than 40 million objects. We would like to read each of these objects, extract particular data from a column, and store the results as separate files.

How can we get the list of all these objects and process each of them? Can we get this list of objects through Pub/Sub?


Solution

  • You have not given many details, so please specify the following: 1. which language you are using; 2. what type of data you want to extract, and where and how you want to store it (including any specific file format).

    In the meantime, here is a simple code snippet:

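    // Assumes the firebase-functions and @google-cloud/storage packages are installed.
    const functions = require('firebase-functions');
    const { Storage } = require('@google-cloud/storage');

    const storage = new Storage();

    exports.iterateThroughFiles = functions.https.onRequest(async (req, res) => {
      try {
        const bucketName = 'your-bucket-name';
        const bucket = storage.bucket(bucketName);

        // By default getFiles() follows every page and buffers the whole
        // listing in memory, so this form only suits small buckets.
        const [files] = await bucket.getFiles();

        files.forEach((file) => {
          // Perform operations on each file
          console.log(`File name: ${file.name}`);
        });

        res.status(200).send('Files iteration completed successfully.');
      } catch (error) {
        console.error('Error iterating through files:', error);
        res.status(500).send('An error occurred while iterating through files.');
      }
    });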
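
    With 40+ million objects, a single getFiles() call would try to hold the entire listing in memory. One possible way around that, shown here only as a minimal sketch, is to turn off automatic pagination and walk the bucket one page at a time. The helper name forEachObject, the handleFile callback, and the page size of 1000 are assumptions for illustration and are not part of the library.

    ```javascript
    // Sketch: walk a bucket page by page instead of loading every object at once.
    // `bucket` is assumed to be a Bucket from @google-cloud/storage, for example:
    //   const { Storage } = require('@google-cloud/storage');
    //   const bucket = new Storage().bucket('your-bucket-name');
    // forEachObject and handleFile are hypothetical names used for illustration.
    async function forEachObject(bucket, handleFile, pageSize = 1000) {
      let query = { autoPaginate: false, maxResults: pageSize };
      let count = 0;
      while (query) {
        // With autoPaginate disabled, getFiles() resolves to a single page of
        // files plus the query for the next page (falsy when no pages remain).
        const [files, nextQuery] = await bucket.getFiles(query);
        for (const file of files) {
          await handleFile(file);
          count += 1;
        }
        // Keep manual pagination switched on for the follow-up request.
        query = nextQuery ? { ...nextQuery, autoPaginate: false } : null;
      }
      return count;
    }
    ```

    Calling `await forEachObject(bucket, async (file) => console.log(file.name))` would then log each object name while keeping only one page in memory at a time.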
    

    If this is all you need, fine; a more specific approach would require the clarifications requested above.

    You can of course use Pub/Sub, but if the task is long-running I would suggest running it on App Engine or Compute Engine, because a Firebase function is not suitable: the job would exceed its maximum execution time limit.
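
    For the Pub/Sub route, note that Cloud Storage notifications are only sent when objects change, so objects that already exist would have to be listed and published explicitly. A rough sketch of that step is to publish one message per object name so that separate workers can pick them up. The helper name publishObjectNames, the JSON payload shape, and the topic name are assumptions for illustration.

    ```javascript
    // Sketch: publish one Pub/Sub message per object name from a page of files.
    // `topic` is assumed to be a Topic from @google-cloud/pubsub, for example:
    //   const { PubSub } = require('@google-cloud/pubsub');
    //   const topic = new PubSub().topic('your-topic-name'); // hypothetical topic
    // publishObjectNames and the payload fields are hypothetical choices.
    async function publishObjectNames(bucketName, files, topic) {
      const messageIds = await Promise.all(
        files.map((file) => {
          const payload = JSON.stringify({ bucket: bucketName, name: file.name });
          // publishMessage() expects the message body as a Buffer in `data`.
          return topic.publishMessage({ data: Buffer.from(payload) });
        })
      );
      // Number of messages accepted for this page.
      return messageIds.length;
    }
    ```

    Publishing one page at a time keeps the number of in-flight messages bounded; a subscriber would then parse each payload and process the named object.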