Tags: google-cloud-functions, google-cloud-bigtable

Reading Cloud Bigtable from Cloud Functions using Node.js takes > 1500ms


I am trying to read a row from Cloud Bigtable by key via Google Cloud Functions using Node.js. The read works, but the Cloud Function execution time is more than 1500 ms.

I've heard that Cloud Bigtable is very fast at data retrieval, but that's not what I'm seeing here.

Can someone help me figure out what I am doing wrong here?

I tried loading the Bigtable library and client objects globally:

/**
 * Responds to any HTTP request.
 *
 * @param {!express:Request} req HTTP request context.
 * @param {!express:Response} res HTTP response context.
 */

// Imports the Google Cloud client library
const Bigtable = require('@google-cloud/bigtable');

const TABLE_ID = '';
const COLUMN_FAMILY_ID = '';
const COLUMN_QUALIFIER = '';
const INSTANCE_ID = '';

// Creates a Bigtable client
const bigtable = new Bigtable();

// Connect to an existing instance:my-bigtable-instance
const instance = bigtable.instance(INSTANCE_ID);

// Connect to an existing table:my-table
const table = instance.table(TABLE_ID);

const filter = [{
  family: COLUMN_FAMILY_ID,
}, {
  column: COLUMN_QUALIFIER
}];

exports.helloWorld = (req, res) => {

    console.log("started");

    (async () => {
        try {

          const query_params = req.query;
          const rowkey = query_params.key;

          console.log("before query");

          const [singleRow] = await table.row(rowkey).get({filter});

          console.log("after query");

          res.status(200).send();

        } catch (err) {
            // Handle error performing the read operation and report it to the caller
            console.error(`Error reading rows:`, err);
            res.status(500).send();
        }
    })();

};

I've put console logs at various points, and the timestamps of the "before query" and "after query" logs are about 1500 ms apart.
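
For reference, a more direct way to measure the gap (a sketch only, reusing the same global table and filter from the code above; timingTest is just an illustrative second entry point, not part of my deployment) is to take timestamps around the read and do a second read in the same invocation, to see whether the first read carries one-off connection/auth setup cost:

async function timedRead(rowkey) {
  const start = Date.now();
  await table.row(rowkey).get({filter});
  return Date.now() - start;
}

// Hypothetical entry point, for timing only
exports.timingTest = async (req, res) => {
  const rowkey = req.query.key;
  const first = await timedRead(rowkey);   // may include one-off channel/auth setup
  const second = await timedRead(rowkey);  // closer to steady-state read latency
  res.status(200).send(`first: ${first} ms, second: ${second} ms`);
};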


Solution

  • As per the documentation:

    To get good performance from Cloud Bigtable, it's essential to design a schema that makes it possible to distribute reads and writes evenly across each table.

    Meaning, Bigtable performance depends heavily on schema design, among other factors such as workload, cells per row, nodes per cluster, and disk type, and not only on the environment it's accessed from (using your code, I read my sample Bigtable tables in around 750 ms from GCF and around 4000 ms from the Shell).
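
    As an illustration of the schema-design point only (userId and eventTime are hypothetical fields, not something from your code): a row key that starts with a monotonically increasing value such as a timestamp concentrates traffic on a single node, while starting with a well-distributed value, or a short hash of one, spreads reads and writes across the cluster:

    const crypto = require('crypto');

    // Illustrative sketch only: userId and eventTime are hypothetical.
    function buildRowKey(userId, eventTime) {
      // short hash prefix to avoid hotspotting on sequential ids
      const prefix = crypto.createHash('md5').update(userId).digest('hex').slice(0, 4);
      return `${prefix}#${userId}#${eventTime.toISOString()}`;
    }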

    Also, if you wish to properly test Bigtable performance, it is recommended to do so under the right circumstances (a rough warm-up-and-measure sketch in Node.js follows the list below):

    1. Use a production instance. A development instance will not give you an accurate sense of how a production instance performs under load.

    2. Use at least 300 GB of data. Cloud Bigtable performs best with 1 TB or more of data. However, 300 GB of data is enough to provide reasonable results in a performance test on a 3-node cluster. On larger clusters, use at least 100 GB of data per node.

    3. Stay below the recommended storage utilization per node. For details, see Storage utilization per node.

    4. Before you test, run a heavy pre-test for several minutes. This step gives Cloud Bigtable a chance to balance data across your nodes based on the access patterns it observes.

    5. Run your test for at least 10 minutes. This step lets Cloud Bigtable further optimize your data, and it helps ensure that you will test reads from disk as well as cached reads from memory.
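
    Along those lines, a rough way to approximate steps 4 and 5 from Node.js (a sketch only: it reuses the global table and filter from your code, assumes keys is an array of existing row keys, and the durations are placeholders to tune for your own data) is a warm-up phase followed by a longer timed phase:

    async function readOne(rowkey) {
      await table.row(rowkey).get({filter});
    }

    async function runPhase(durationMs, keys) {
      const latencies = [];
      const end = Date.now() + durationMs;
      while (Date.now() < end) {
        const key = keys[Math.floor(Math.random() * keys.length)];
        const start = Date.now();
        await readOne(key);
        latencies.push(Date.now() - start);
      }
      return latencies;
    }

    async function benchmark(keys) {
      await runPhase(5 * 60 * 1000, keys);                     // step 4: warm-up / pre-test
      const latencies = await runPhase(10 * 60 * 1000, keys);  // step 5: measured run
      latencies.sort((a, b) => a - b);
      console.log('median read latency (ms):', latencies[Math.floor(latencies.length / 2)]);
    }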