
Google Cloud Bigtable: read multiple rows


I have a set of known row keys like this:

h1-r1-en  
h1-r1-es  
h1-r1-fr  
h1-r1-pt  
h1-r2-en  
h1-r2-es  
h1-r2-fr  
h1-r2-pt 

My question is: should I perform a range scan to retrieve all the rows within the range h1-r*, or would it be better to perform one read query for each row key?


Solution

  • Choosing between a range scan and a read of your specific row keys really depends on what you have planned for this data. If you do a range scan, any other rows whose keys start with the prefix "h1-r" are returned as well, so the read fetches more than you need and is less performant. But if the ONLY rows starting with "h1-r" are those specific rows, the two approaches perform the same way, and I'd recommend using a prefix scan to simplify your code.

    Here are code snippets for each approach. We are working on integrating these into our documentation at the moment, but you can check out more on GitHub.

    Prefix scan:

    try (BigtableDataClient dataClient = BigtableDataClient.create(projectId, instanceId)) {
      Query query = Query.create(tableId).prefix("h1-r");
      ServerStream<Row> rows = dataClient.readRows(query);
      for (Row row : rows) {
        // do something
      }
    } catch (IOException e) {
      System.out.println(
          "Unable to initialize service client, as a network error occurred: \n" + e.toString());
    }
    

    Adding individual rows:

    try (BigtableDataClient dataClient = BigtableDataClient.create(projectId, instanceId)) {
      Query query = Query.create(tableId)
             .rowKey("h1-r1-en")
             .rowKey("h1-r1-es")
             .rowKey("h1-r1-fr"); // Continue adding all your rows this way.
      ServerStream<Row> rows = dataClient.readRows(query);
      for (Row row : rows) {
        // do something
      }
    } catch (IOException e) {
      System.out.println(
          "Unable to initialize service client, as a network error occurred: \n" + e.toString());
    }
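
To make the trade-off above concrete without needing a Bigtable instance, here is a minimal in-memory sketch. It assumes a `TreeMap` is a fair stand-in for a table whose rows are kept in sorted row-key order. The class `RowKeyScanSketch`, its methods, and the extra row `h1-r10-en` are all hypothetical and only there for illustration; none of this is part of the Bigtable client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory stand-in for a table (an assumption for illustration only):
// a TreeMap keeps its keys in sorted order, like rows ordered by row key.
class RowKeyScanSketch {

  // Returns every key that starts with the prefix, in sorted order.
  static List<String> prefixScan(NavigableMap<String, String> table, String prefix) {
    List<String> hits = new ArrayList<>();
    // Start at the first key >= prefix and stop once keys no longer match.
    for (String key : table.tailMap(prefix, true).keySet()) {
      if (!key.startsWith(prefix)) {
        break;
      }
      hits.add(key);
    }
    return hits;
  }

  // Returns only the requested keys that actually exist in the table.
  static List<String> readKeys(NavigableMap<String, String> table, List<String> keys) {
    List<String> hits = new ArrayList<>();
    for (String key : keys) {
      if (table.containsKey(key)) {
        hits.add(key);
      }
    }
    return hits;
  }

  // Hypothetical sample data: the eight keys from the question plus one extra row.
  static NavigableMap<String, String> sampleTable() {
    NavigableMap<String, String> table = new TreeMap<>();
    for (String record : Arrays.asList("r1", "r2")) {
      for (String lang : Arrays.asList("en", "es", "fr", "pt")) {
        table.put("h1-" + record + "-" + lang, "value");
      }
    }
    table.put("h1-r10-en", "value"); // extra row that shares the "h1-r" prefix
    return table;
  }

  public static void main(String[] args) {
    NavigableMap<String, String> table = sampleTable();
    // The prefix scan also returns the extra row; the explicit read does not.
    System.out.println("prefix scan:   " + prefixScan(table, "h1-r"));
    System.out.println("explicit keys: "
        + readKeys(table, Arrays.asList("h1-r1-en", "h1-r1-es", "h1-r1-fr")));
  }
}
```

In this model the prefix scan picks up `h1-r10-en` along with the eight known rows, while the explicit read returns only the keys that were asked for. That is the situation where listing individual row keys is the safer choice.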