Search code examples
node.jsgoogle-cloud-bigtablebigtable

Range filters on Google BigTable


I am currently testing out BigTable at the moment to see if is something we will use. We currently use CloudSql with Postgres 9.6 with the current schema of;

id, sensor_id, time, value

Most of our queries we query for data between a range, something like this

SELECT
  *
FROM
  readings
WHERE
 sensor_id IN(7297,7298,7299,7300)
 AND time BETWEEN '2018-07-15 00:00:00' AND '2019-07-15 00:00:00'
ORDER BY
 time, sensor_id

Each sensor can have readings every 10mins or so, so that's a fair bit of data. At last check, we have 2 billion records, which is increasing a lot each day.

For BigTable I am importing with a row key of readings#timestamp#sensorId so something like this readings#20180715000000#7297

So far seems so good.

To query a range (using node) I am doing this

const fromDate = '20180715000000'
const toDate = '20190715000000'

const ranges = sensorIds.map(sensorId => {
  return {
    start: `readings#${fromDate}#${sensorId}`,
    end: `readings#${toDate}#${sensorId}`,
  }
});

const results = [];

await table.createReadStream({
  column: {
    cellLimit: 1,
  },
  ranges
})
.on('error', err => {
  console.log(err);
})
.on('data', row => {
  results.push({
    id: row.id,
    data: row.data
  })
})
.on('end', async () => {
  console.log(` ${results.length} Rows`)
})

My understanding of this would be that the results would be similar to the sql query above, but it seems to be returning for all sensor ids across the date range, not by the ones specified within the query.

My questions;

  1. Is this the correct row key that we should be using for this type of querying
  2. If this is correct, can we filter per range? or is there a filter that we have to use to only return the values for the given date range and sensorId range?

thanks in advance for your advice.


Solution

  • The problem is that you are setting up your ranges variable in a wrong way and Big Table is getting lost because of that, try doing the following:

    const fromDate = '20180715000000'
    const toDate = '20190715000000'
    const sensorId = sensorIds[0]
    
    const filter = {
        column: {
           cellLimit: 1,
        },
        value: {
            start: `readings#${fromDate}#${sensorId}`,
            end: `readings#${toDate}#${sensorId}`,
        }
    };
    
    const results = [];
    
    await table.createReadStream({
        filter
    })
    .on('error', err => {
        console.log(err);
    })
    .on('data', row => {
        results.push({
          id: row.id,
          data: row.data
        })
    })
    .on('end', async () => {
        console.log(` ${results.length} Rows`)
    })
    

    **NOTE: I am getting the first position of sensorIds which I assume is a list of all the Ids, but you can select any of them. Also this is all untested but should be a good starting point for you.

    You can find snippets on the usage of the Node.js Client for BigTable on this Github Repo.