node.js azure azure-blob-storage azure-iot-hub azure-node-sdk

Is it possible to filter Azure blobs?


I am trying to retrieve blobs based on my filters. I have created a device in IoT Hub that receives telemetry data, and I route that data to a storage account as blobs. Now I want to retrieve the blobs using Node.js.

Is it possible to write an API that returns only the blobs matching my filters, without traversing the whole container of blobs?


Solution

  • By default, Azure IoT Hub routing to storage creates blobs with the naming convention {iothub}/{partition}/{YYYY}/{MM}/{DD}/{HH}/{mm} inside the selected container, so you have a predictable blob prefix that can be used as the query filter (more on that later). Note that {partition} is the zero-indexed ID of the partition into which the message was ingested. For example, if you chose 4 partitions (the default) when creating the IoT hub instance, the partition IDs are 0, 1, 2 and 3.

    Now for the filtering part. You will most likely want to list blobs (and then read their content) by time range, since that is the practical approach for cold path analytics. Unfortunately, you cannot filter blobs by device ID, because the same blob might contain messages from multiple devices. Assuming your cold path analytics processes batches (most probably as a continuous job) over a sliding time range, below is a sample query (over-simplified, of course; read the inline comments carefully) using the @azure/storage-blob package (v12 JavaScript SDK). Check the API reference to adapt it to your needs.

    const { BlobServiceClient } = require("@azure/storage-blob");

    async function main() {
      const blobServiceClient = BlobServiceClient.fromConnectionString('myStorageConnectionString');
      const containerClient = blobServiceClient.getContainerClient('myContainer');

      // Add logic here to select the time range.
      // For simplicity, this uses a hardcoded range of 2020-01-01 5:45 pm to 2020-01-01 5:46 pm
      // (just a 1-minute range).

      // Assuming the IoT hub has 4 partitions
      for (let partition = 0; partition < 4; partition++) {
        // The prefix below picks up all blobs written from 17:45:00 to 17:45:59.xxx
        const iter = containerClient.listBlobsByHierarchy("/", { prefix: `myIotHub/${partition}/2020/01/01/17/45` });
        for await (const item of iter) {
          if (item.kind === "prefix") {
            // A virtual directory under the prefix, not an actual blob
            console.log(`\tBlobPrefix: ${item.name}`);
          } else {
            // At this point you might want to read the blob content. For simplicity, only a few blob properties are printed.
            console.log(`\tBlobItem: name - ${item.name}, last modified - ${item.properties.lastModified}`);
          }
        }
      }
    }

    // await is only valid inside an async function in CommonJS, hence the wrapper
    main().catch(console.error);
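
    The hardcoded prefix above can be generalized to a sliding time range. The following is a minimal sketch rather than a definitive implementation: buildMinutePrefixes and listBlobNames are hypothetical helper names (they are not part of the SDK), and the sketch assumes that the date parts in the blob names are in UTC and that the range is half-open, [start, end). The only SDK call it relies on is containerClient.listBlobsFlat({ prefix }).

```javascript
// Hypothetical helper: builds one blob-name prefix per minute per partition
// for the half-open range [start, end), following the default convention
// {iothub}/{partition}/{YYYY}/{MM}/{DD}/{HH}/{mm}.
// Assumption: the date parts in the blob names are UTC.
function buildMinutePrefixes(hubName, partitionCount, start, end) {
  const pad = (n) => String(n).padStart(2, "0");
  const MINUTE_MS = 60 * 1000;
  const prefixes = [];
  // Truncate the start time down to a whole minute.
  const firstMinute = Math.floor(start.getTime() / MINUTE_MS) * MINUTE_MS;
  for (let t = firstMinute; t < end.getTime(); t += MINUTE_MS) {
    const d = new Date(t);
    const stamp = [
      d.getUTCFullYear(),
      pad(d.getUTCMonth() + 1), // getUTCMonth() is zero-based
      pad(d.getUTCDate()),
      pad(d.getUTCHours()),
      pad(d.getUTCMinutes()),
    ].join("/");
    for (let partition = 0; partition < partitionCount; partition++) {
      prefixes.push(`${hubName}/${partition}/${stamp}`);
    }
  }
  return prefixes;
}

// Hypothetical helper: collects the names of all blobs under the given prefixes.
// containerClient is expected to be a ContainerClient from @azure/storage-blob
// (or any object exposing a compatible listBlobsFlat method).
async function listBlobNames(containerClient, prefixes) {
  const names = [];
  for (const prefix of prefixes) {
    for await (const blob of containerClient.listBlobsFlat({ prefix })) {
      names.push(blob.name);
    }
  }
  return names;
}
```

    With the containerClient from the sample above, a call such as listBlobNames(containerClient, buildMinutePrefixes("myIotHub", 4, start, end)) would issue one listing request per prefix. For long ranges that adds up quickly, so it may be cheaper to list with an hour-level or day-level prefix (for example myIotHub/0/2020/01/01/17/) and filter the returned names in code.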