Search code examples
nosqlupdatesaerospike

Updating Redundant data/denormalized data in NoSQL(Aerospike)


My question is that I am having a problem where I need to update the data which is been denormalized due to being in NoSQL because a single update in one data needs to be updated in all other redundant data.

For eg: Consider an e-commerce database where there is one table "Products" which contains all the details about a product , let's say name,imageName, LogoImage Now in this case the LogoImage of various "Products" table entry can be same, and now I need to update the LogoImage, so I need to update in all the fields which contains the given LogoImage. which seems like a very poor solution

So is there any better way to do that?

P.S.: If we seperate logo and Products into 2 different table , so when I need to get 1000 products at a time , I need to get the related logos by implementing a client level join type thing, which is also not a good solution.


Solution

  • You're suggesting using the database as your CDN and storing the binary image in it? That's not a great approach, in my opinion. You should be storing that image in an actual CDN like Amazon Cloudfront, or a simple one like Amazon S3, or your own webserver as a file. Whichever, the point is that you should be referring to it by URI. In Aerospike you would store the metadata about that image, not the image itself.

    Next, you can have two sets - prod for products and prodimg for product images. The various products store a list of IDs referring to the product image set. The product image set has metadata about each image as a separate record { uri, name, title, width, length, ... } . If anything changes about this image, you just update the one record with the metadata for that image in prodimg. No need to change anything about the products.

    And you don't really need JOIN functionality in this case. Your application can get the prod record first, and use the bin (images) that has all the IDs of the images for the product (each referring to a key of a record in prodimg). You can then issue either a few get operations (reads) or a single batch-read for all of them if there are many. The latencies for Aerospike are such that this will return faster and scale better than an equivalent JOIN in an RDBMS. A batch-read is a multi-node, multi-core, multi-threaded operation. A cluster of 3 multi-core nodes has plenty of parallel computing power.

    Again, if you "need 1000 products at a time" use batch-read. In the Java client that's an AerospikeClient.get() with a list of Key objects. In the Python client that's an aerospike.Client.get_many. Every Aerospike client has batch-read functionality.