Tags: e-commerce, atomic, distributed-computing

How do e-commerce companies like Amazon/Apple maintain real-time aggregate counters during peak times?


Companies like Amazon and Apple have a few days when their online traffic is extremely heavy. For Apple, it may be a new iPhone release; for Amazon, it's probably Black Friday/Cyber Monday. Handling the server load is itself a problem that requires, among other things, scaling out massively. But how do these companies manage inventory when so many people are buying the same items at the same time?

The aggregate counter could be the available inventory itself, or it could simply be the number of coupons these platforms give to the first 100 customers who shop on a particular day.

Essentially, I was wondering how people handle a simple scenario: give the first 10000 members unique coupons that they can use over the weekend. The first problem with this approach is identifying the first 10000 members in real time, because you want to tell them that they have won a coupon. The second challenge is making sure each specific coupon goes to only one member.

The first seems to be a problem of maintaining an atomic counter. Do e-commerce websites use something like ZooKeeper to implement these kinds of counters?
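The atomic-counter idea from the question can be illustrated in a single process. This is a minimal sketch, assuming a lock-protected in-memory counter stands in for a shared one; in a distributed deployment the counter would have to live in a shared service (such as the ZooKeeper mentioned above, or a store with an atomic increment like Redis `INCR`). `FirstNCounter` and `try_claim` are hypothetical names.

```python
import threading
from typing import List, Optional


class FirstNCounter:
    """Hypothetical in-process stand-in for a shared atomic counter.

    The lock makes "check the limit, then increment" a single atomic step,
    so no two callers can both see the same free slot.
    """

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self._count = 0
        self._lock = threading.Lock()

    def try_claim(self) -> Optional[int]:
        """Return the caller's 1-based position if within the limit, else None."""
        with self._lock:
            if self._count >= self.limit:
                return None
            self._count += 1
            return self._count


# Simulate 50 concurrent shoppers racing for 10 slots.
counter = FirstNCounter(limit=10)
winners: List[int] = []
winners_lock = threading.Lock()


def shopper() -> None:
    position = counter.try_claim()
    if position is not None:
        with winners_lock:
            winners.append(position)


threads = [threading.Thread(target=shopper) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(winners))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Every shopper beyond the tenth gets `None`, so the winners can be told immediately. The hard part at scale is that every request must go through this one shared counter.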

The second seems to be a distributed-queue problem, where you have to match each of the 10000 customers with one of the 10000 available coupons. Even if first-come, first-served (FCFS) ordering isn't required, you would still need to allocate each coupon to exactly one member, in real time, in a high-traffic environment where your first 10000 customers could complete their transactions within the first minute of the sale.
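The one-coupon-per-member matching can be sketched with a pool of pre-generated codes. This is a minimal single-process sketch, assuming the codes (`COUPON-00000` and so on) are hypothetical placeholders and that one lock over the assignment map stands in for whatever distributed queue a real system would use; `claim_coupon` is likewise a hypothetical name.

```python
import queue
import threading
from typing import Dict, Optional

# Hypothetical pre-generated coupon codes; a real system would load them from storage.
coupons = queue.Queue()
for i in range(10):
    coupons.put(f"COUPON-{i:05d}")

assignments: Dict[str, str] = {}  # member_id -> coupon code
assignments_lock = threading.Lock()


def claim_coupon(member_id: str) -> Optional[str]:
    """Give a member at most one coupon; each code leaves the pool exactly once."""
    with assignments_lock:
        if member_id in assignments:
            return assignments[member_id]  # repeat request: same code again
        try:
            code = coupons.get_nowait()
        except queue.Empty:
            return None  # pool exhausted, no coupon for this member
        assignments[member_id] = code
        return code


# 25 distinct members, each sending two concurrent requests, competing for 10 coupons.
threads = [
    threading.Thread(target=claim_coupon, args=(f"member-{i % 25}",))
    for i in range(50)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(assignments), len(set(assignments.values())))  # 10 10
```

Holding a single lock around both the "already has a coupon?" check and the pop turns them into one atomic step. Doing that step across many machines is exactly what becomes expensive in a high-traffic distributed system.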

Do huge e-commerce platforms solve these problems in real time, or do they defer these decisions and resolve them asynchronously after a suitable delay?


Solution

  • At high scale, you simplify whatever you can.

    Do you have to issue exactly N coupons? No. You can poll the individual servers for how many coupons each has issued from its local DB. When the sum goes over the threshold, you signal them all to stop.

    As for the coupon codes, you don't need a central server to check whether a particular code has already been issued. You can store the codes locally on the nodes or even generate them on the fly. Use the fact that coupons need not be redeemed right away, so you have time to gather them in one place should you need to.

    If you absolutely need to issue an exact count, store the coupons on the nodes and accept that one node may run out before the others; the difference would be a matter of seconds. If you cannot accept even that, use a server that allocates coupons to the nodes in batches as they near depletion. This way the imbalance would not exceed the batch size.

    You can push things even further, at an exponentially higher cost.

    As an introduction to the high-scale world, see "Site Reliability Engineering", a free book by Google engineers on how they run production systems at scale: https://sre.google/books/
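The batch-allocation scheme from the answer (a central server handing coupons to nodes in batches as they near depletion) can be sketched in one process. This is a minimal sketch, assuming `CouponAllocator`, `Node`, `lease`, `issue`, `batch_size` and `low_water` are hypothetical names, that the "server" is an in-process object rather than a network service, and that "near depletion" means local stock at or below a fixed low-water mark.

```python
import threading
from typing import List, Optional


class CouponAllocator:
    """Hypothetical central server that owns the master list of coupon codes."""

    def __init__(self, codes: List[str]) -> None:
        self._codes = codes
        self._next = 0
        self._lock = threading.Lock()

    def lease(self, batch_size: int) -> List[str]:
        """Hand out up to batch_size unissued codes; returns [] once depleted."""
        with self._lock:
            batch = self._codes[self._next:self._next + batch_size]
            self._next += len(batch)
            return batch


class Node:
    """Hypothetical front-end node that issues coupons from its local stock."""

    def __init__(self, allocator: CouponAllocator, batch_size: int, low_water: int) -> None:
        self._allocator = allocator
        self._batch_size = batch_size
        self._low_water = low_water  # refill threshold, i.e. "near depletion"
        self._stock: List[str] = []
        self._lock = threading.Lock()

    def issue(self) -> Optional[str]:
        """Issue one coupon locally; contact the allocator only when stock runs low."""
        with self._lock:
            if len(self._stock) <= self._low_water:
                self._stock.extend(self._allocator.lease(self._batch_size))
            return self._stock.pop() if self._stock else None


# 100 coupons, 3 nodes, batches of 10, refill when 2 or fewer remain locally.
allocator = CouponAllocator([f"COUPON-{i:05d}" for i in range(100)])
nodes = [Node(allocator, batch_size=10, low_water=2) for _ in range(3)]

# 150 customers spread round-robin across the nodes.
issued: List[str] = []
for i in range(150):
    code = nodes[i % 3].issue()
    if code is not None:
        issued.append(code)

print(len(issued), len(set(issued)))  # 100 100
```

Exactly 100 unique coupons go out, and the allocator is contacted once per batch instead of once per customer. In this sketch a node never holds more than `batch_size + low_water` codes, which bounds the imbalance. In this run the first node keeps issuing for several rounds after the other two have started returning `None`, which is the short imbalance the answer accepts.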