
Intended granularity of Hystrix commands?


I just read the Hystrix docs/wiki and am still missing something at a fundamental level: what is the intended level of granularity for a HystrixCommand impl?

For instance, suppose I have a DAO that handles CRUD operations for some DB entity, say, a Widget:

```java
import java.util.List;

// Plain entity mapped to a DB table.
class Widget {
    Long id;
    Long typeId;
    Long version;
    String name;
    Boolean isAlive;
}

// CRUD operations for Widget rows.
interface WidgetDao {
    Widget insertWidget(Long typeId, String name, Boolean isAlive);

    List<Widget> getAllWidgets();

    Widget getWidgetById(Long id);

    void updateWidget(Widget widget);

    void deleteWidget(Widget widget);
}
```

Now, if the database that this DAO connects to goes down, all the DAO methods will begin failing. But I suppose it is also possible for the DB to be tied up in some transaction or maintenance mode where, say, reads are permitted but writes are not. In that edge case, reads (the getX(...) methods) would succeed, but all the other methods would fail with SQLExceptions.

So I ask: what's the intended level of granularity that I should be using here? Either:

  1. One HystrixCommand impl for every DAO method, since in some cases some commands could be succeeding while others fail; or
  2. One HystrixCommand somehow baked into the DAO class, spanning all DAO methods (ergo, if one command fails, the DAO as a whole "goes down")?
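To make the difference between the two options concrete, here is a toy sketch in plain Java with no Hystrix dependency. `ToyBreaker`, `BreakerRegistry` and `GranularityDemo` are hypothetical names, and the breaker is deliberately simplified: it opens after three consecutive failures, whereas Hystrix trips on an error percentage over a rolling window. The only thing that differs between the options is the function mapping a DAO method name to a circuit key; `dbAccepts` assumes the maintenance-mode case above, where reads succeed and writes fail.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy circuit breaker (NOT Hystrix's algorithm): opens after `threshold`
// consecutive failures and stays open for the rest of the demo.
class ToyBreaker {
    private final int threshold;
    private int consecutiveFailures = 0;

    ToyBreaker(int threshold) { this.threshold = threshold; }

    boolean isOpen() { return consecutiveFailures >= threshold; }
    void recordSuccess() { consecutiveFailures = 0; }
    void recordFailure() { consecutiveFailures++; }
}

// One breaker per circuit key; the key function decides the granularity.
class BreakerRegistry {
    private final Map<String, ToyBreaker> breakers = new HashMap<>();
    private final Function<String, String> keyForMethod;

    BreakerRegistry(Function<String, String> keyForMethod) { this.keyForMethod = keyForMethod; }

    ToyBreaker forMethod(String daoMethod) {
        return breakers.computeIfAbsent(keyForMethod.apply(daoMethod), k -> new ToyBreaker(3));
    }
}

class GranularityDemo {
    // Assumed maintenance mode: reads (getX methods) succeed, writes fail.
    static boolean dbAccepts(String daoMethod) {
        return daoMethod.startsWith("get");
    }

    // Returns true if the call reached the DB, false if it was short-circuited.
    static boolean call(BreakerRegistry registry, String daoMethod) {
        ToyBreaker breaker = registry.forMethod(daoMethod);
        if (breaker.isOpen()) {
            return false; // fail fast without touching the database
        }
        if (dbAccepts(daoMethod)) {
            breaker.recordSuccess();
        } else {
            breaker.recordFailure();
        }
        return true;
    }

    // Fails updateWidget three times, then checks whether getWidgetById still runs.
    static boolean readStillAttemptedAfterWriteFailures(BreakerRegistry registry) {
        for (int i = 0; i < 3; i++) {
            call(registry, "updateWidget");
        }
        return call(registry, "getWidgetById");
    }

    public static void main(String[] args) {
        BreakerRegistry perMethod = new BreakerRegistry(m -> m);          // option 1
        BreakerRegistry wholeDao = new BreakerRegistry(m -> "WidgetDao"); // option 2
        System.out.println(readStillAttemptedAfterWriteFailures(perMethod)); // true
        System.out.println(readStillAttemptedAfterWriteFailures(wholeDao));  // false
    }
}
```

With per-method keys only the `updateWidget` circuit opens and reads keep flowing; with a single `WidgetDao` key the same write failures short-circuit the read as well.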

I think the former represents more flexible engineering, but it introduces a lot more code for me as a consumer of the library. Thoughts? Ideas?


Solution

  • My thought is that the level of granularity is pretty open to interpretation, but I think it all comes down to fault tolerance/recovery and fine tuning. I would consider the following:

    1. What are your points of failure and how do you recover from them? Can you recover?

    You had mentioned:

    "I suppose it is also possible for the DB to be tied up in some transaction or maintenance mode, where say, reads are permitted, but not writes"

    If this is the case, maybe it makes sense to design your Hystrix commands around it: one more general "read widget" command and one "write widget" command.

    Let's say reads are working and writes are not: you could keep serving reads and break the circuit on the write command, potentially saving you some database connections along the way. You could get the same effect with more granularity, one command per DAO method, but I'm not sure that really buys you anything here.
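A minimal sketch of the read/write split, assuming a hypothetical `WidgetOp` enum that lists the five `WidgetDao` methods; the key names are made up. The grouping is just a choice of key per operation. As far as I know, Hystrix tracks circuit-breaker state and metrics per `HystrixCommandKey` (created with `HystrixCommandKey.Factory.asKey(...)`), so strings like these could serve as command keys, letting you keep one command class per DAO method while all reads share one circuit and all writes share another.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical enum mirroring the WidgetDao methods from the question.
enum WidgetOp {
    INSERT(false), GET_ALL(true), GET_BY_ID(true), UPDATE(false), DELETE(false);

    private final boolean read;

    WidgetOp(boolean read) { this.read = read; }

    // Read/write grouping: every read shares one key, every write another.
    String readWriteKey() {
        return read ? "WidgetRead" : "WidgetWrite";
    }

    // Per-method granularity, for comparison: one key per DAO method.
    String perMethodKey() {
        return "Widget_" + name();
    }
}

class KeyDemo {
    // Lists the distinct circuit keys produced by a given granularity.
    static List<String> distinctKeys(boolean perMethod) {
        return Arrays.stream(WidgetOp.values())
                .map(op -> perMethod ? op.perMethodKey() : op.readWriteKey())
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(distinctKeys(false));       // [WidgetWrite, WidgetRead]
        System.out.println(distinctKeys(true).size()); // 5
    }
}
```

The read/write mapping yields two circuits for the whole DAO, while the per-method mapping yields five.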

    2. Do you need/want to finely tune your application?

    Hystrix offers some pretty good configuration for thread pools and metrics, which can be tuned on a per-command basis. Does it make sense to configure these as a group, split into reads and writes, or do you want finer-grained control/reporting for each DAO method?
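To illustrate what per-group tuning buys you, here is a stdlib-only bulkhead sketch: each key gets its own fixed-size thread pool and its own timeout, so a stuck write cannot consume threads or time reserved for reads. Hystrix provides this kind of isolation itself through per-command thread pools and execution timeouts; the `Bulkheads` and `GroupSettings` classes and the pool sizes and timeouts below are hypothetical, chosen only for illustration.

```java
import java.util.Map;
import java.util.concurrent.*;

// Hypothetical per-group knobs: thread-pool size and execution timeout.
class GroupSettings {
    final int poolSize;
    final long timeoutMillis;

    GroupSettings(int poolSize, long timeoutMillis) {
        this.poolSize = poolSize;
        this.timeoutMillis = timeoutMillis;
    }
}

// One isolated thread pool per key (a "bulkhead"), each with its own timeout.
class Bulkheads implements AutoCloseable {
    private final Map<String, ExecutorService> pools = new ConcurrentHashMap<>();
    private final Map<String, GroupSettings> settings;

    Bulkheads(Map<String, GroupSettings> settings) {
        this.settings = settings;
        settings.forEach((key, s) -> pools.put(key, Executors.newFixedThreadPool(s.poolSize)));
    }

    // Runs the task on its group's pool; returns the fallback on timeout or failure.
    <T> T run(String key, Callable<T> task, T fallback) {
        Future<T> future = pools.get(key).submit(task);
        try {
            return future.get(settings.get(key).timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the stuck worker thread
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the task itself threw
        }
    }

    @Override
    public void close() {
        pools.values().forEach(ExecutorService::shutdownNow);
    }
}

class BulkheadDemo {
    public static void main(String[] args) {
        Map<String, GroupSettings> settings = Map.of(
                "WidgetRead", new GroupSettings(10, 500),
                "WidgetWrite", new GroupSettings(2, 50));
        try (Bulkheads bulkheads = new Bulkheads(settings)) {
            // Simulated hung write: sleeps far past its 50 ms budget.
            String write = bulkheads.run("WidgetWrite",
                    () -> { Thread.sleep(1_000); return "written"; }, "write-fallback");
            String read = bulkheads.run("WidgetRead", () -> "widget-42", "read-fallback");
            System.out.println(write + " " + read); // write-fallback widget-42
        }
    }
}
```

Whether reads and writes each get one entry in the settings map, or every DAO method gets its own, is exactly the tuning question above.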

    Overall, I think it depends on the situation, and I don't think Hystrix was created with any particular level of granularity in mind. In my experience (consuming REST APIs via Hystrix commands), I tend to go with the first approach and favor granularity. Sure, we generate a bit more code this way, but the consumers of these libraries (in our case) rarely need to deal with it, since they just use the interfaces that ultimately call these Hystrix commands, and we get to take advantage of the thread pooling/fallback options. This comes in handy because with REST APIs it's not uncommon for only one endpoint to start failing, so we can fail fast on that endpoint alone.
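The fail-fast behaviour described above can be sketched without Hystrix as one toy breaker per endpoint plus a fallback. `EndpointCommands`, the endpoint paths and the fallback values are hypothetical; the breaker opens after three consecutive failures and, unlike Hystrix, never half-opens to retry. Once an endpoint's circuit is open, its fallback is returned without calling the endpoint at all, while other endpoints are unaffected.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Toy per-endpoint command runner: one failure counter (circuit) per endpoint key.
class EndpointCommands {
    private final int threshold;
    private final Map<String, Integer> failures = new HashMap<>();

    EndpointCommands(int threshold) { this.threshold = threshold; }

    <T> T execute(String endpoint, Supplier<T> primary, Supplier<T> fallback) {
        int failed = failures.getOrDefault(endpoint, 0);
        if (failed >= threshold) {
            return fallback.get(); // circuit open: fail fast, primary is not called
        }
        try {
            T result = primary.get();
            failures.put(endpoint, 0); // success resets the consecutive-failure count
            return result;
        } catch (RuntimeException e) {
            failures.put(endpoint, failed + 1);
            return fallback.get();
        }
    }
}

class EndpointDemo {
    static int brokenCalls = 0;

    // Stand-in for a remote endpoint that is currently down.
    static String callBrokenEndpoint() {
        brokenCalls++;
        throw new IllegalStateException("HTTP 503");
    }

    public static void main(String[] args) {
        EndpointCommands commands = new EndpointCommands(3);
        for (int i = 0; i < 10; i++) {
            commands.execute("/widgets/search", EndpointDemo::callBrokenEndpoint, () -> "cached-search");
        }
        String widget = commands.execute("/widgets/42", () -> "widget-42", () -> "unknown");
        System.out.println(brokenCalls + " " + widget); // 3 widget-42
    }
}
```

Only the first three calls reach the broken endpoint; the remaining seven get the fallback immediately, and the healthy endpoint still answers normally.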

    Of course, your use case is a bit different from mine, but I would look at fault tolerance/recovery and go from there.