Are Zookeeper ephemeral nodes written to disk?
I know normal Zookeeper nodes are written to disk before Zookeeper acks the write to the client.
However, ephemeral nodes only last for the duration of the client session, so if the Zookeeper servers have all crashed, then by definition the client session is broken. There would then be no need to write them to disk, because ephemeral nodes are not recreated when the ensemble restarts. So in theory, ephemeral nodes would only need to be stored in memory.
Is this how it's implemented?
I ran into this question myself and found that it had been answered on the Zookeeper mailing list, so I'm posting the answer here for anyone who finds this question.
In short: no, they are not kept only in memory. Ephemeral nodes are written to disk just like persistent ones. As a result, a client session can persist even if the entire Zookeeper ensemble is down. To quote Patrick Hunt's answer from the mailing list (emphasis mine):
Ephemeral znodes are treated just like persistent znodes in the sense that a quorum of nodes need to agree to any change. As such the znode is written to the transaction log.
A client session ends either when a client closes its session explicitly or the ZK quorum leader decides that the session has expired (which is based on the negotiated session timeout). Only while a leader is active can a session be expired (or closed for that matter). When you shut down an ensemble the sessions are maintained. If you were to, for example, shut down an ensemble for an hour and then restart it the sessions would still be active. The clock would "reset" when the new leader was elected. If the client session is still active the session would continue, any ephemeral znodes would still exist.
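To make the behaviour described in the quote concrete, here is a small toy model in Python. It is not ZooKeeper code and does not use any ZooKeeper API; the class `ToyEnsemble` and all of its method names are made up for illustration. It only encodes the three rules from the quote: ephemeral znodes are persisted along with everything else, sessions can only be expired while a leader is active, and the session clock resets when a new leader is elected.

```python
class ToyEnsemble:
    """Toy model of the session rules quoted above (hypothetical, not ZooKeeper)."""

    def __init__(self, session_timeout):
        self.session_timeout = session_timeout
        self.leader_active = True
        self.last_heard = {}   # session id -> time the leader last heard from it
        self.ephemerals = {}   # path -> owning session id (stands in for on-disk state)

    def create_session(self, sid, now):
        self.last_heard[sid] = now

    def create_ephemeral(self, path, sid):
        # Stored in the same persisted state a normal znode would use.
        self.ephemerals[path] = sid

    def heartbeat(self, sid, now):
        # Heartbeats only count while a leader is there to receive them.
        if self.leader_active and sid in self.last_heard:
            self.last_heard[sid] = now

    def shutdown(self):
        self.leader_active = False

    def restart(self, now):
        # A new leader is elected: every surviving session's clock is reset.
        self.leader_active = True
        for sid in self.last_heard:
            self.last_heard[sid] = now

    def tick(self, now):
        # Only an active leader can expire sessions.
        if not self.leader_active:
            return
        expired = [sid for sid, seen in self.last_heard.items()
                   if now - seen > self.session_timeout]
        for sid in expired:
            del self.last_heard[sid]
            self.ephemerals = {path: owner for path, owner in self.ephemerals.items()
                               if owner != sid}


ens = ToyEnsemble(session_timeout=10)
ens.create_session("s1", now=0)
ens.create_ephemeral("/lock", "s1")

ens.shutdown()
ens.tick(now=3600)                      # ensemble is down: nothing can expire
survived_outage = "/lock" in ens.ephemerals

ens.restart(now=3600)                   # clock resets at leader election
ens.tick(now=3605)                      # 5 units after restart, within the timeout
survived_restart = "/lock" in ens.ephemerals

ens.tick(now=3611)                      # 11 units with no heartbeat: session expires
removed_after_timeout = "/lock" not in ens.ephemerals
```

In this sketch the ephemeral node survives the hour-long outage and is only removed once the restarted leader has waited a full session timeout without hearing from the client, which is the scenario the quote describes.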