I was reading http://thislongrun.blogspot.com/2015/03/dead-nodes-dont-bite.html and simply had to respond to some confusion. Since I’m not sure what the comment publishing timeline looks like, I’m copying my reply here.

The intuition you’ve quoted is right and I’m not quite sure why you think it’s wrong.

In particular:

“Now let us turn to single node failures. […] You simply failover to a replica in a transactionally consistent way. Notably, at least Tandem and Vertica have been doing exactly this for years.” There is nothing more to add. There are real-world systems that are both available and consistent when there is a node failure.

That’s true *in this failure mode*, but the choice of “Availability” in the CAP theorem is much stronger: being available requires a response as long as you have *any* working nodes in the system. The fact that some systems have consistent replica pairs that they can fail over between does not make them both Consistent and Available in the CAP sense; it just makes them Consistent, with more uptime than the naive system.

Looking at [1], which I assume you’re drawing that quote from, it’s clear that Stonebraker’s main point is not that CAP is wrong, but that it encourages the wrong sort of engineering tradeoffs: yes, you might need to choose between C and A eventually, but your systems should be designed to delay that decision as long as possible, and in many applications users won’t be able to tell the difference. But that doesn’t mean partition tolerance is not equivalent to fault tolerance.

In particular, Stonebraker assumes that double failures simply won’t happen, so you shouldn’t worry about them in your system design. As a developer on a CP system (Ceph) with users who report issues to a mailing list, I am sad to report that, for systems of scale, he is wrong about this assumption. 🙁

“In other words, in a strongly consistent system, the intuition is right, or, more precisely, actions are taken to make it right. However, in an AP system the intuition is just wrong.”

In an AP system you tolerate a partition by choosing to *not care* whether the nodes are partitioned or not. But a partition and a node failure are still indistinguishable to any monitoring nodes you have.

In a CP system you choose to be consistent and partition-tolerant by guaranteeing that one side will win. But again, down versus partitioned is indistinguishable to any monitoring nodes you have.

And that makes a failure and a partition indistinguishable from anybody else’s point of view.
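To make that concrete: the only evidence a monitor ever has is “a reply arrived in time” or “it didn’t.” Here is a toy sketch (the node functions, names, and timings are invented for illustration, not taken from any real system) showing that a crashed node and a partitioned-but-healthy node produce the exact same observation:

```python
import queue
import threading
import time

def monitor(node, timeout=0.1):
    """Report the only thing a monitor can actually observe about a node."""
    reply = queue.Queue()
    threading.Thread(target=node, args=(reply,), daemon=True).start()
    try:
        reply.get(timeout=timeout)
        return "alive"
    except queue.Empty:
        return "no response"  # crashed? partitioned? the monitor can't tell

def crashed_node(reply):
    pass  # process is dead: it never answers

def partitioned_node(reply):
    time.sleep(10)  # perfectly healthy, but its answer can't cross the partition
    reply.put("pong")

print(monitor(crashed_node))      # no response
print(monitor(partitioned_node))  # no response -- identical observation
```

Whatever policy the monitor applies next (fail over, fence, wait), it has to apply it to both cases identically, because it cannot distinguish them.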

Indeed, you circle back to this in your penultimate section, so I’m not sure why your lede and conclusion say otherwise.

Speaking more generally, even if partition tolerance and failure are not equivalent, you still have to design your system (in CAP terms) as if they are, choosing to give up either C or A:

Let’s say you did try to design a “CA” system. Let it consist of a set of nodes {N}. Make some writes {W}. Let a set of nodes {M} go down. Make some more writes {X} to the remaining {N}-{M} nodes.

Now let the {N}-{M} nodes die, and the {M} nodes turn back on. What happens to your system? Do the {M} functioning nodes return data based only on {W}, or do they refuse service? Does this simply count as every node being failed until one of {N}-{M} is back?

How do those answers differ from applying the same scenario as a partition?
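The thought experiment above can be replayed as a short sketch (the `Node` class and set-based “storage” are made up for illustration). The punchline is in the final state: the surviving {M} nodes hold {W} but have never seen {X}, so any answer they give is stale, and nothing in the script would change if “went down” were replaced by “was partitioned away”:

```python
class Node:
    """Toy replica: just a name, an up/down flag, and a set of applied writes."""
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = set()

def replicate(nodes, writes):
    """Apply a batch of writes to every node that is currently up."""
    for n in nodes:
        if n.up:
            n.data |= writes

N = [Node(f"n{i}") for i in range(5)]
M = N[:2]         # the nodes that will fail first
rest = N[2:]      # {N} - {M}

replicate(N, {"W"})       # writes {W} reach everyone
for n in M:
    n.up = False          # {M} goes down
replicate(N, {"X"})       # writes {X} reach only {N}-{M}

for n in rest:
    n.up = False          # now {N}-{M} dies...
for n in M:
    n.up = True           # ...and {M} comes back

survivors = [n for n in N if n.up]
print([(n.name, sorted(n.data)) for n in survivors])
# The survivors have {W} but not {X}: serving reads from them gives up
# consistency; refusing service gives up availability. There is no third option.
```

From the survivors’ point of view, a crash of {N}-{M} and a partition isolating {N}-{M} leave them in byte-identical states, which is exactly why the design has to pick C or A either way.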