Network Partition Handling
Comments focus on how distributed systems, clusters, and databases like Redis and Cassandra handle network partitions, node failures, and split-brain scenarios, and how they maintain consistency using Raft, Paxos, quorums, and replication.
Sample Comments
Partitioning also happens when a node fails. But it shouldn't be that much of a problem, since they probably use an N=3/W=2 setup.
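For readers unfamiliar with the N/W notation, here is a rough, purely illustrative sketch of what an N=3, W=2 (and R=2) quorum setup buys you; the Replica class and function names are made up for the example and are not taken from any particular database.

```python
# Illustrative sketch of quorum writes/reads with N=3 replicas, W=2, R=2.
# With W + R > N, every read quorum overlaps every write quorum, so a
# single failed or partitioned node neither blocks progress nor loses
# acknowledged writes.

N, W, R = 3, 2, 2

class Replica:
    def __init__(self):
        self.value = None
        self.version = 0
        self.alive = True

def quorum_write(replicas, value, version):
    acks = 0
    for r in replicas:
        if r.alive:                      # a failed node simply doesn't ack
            r.value, r.version = value, version
            acks += 1
    if acks < W:
        raise RuntimeError("write failed: fewer than W replicas acked")

def quorum_read(replicas):
    responses = [(r.version, r.value) for r in replicas if r.alive]
    if len(responses) < R:
        raise RuntimeError("read failed: fewer than R replicas responded")
    return max(responses)[1]             # newest version wins

replicas = [Replica() for _ in range(N)]
quorum_write(replicas, "v1", version=1)
replicas[0].alive = False                # one node fails / is partitioned away
quorum_write(replicas, "v2", version=2)  # still succeeds: 2 acks >= W
print(quorum_read(replicas))             # -> "v2"
```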
You can use Paxos. There will be some partitions that can cause an outage, but this is only the case when there is no majority of nodes that can communicate with each other, which in practice means you are exposing yourself to a vanishingly small risk.
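The "majority of nodes that can communicate" condition can be made concrete with a tiny helper; this is a generic sketch of the rule, not code from any actual Paxos implementation.

```python
# Illustrative: a Paxos/Raft-style cluster only makes progress on the side
# of a partition that still contains a strict majority of the N voters.

def can_make_progress(reachable_nodes: int, cluster_size: int) -> bool:
    return reachable_nodes > cluster_size // 2

N = 5
# A partition splits the cluster into groups of 3 and 2 nodes.
print(can_make_progress(3, N))  # True  -> majority side keeps accepting writes
print(can_make_progress(2, N))  # False -> minority side stalls, but nothing is lost
# Only a split where no group holds a majority (e.g. 2/2/1) causes a full outage.
```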
Do not forget about netsplit scenarios. If you have two datacenters and one has network problems, you can end up in a split-brain scenario: each datacenter may believe it is the available one and that the other is down. One way to solve this is to use more boxes and majority voting, with something like Apache ZooKeeper. But none of this comes out of the box.
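A sketch of the majority-voting idea for the two-datacenter case, assuming a hypothetical third, lightweight tiebreaker site (roughly what a ZooKeeper ensemble spread across three locations provides); the voter names are invented for illustration.

```python
# Illustrative split-brain avoidance: three voters (dc1, dc2, tiebreaker).
# After a partition, a datacenter stays primary only if it can reach a
# strict majority of the voters; otherwise it demotes itself.

VOTERS = {"dc1", "dc2", "tiebreaker"}

def may_stay_primary(reachable: set) -> bool:
    return len(reachable & VOTERS) > len(VOTERS) // 2

# dc1 loses its link to dc2 but can still reach the tiebreaker:
print(may_stay_primary({"dc1", "tiebreaker"}))  # True  -> dc1 remains primary
# dc2 is isolated and only sees itself:
print(may_stay_primary({"dc2"}))                # False -> dc2 steps down
# With only two voters (no tiebreaker), each side would see exactly half
# of the votes and neither could safely claim to be primary.
```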
And then when two of the nodes die at the same time...?
So this isn't a shared-nothing system? How do they handle failing nodes, and prevent data loss?
Tbf, isn't that equivalent to a network partition followed by rebooting or replacing one node? The network can always go down in the middle of an operation.
That's interesting, I thought a Raft cluster would prevent just that!
How do you do that in a distributed cluster with HA failover using Sentinel, considering that Sentinel is susceptible to partitions and drops?
Excellent! A couple of questions: 1) In the case of a network partition, do clients currently connected to the leader get notified that there's a partition, or that the cluster is not in a healthy state? 2) If a client writes to the partition that will get rolled back, and all their transactions are rolled back after the partition heals, do they get notified that their data was rolled back?
1. Run two beefy nodes and accept data loss of unreplicated changes during failover. 2. Use eventual consistency (not suitable for every workload).
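A toy illustration of option 1: asynchronous primary/replica replication in which writes acknowledged only by the primary are lost if it dies before shipping them. All names here are hypothetical and not tied to any specific system.

```python
# Illustrative: the primary acks writes immediately and replicates them
# asynchronously; whatever has not reached the replica when the primary
# dies is lost on failover.

class Node:
    def __init__(self):
        self.log = []

primary, replica = Node(), Node()
pending = []                      # writes acked to clients but not yet shipped

def write(value):
    primary.log.append(value)     # ack to the client right away
    pending.append(value)

def replicate():
    while pending:
        replica.log.append(pending.pop(0))

write("a"); write("b")
replicate()                       # "a" and "b" reach the replica
write("c")                        # acked, but the primary crashes before shipping it

# Failover: the replica is promoted; "c" is gone.
print(replica.log)                # ['a', 'b']
```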