Network Partition Handling
Comments focus on how distributed systems, clusters, and databases like Redis and Cassandra handle network partitions, node failures, and split-brain scenarios, and how they maintain consistency using Raft, Paxos, quorums, and replication.
Sample Comments
Partitioning also happens when a node fails. But it shouldn't be that much of a problem, since they probably use an N=3/W=2 setup.
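For readers unfamiliar with the N/W notation, here is a rough, purely illustrative sketch of what an N=3, W=2 (and R=2) quorum setup buys you; the Replica class and function names are made up for the example and are not taken from any particular database.

```python
# Illustrative sketch of quorum writes/reads with N=3 replicas, W=2, R=2.
# With W + R > N, every read quorum overlaps every write quorum, so a
# single failed or partitioned node neither blocks progress nor loses
# acknowledged writes.

N, W, R = 3, 2, 2

class Replica:
    def __init__(self):
        self.value = None
        self.version = 0
        self.alive = True

def quorum_write(replicas, value, version):
    acks = 0
    for r in replicas:
        if r.alive:                      # a failed node simply doesn't ack
            r.value, r.version = value, version
            acks += 1
    if acks < W:
        raise RuntimeError("write failed: fewer than W replicas acked")

def quorum_read(replicas):
    responses = [(r.version, r.value) for r in replicas if r.alive]
    if len(responses) < R:
        raise RuntimeError("read failed: fewer than R replicas responded")
    return max(responses)[1]             # newest version wins

replicas = [Replica() for _ in range(N)]
quorum_write(replicas, "v1", version=1)
replicas[0].alive = False                # one node fails / is partitioned away
quorum_write(replicas, "v2", version=2)  # still succeeds: 2 acks >= W
print(quorum_read(replicas))             # -> "v2"
```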
You can use Paxos. There will be some partitions that can cause an outage, but this is only the case when there is no majority of nodes that can communicate with each other, which in practice means you are exposing yourself to a vanishingly small risk.
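The "majority of nodes that can communicate" condition can be made concrete with a tiny helper; this is a generic sketch of the rule, not code from any actual Paxos implementation.

```python
# Illustrative: a Paxos/Raft-style cluster only makes progress on the side
# of a partition that still contains a strict majority of the N voters.

def can_make_progress(reachable_nodes: int, cluster_size: int) -> bool:
    return reachable_nodes > cluster_size // 2

N = 5
# A partition splits the cluster into groups of 3 and 2 nodes.
print(can_make_progress(3, N))  # True  -> majority side keeps accepting writes
print(can_make_progress(2, N))  # False -> minority side stalls, but nothing is lost
# Only a split where no group holds a majority (e.g. 2/2/1) causes a full outage.
```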
Do not forget about netsplit scenarios. If you have two datacenters and one has network problems, you can end up in a split-brain scenario: each datacenter may believe it is the available one and that the other is down. One way to solve this is to use more boxes and majority voting, with something like Apache ZooKeeper. But none of this comes out of the box.
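A sketch of the majority-voting idea for the two-datacenter case, assuming a hypothetical third, lightweight tiebreaker site (roughly what a ZooKeeper ensemble spread across three locations provides); the voter names are invented for illustration.

```python
# Illustrative split-brain avoidance: three voters (dc1, dc2, tiebreaker).
# After a partition, a datacenter stays primary only if it can reach a
# strict majority of the voters; otherwise it demotes itself.

VOTERS = {"dc1", "dc2", "tiebreaker"}

def may_stay_primary(reachable: set) -> bool:
    return len(reachable & VOTERS) > len(VOTERS) // 2

# dc1 loses its link to dc2 but can still reach the tiebreaker:
print(may_stay_primary({"dc1", "tiebreaker"}))  # True  -> dc1 remains primary
# dc2 is isolated and only sees itself:
print(may_stay_primary({"dc2"}))                # False -> dc2 steps down
# With only two voters (no tiebreaker), each side would see exactly half
# of the votes and neither could safely claim to be primary.
```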
And then when two of the nodes die at the same time...?
So this isn't a shared-nothing system? How do they handle failing nodes, and prevent data loss?
Tbf, isn't that equivalent to a network partition followed by rebooting or replacing one node? The network can always go down in the middle of an operation.
That's interesting, I thought a Raft cluster would prevent just that!
How do you do that in a distributed cluster with HA failover using Sentinel, considering that Sentinel is susceptible to partitions and drops?
Excellent! A couple of questions: 1) In the case of a network partition, do clients currently connected to the leader get notified that there's a partition, or that the cluster is not in a healthy state? 2) If a client writes to the partition that will get rolled back, and all their transactions are rolled back after the partition heals, do they get notified that their data was rolled back?
1. Run two beefy nodes and accept data loss of unreplicated changes during failover. 2. Use eventual consistency (not suitable for every workload).
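A toy illustration of option 1: asynchronous primary/replica replication in which writes acknowledged only by the primary are lost if it dies before shipping them. All names here are hypothetical and not tied to any specific system.

```python
# Illustrative: the primary acks writes immediately and replicates them
# asynchronously; whatever has not reached the replica when the primary
# dies is lost on failover.

class Node:
    def __init__(self):
        self.log = []

primary, replica = Node(), Node()
pending = []                      # writes acked to clients but not yet shipped

def write(value):
    primary.log.append(value)     # ack to the client right away
    pending.append(value)

def replicate():
    while pending:
        replica.log.append(pending.pop(0))

write("a"); write("b")
replicate()                       # "a" and "b" reach the replica
write("c")                        # acked, but the primary crashes before shipping it

# Failover: the replica is promoted; "c" is gone.
print(replica.log)                # ['a', 'b']
```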