Continuent Blog: Clustering Tradeoffs in Practice: Galera, Tungsten, and the Jepsen Findings

In the first article, we looked closely at what Jepsen found in MariaDB Galera Cluster v12.1.2. The findings were significant: acknowledged transactions could be lost in certain failure scenarios, while Lost Update and stale reads appeared during normal operation. For teams evaluating clustering technologies, that makes the March 2026 report more than an uncomfortable read. It makes it a useful reference point.

Galera has long been attractive for understandable reasons. It offers a compelling operational story, especially for teams that value open source software, broad community familiarity, and the convenience of a multi-primary design. None of that disappears because Jepsen published a harsh report. But the report does force a harder question into the open: what tradeoffs come with Galera’s synchronous, multi-primary, certification-based design, and are they acceptable for the systems depending on it?

That is where the comparison with Continuent Tungsten Cluster for MySQL becomes useful. Both products are built to keep database-backed applications running through failure. Both are meant to reduce operational risk. But they do not approach durability, ordering, and failover in the same way, and those differences matter most precisely when the system is under stress.

Architectural Fundamentals

The Jepsen findings make more sense once you look at what each system is built to do. Galera is designed to make a cluster of peer nodes behave like a write-anywhere system. Continuent Tungsten Cluster is built around a clear chain of authority, where write order, read freshness, and promotion decisions are controlled explicitly.

Galera is an optimistic, multi-primary, certification-based cluster. Transactions execute locally on the node that receives them, then at commit the write set is broadcast, globally ordered, and certified for conflicts. MariaDB describes certification-based replication as optimistic execution on a single node followed by coordinated certification and global ordering at commit. The appeal is obvious: any node can accept writes, the topology feels symmetrical, and the application does not have to organize itself around a single designated writer.

Node model: peer nodes can all accept writes
Write path: transactions execute locally on any node that receives them
Replication model: write set broadcast and certification, not ordered log shipping from one write authority
Ordering boundary: established at commit, after local execution
Read model: peer symmetry encourages the expectation that any node can serve current state
Failover model: recovery depends on what the cluster can preserve after certification and apply
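
To make the certification step concrete, here is a minimal sketch of first-committer-wins certification over write sets. It is a toy model under stated assumptions, not Galera's implementation: the names `WriteSet`, `Certifier`, and `snapshot_seqno` are hypothetical, keys stand in for rows, and group communication, durability, and apply are left out entirely.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WriteSet:
    """Keys modified by one transaction, plus the snapshot it executed against."""
    origin: str          # node that executed the transaction locally (hypothetical label)
    snapshot_seqno: int  # last globally ordered seqno the transaction had seen
    keys: frozenset


@dataclass
class Certifier:
    """Toy certifier: the first write set to be ordered wins on overlapping keys."""
    seqno: int = 0
    last_write: dict = field(default_factory=dict)  # key -> seqno of last certified write

    def certify(self, ws: WriteSet) -> bool:
        # Reject if any key was overwritten after the transaction took its snapshot.
        for key in ws.keys:
            if self.last_write.get(key, 0) > ws.snapshot_seqno:
                return False
        # Otherwise assign the next global sequence number and record the writes.
        self.seqno += 1
        for key in ws.keys:
            self.last_write[key] = self.seqno
        return True


certifier = Certifier()
# Three nodes execute locally against the same snapshot, then submit at commit.
a = WriteSet(origin="node1", snapshot_seqno=0, keys=frozenset({"account:42"}))
b = WriteSet(origin="node2", snapshot_seqno=0, keys=frozenset({"account:42"}))
c = WriteSet(origin="node3", snapshot_seqno=0, keys=frozenset({"account:7"}))

results = [certifier.certify(ws) for ws in (a, b, c)]
print(results)
```

In this model the second write to `account:42` is rejected because it executed against a snapshot that is already out of date, while the unrelated write certifies. The point of the sketch is where the decision happens: only at commit, after each node has already done its local work.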

Jepsen’s findings make sense in light of how Galera handles a transaction. Work can succeed locally before the cluster has fully settled conflicts, final ordering, and durable state. That helps explain Lost Update (MDEV-38977) and Stale Read (MDEV-38999) during normal operation, where different nodes can advance work before certification resolves the final result at commit. The two write loss scenarios (MDEV-38974, MDEV-38976) show the same architectural gap under failure: acknowledgement can get ahead of what the cluster can durably preserve.
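
For readers less familiar with the term, a Lost Update means two committed transactions both read the same version of a value and both overwrote it, so one update silently disappears. The sketch below shows that definition as a check over a simplified history. It is an illustration only, not Jepsen's checker: `Txn`, `read_version`, and `find_lost_updates` are hypothetical names, and each transaction is assumed to read and then write a single key.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    name: str
    key: str
    read_version: int  # version of the key the transaction read before writing it
    committed: bool


def find_lost_updates(history):
    """Group committed transactions that read the same version of a key and then wrote it."""
    by_read = defaultdict(list)
    for txn in history:
        if txn.committed:
            by_read[(txn.key, txn.read_version)].append(txn.name)
    # More than one committed writer on top of the same version means an update was lost.
    return [names for names in by_read.values() if len(names) > 1]


history = [
    Txn("T1", "counter", read_version=1, committed=True),
    Txn("T2", "counter", read_version=1, committed=True),   # also built on version 1
    Txn("T3", "counter", read_version=2, committed=True),
    Txn("T4", "counter", read_version=2, committed=False),  # aborted, so harmless
]

lost = find_lost_updates(history)
print(lost)
```

Here T1 and T2 are flagged because both committed on top of version 1. T3 and T4 are not, because only one of them committed.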

For the customer, the trade is straightforward: easy write-anywhere operations, with exposure to consistency and durability risk when the system is under stress.

Continuent Tungsten Cluster is a single-writer, log-based, primary-replica cluster. Replicator captures and distributes changes from the Primary, Connector directs traffic based on role and availability, and Manager monitors the dataservice and coordinates failover. The result is a more explicit chain of responsibility across the cluster: writes are directed through a designated Primary, replication flows from that source, and routing and recovery decisions are made around defined roles. That structure makes write authority easier to reason about and does not depend on the same kind of post-acknowledgement reconciliation on the write path that appears in a multi-writer certification model.

Node model: explicit Primary and Replica roles
Write path: writes are routed to a designated Primary
Replication model: binlog-derived, log-based replication through the Transaction History Log (THL)
Ordering boundary: write order originates at the write authority, not through later certification across multiple writers
Read model: freshness is governed by routing policy, QoS, and latency thresholds
Failover model: promotion is tied to tracked replication state, including checks around unapplied or stored replication events
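
The read model above can be sketched as a small routing function: writes always go to the Primary, and reads go to a Replica only if it is within a freshness threshold. This is a simplified illustration, not the Tungsten Connector's actual logic or configuration. The names `Node`, `route`, `applied_latency`, and `max_latency` are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str               # "primary" or "replica"
    online: bool
    applied_latency: float  # seconds behind the primary (0.0 for the primary itself)


def route(nodes, is_write, max_latency=1.0):
    """Writes go to the primary; reads go to the freshest replica within max_latency, else the primary."""
    # Raises StopIteration if no primary is online; a real router would handle that case.
    primary = next(n for n in nodes if n.role == "primary" and n.online)
    if is_write:
        return primary
    fresh = [
        n for n in nodes
        if n.role == "replica" and n.online and n.applied_latency <= max_latency
    ]
    # Fall back to the primary when no replica is fresh enough.
    return min(fresh, key=lambda n: n.applied_latency, default=primary)


cluster = [
    Node("db1", "primary", True, 0.0),
    Node("db2", "replica", True, 0.3),
    Node("db3", "replica", True, 5.0),
]

print(route(cluster, is_write=True).name)
print(route(cluster, is_write=False).name)
print(route(cluster, is_write=False, max_latency=0.1).name)
```

The design choice worth noticing is that freshness is a parameter the operator sets, not a property the topology is assumed to have. Tightening `max_latency` trades read scale-out for fresher reads.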

That architecture is less exposed to the issues Jepsen found in Galera because it reduces ambiguity at the points where Galera is most vulnerable. A designated write authority avoids the multi-writer certification window behind Lost Update (MDEV-38977). Read freshness is governed explicitly through routing policy rather than assumed as a property of cluster symmetry, which makes Stale Read (MDEV-38999) less likely to hide behind a successful commit. And because failover is tied to tracked replication state, promotion is based on what the system can account for, which is a more conservative posture than the one implicated in Jepsen’s two write loss scenarios (MDEV-38974, MDEV-38976).
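
As a rough illustration of what "promotion tied to tracked replication state" can look like, the sketch below picks the replica that holds the most replicated events and refuses to promote it until everything it has stored has also been applied. This is a hypothetical model, not Tungsten Manager's actual algorithm: `ReplicaState`, `stored_seqno`, `applied_seqno`, and both functions are names invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplicaState:
    name: str
    online: bool
    stored_seqno: int   # last replication event received and stored locally
    applied_seqno: int  # last replication event applied to the database


def pick_promotion_candidate(replicas) -> Optional[ReplicaState]:
    """Choose the online replica holding the most events, breaking ties by applied position."""
    online = [r for r in replicas if r.online]
    if not online:
        return None
    return max(online, key=lambda r: (r.stored_seqno, r.applied_seqno))


def ready_to_promote(replica: ReplicaState) -> bool:
    """A candidate is safe to promote only once every stored event has been applied."""
    return replica.applied_seqno >= replica.stored_seqno


replicas = [
    ReplicaState("db2", online=True, stored_seqno=120, applied_seqno=118),
    ReplicaState("db3", online=True, stored_seqno=117, applied_seqno=117),
    ReplicaState("db4", online=False, stored_seqno=125, applied_seqno=125),
]

candidate = pick_promotion_candidate(replicas)
print(candidate.name, ready_to_promote(candidate))

# Simulate the candidate applying its two stored-but-unapplied events.
candidate.applied_seqno = candidate.stored_seqno
print(candidate.name, ready_to_promote(candidate))
```

In this model db2 is chosen over db3 even though db3 is fully caught up with its own log, because db2 has received more of the Primary's history. It simply has to finish applying that history before it takes over.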

For the customer, the trade is nearly the inverse of Galera’s: more explicit control over write authority, read freshness, and recovery safety, in exchange for a more structured operating model.

Where Galera Has Practical Advantages

Jepsen’s findings raise serious questions about how Galera behaves under concurrency and failure, but they do not erase the reasons it became widely adopted in the first place. Galera’s design prioritizes flexibility and ease of use, and that has made it attractive to teams that want a clustering solution they can understand quickly, deploy with a simple operational model, and operate without building a large amount of custom routing and failover logic around it.

Operational simplicity for small teams
Any node can accept writes, which removes the need for a separate write routing layer and simplifies deployment. For smaller teams, or for organizations without deep in-house database specialization, that can make Galera easier to adopt and easier to explain. The topology is straightforward, and the path from evaluation to production is often shorter.
Open source and zero licensing cost
Galera is free to use, which makes it accessible to cost-sensitive environments and lowers the barrier to entry for teams that are not ready to commit to a commercial platform. That matters in practice. Many organizations will accept more architectural risk when the upfront economic case is simple and immediate.
Broader community familiarity
Galera is widely adopted, well documented, and familiar to a large portion of the MySQL and MariaDB ecosystem. That familiarity reduces friction. It is easier to hire for, easier to support internally, and easier to evaluate because many teams already know roughly what they are getting.
Schema change tooling ecosystem
Galera also benefits from fitting into an ecosystem many MySQL teams already know how to use. Strong familiarity with tools such as pt-online-schema-change and gh-ost adds practical value for day-to-day operations, especially in environments where online schema change workflows are already part of established practice.

These advantages come from the same active/active model that distributes writes across nodes. That model is a large part of what makes Galera feel flexible and easy to adopt. It is also the same design choice that Jepsen put under pressure. In other words, Galera’s strengths and its exposed risks are closely related. The convenience is real. So is the tradeoff.

Even with the risks identified in the Jepsen report, Galera remains a reasonable choice for teams that prioritize open source cost, fast adoption, operational simplicity, and broad ecosystem familiarity over stricter guarantees around write durability, read visibility, and ordered execution. For lower-risk workloads, internal systems, or environments where the business impact of occasional anomalies is limited, some teams may still find that trade acceptable.

Why Tungsten Cluster for MySQL Stands Out

Continuent Tungsten Cluster for MySQL is designed for organizations that need a database platform to behave predictably when the stakes are high. Its strengths come from a more controlled architecture, one that prioritizes durability, ordered execution, and operational clarity under both normal conditions and failure.

Reliable transaction durability
Tungsten is built around a durable, ordered write path, so commit acknowledgement reflects a clearer durability boundary. For teams running critical systems, that matters because it reduces the risk that a successful transaction later turns into a recovery and reconciliation problem.
Ordered and predictable write execution
Tungsten’s default single-writer architecture serializes writes at the source. That gives applications a more predictable execution model and makes the resulting system easier to reason about under concurrency, especially where business logic depends on consistent ordering.
Controlled read visibility
Tungsten manages read behavior through routing, giving teams explicit control over how reads are served and how freshness is handled. That creates a more transparent operational model, where visibility is engineered deliberately rather than assumed.
Flexible deployment for HA and DR
Tungsten supports multiple topologies and deployment patterns, including broader HA and disaster recovery designs, without forcing a more permissive write model at the center of the architecture. That means teams can gain deployment flexibility without weakening control over write ordering and recovery behavior.
Responsive support from experienced engineers
Continuent pairs the product with support from engineers who know the system deeply and can respond quickly when issues arise. For organizations running business-critical workloads, that direct access to expertise is not a nice extra. It is part of the value of the platform.

Taken together, these strengths reflect a more disciplined approach to clustering. Tungsten asks teams to be more deliberate about how writes are handled, how reads are routed, and how failover is managed. In return, it offers a platform better suited to environments where the database must remain a dependable source of truth at all times.

Its long production history and clear operational documentation reinforce that story. Tungsten is not presented as an idealized architecture that only looks good in design diagrams. It is a mature platform built for teams that need durability, predictability, and control to hold up in real can’t-fail production conditions.

Summary Comparison Table

Dimension	Continuent Tungsten Cluster for MySQL	MariaDB Galera Cluster
Write durability	✅ Controlled around a Primary and explicit topology	❌ Jepsen documented loss of acknowledged commits in specific crash and partition scenarios
Lost Update (P4)	✅ Default write model avoids Galera’s multi-writer certification window	❌ Jepsen observed Lost Update in healthy clusters
Stale reads	✅ Read behavior is governed through explicit routing and QoS choices	❌ Jepsen observed stale reads after commit during normal operation
Operational model	✅ Primary/Replica design that is easier to reason about under pressure	⚠️ Flexible multi-primary design, but with more exposed correctness risk under concurrency and failure
Support posture	✅ 24/7 support, with Continuent publicly citing urgent response times of under 3 minutes	⚠️ Support experience varies by deployment model and vendor path
Licensing cost	❌ Commercial subscription	✅ Open source, no license fee
Community familiarity	⚠️ More specialized ecosystem	✅ Broad familiarity in MySQL and MariaDB circles
Independent Jepsen result	⚠️ No public Jepsen analysis located	❌ Public Jepsen analysis documents unresolved durability and consistency anomalies in the tested line
Documentation posture	✅ Clear emphasis on routing, role selection, and operational behavior	⚠️ Strong consistency language in vendor docs, but Jepsen documented meaningful gaps in practice

Note

One caveat is worth stating directly. A public Jepsen report can demonstrate the presence of anomalies. It cannot prove their absence elsewhere. So this is not an argument that Tungsten has been publicly certified as flawless while Galera has somehow failed an exam. The narrower point is the important one: Galera now has a public, independent record of these specific problems, while Tungsten is built on a materially different architecture designed to avoid the same class of exposure.

Takeaway

The Jepsen report does more than raise concerns about Galera. It changes how clustering systems should be evaluated in the first place. Before Jepsen, it was easier to treat Galera and Tungsten as different routes to the same goal: keep the database available, preserve data, and survive failure. After Jepsen, that framing is harder to sustain. The report shows that clustering architecture shapes both availability and whether a successful commit can still be trusted when the system is under stress.

That is the real divide between Galera and Tungsten. Galera makes a compelling trade for teams that value a free, open source clustering model, broad ecosystem familiarity, and the convenience of write-anywhere operation. For some workloads, that may still be a reasonable choice. But Jepsen makes clear that this convenience comes with a sharper durability and consistency tradeoff than many teams may have assumed.

Tungsten makes a different trade. It gives up the zero license cost of an open source model and the symmetry of multi-writer convenience in favor of clearer write authority, more explicit control over read freshness, and failover decisions tied to tracked replication state. That structure is more deliberate by design, and better aligned with environments where the database must remain a dependable source of truth under pressure.

In that sense, Jepsen did not simply expose problems in one product. It clarified what buyers should be asking of any clustering platform: when the system is stressed, where does the risk go? Galera is a strong fit for leaner teams and open-source-first environments that are comfortable with a wider risk envelope under failure. For business-critical systems and larger enterprise deployments where trust in the data matters as much as uptime, Tungsten is the stronger choice.

The value of the Jepsen report is that it turns the choice of clustering architecture from a feature comparison into a risk allocation question.

Published In

Series:

Competitor Comparisons

Tags:

Galera

Author

Nia Teerikorpi

COO

Nia, PhD, leverages her extensive expertise in data management to optimize operational efficiency. She brings over 10 years of experience in academia to her role, where she is translating her strengths in project management into the software industry. Her passion for technology and knowledge building enables her to empower Continuent to deliver robust, high-availability database solutions to clients worldwide.

Prior to working at Continuent, Nia conducted significant research into autism and congenital heart disease, offering valuable insights aimed at enhancing understanding and treatment options for these conditions. Her commitment to intellectual development reflects her desire to make a meaningful impact in both healthcare and technology.
