In the first article, we looked closely at what Jepsen found in MariaDB Galera Cluster v12.1.2. The findings were significant: acknowledged transactions could be lost in certain failure scenarios, while Lost Update and stale reads appeared during normal operation. For teams evaluating clustering technologies, that makes the March 2026 report more than an uncomfortable read. It makes it a useful reference point.
Galera has long been attractive for understandable reasons. It offers a compelling operational story, especially for teams that value open source software, broad community familiarity, and the convenience of a multi primary design. None of that disappears because Jepsen published a harsh report. But the report does force a harder question into the open: what tradeoffs come with Galera’s synchronous, multi primary, certification based design, and are they acceptable for the systems depending on it?
That is where the comparison with Continuent Tungsten Cluster for MySQL becomes useful. Both products are built to keep database backed applications running through failure. Both are meant to reduce operational risk. But they do not approach durability, ordering, and failover in the same way, and those differences matter most precisely when the system is under stress.
Architectural Fundamentals
The Jepsen findings make more sense once you look at what each system is built to do. Galera is designed to make a cluster of peer nodes behave like a write anywhere system. Continuent Tungsten Cluster is built around a clear chain of authority, where write order, read freshness, and promotion decisions are controlled explicitly.
Galera is an optimistic, multi primary, certification based cluster. Transactions execute locally on the node that receives them, then at commit the write set is broadcast, globally ordered, and certified for conflicts. MariaDB describes certification based replication as optimistic execution on a single node followed by coordinated certification and global ordering at commit. The appeal is obvious: any node can accept writes, the topology feels symmetrical, and the application does not have to organize itself around a single designated writer.
- Node model: peer nodes can all accept writes
- Write path: transactions execute locally on any node that receives them
- Replication model: write set broadcast and certification, not ordered log shipping from one write authority
- Ordering boundary: established at commit, after local execution
- Read model: peer symmetry encourages the expectation that any node can serve current state
- Failover model: recovery depends on what the cluster can preserve after certification and apply
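To make the ordering boundary concrete, here is a minimal sketch of certification at commit. It is a toy model, not Galera's implementation: the names `WriteSet`, `Certifier`, `snapshot_seqno`, and `last_writer` are hypothetical, and the rule shown (reject a write set if any of its keys was written by a transaction ordered after the snapshot it executed against) is a simplified first-committer-wins check.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WriteSet:
    txn_id: str
    snapshot_seqno: int   # last globally ordered position the txn had seen
    keys: frozenset       # row keys the txn modified during local execution


@dataclass
class Certifier:
    next_seqno: int = 1
    last_writer: dict = field(default_factory=dict)  # key -> seqno of last certified write

    def certify(self, ws: WriteSet) -> bool:
        # In this sketch every delivered write set takes a position in the order,
        # whether or not it passes certification.
        seqno = self.next_seqno
        self.next_seqno += 1
        # Conflict: a key was written by a txn ordered after this txn's snapshot.
        if any(self.last_writer.get(k, 0) > ws.snapshot_seqno for k in ws.keys):
            return False
        for k in ws.keys:
            self.last_writer[k] = seqno
        return True


certifier = Certifier()
t1 = WriteSet("t1", snapshot_seqno=0, keys=frozenset({"accounts:42"}))
t2 = WriteSet("t2", snapshot_seqno=0, keys=frozenset({"accounts:42"}))
t3 = WriteSet("t3", snapshot_seqno=0, keys=frozenset({"accounts:7"}))

print(certifier.certify(t1))  # True: first writer of accounts:42
print(certifier.certify(t2))  # False: same row, executed against a stale snapshot
print(certifier.certify(t3))  # True: different row, no conflict
```

The point of the sketch is the timing: both `t1` and `t2` finished their local work before either was ordered, and the conflict only surfaces at commit.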
Jepsen’s findings make sense in light of how Galera handles a transaction. Work can succeed locally before the cluster has fully settled conflicts, final ordering, and durable state. That helps explain Lost Update (MDEV-38977) and Stale Read (MDEV-38999) during normal operation, where different nodes can advance work before certification resolves the final result at commit. The two write loss scenarios (MDEV-38974, MDEV-38976) show the same architectural gap under failure: acknowledgement can get ahead of what the cluster can durably preserve.
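For readers who want a concrete picture of Lost Update, the following sketch checks a simplified history for it. This is an illustration of the anomaly's definition, not Jepsen's tooling: the `Txn` record and `find_lost_updates` function are hypothetical, and every transaction in the history is assumed to be a read-modify-write on a single key.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    name: str
    key: str
    read_version: int   # version of the key this txn read before writing it
    committed: bool


def find_lost_updates(history):
    """Return groups of committed txns that read the same version of a key and wrote it."""
    by_version = defaultdict(list)
    for txn in history:
        if txn.committed:
            by_version[(txn.key, txn.read_version)].append(txn.name)
    # Two or more committed writers built on the same version: one update is lost.
    return [sorted(names) for names in by_version.values() if len(names) > 1]


history = [
    Txn("T1", "balance:42", read_version=1, committed=True),
    Txn("T2", "balance:42", read_version=1, committed=True),  # same starting version as T1
    Txn("T3", "balance:42", read_version=2, committed=True),
]

print(find_lost_updates(history))  # [['T1', 'T2']]
```

If either `T1` or `T2` had been rejected at commit, the check would return an empty list, which is the outcome a correct certification step is supposed to guarantee.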
For the customer, the trade is straightforward: easy write anywhere operations, with exposure to consistency and durability risk when the system is under stress.
Continuent Tungsten Cluster is a single writer, log based, primary replica cluster. Replicator captures and distributes changes from the Primary, Connector directs traffic based on role and availability, and Manager monitors the dataservice and coordinates failover. The result is a more explicit chain of responsibility across the cluster: writes are directed through a designated Primary, replication flows from that source, and routing and recovery decisions are made around defined roles. That structure makes write authority easier to reason about and does not depend on the same kind of post acknowledgement reconciliation on the write path that appears in a multi writer certification model.
- Node model: explicit Primary and Replica roles
- Write path: writes are routed to a designated Primary
- Replication model: binlog derived, log based replication through the Transaction History Log (THL)
- Ordering boundary: write order originates at the write authority, not through later certification across multiple writers
- Read model: freshness is governed by routing policy, QoS, and latency thresholds
- Failover model: promotion is tied to tracked replication state, including checks around unapplied or stored replication events
That architecture is less exposed to the issues Jepsen found in Galera because it reduces ambiguity at the points where Galera is most vulnerable. A designated write authority avoids the multi writer certification window behind Lost Update (MDEV-38977). Read freshness is governed explicitly through routing policy rather than assumed as a property of cluster symmetry, which makes Stale Read (MDEV-38999) less likely to hide behind a successful commit. And because failover is tied to tracked replication state, promotion is based on what the system can account for, which is a more conservative posture than the one implicated in Jepsen’s two write loss scenarios (MDEV-38974, MDEV-38976).
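The two ideas in that paragraph, reads routed by a freshness threshold and promotion based on tracked replication state, can be sketched in a few lines. This is an illustrative model only, not Tungsten's Connector or Manager logic or API: the `Replica` fields, the function names, and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Replica:
    name: str
    online: bool
    latency_s: float     # hypothetical metric: seconds behind the Primary
    stored_seqno: int    # last replication event received and stored locally
    applied_seqno: int   # last replication event applied to the database


def choose_read_target(primary: str, replicas: list, max_latency_s: float) -> str:
    """Route a read to the freshest online replica within the threshold, else the Primary."""
    fresh = [r for r in replicas if r.online and r.latency_s <= max_latency_s]
    if not fresh:
        return primary
    return min(fresh, key=lambda r: r.latency_s).name


def choose_promotion_candidate(replicas: list) -> Optional[Replica]:
    """Prefer the online replica that has stored, then applied, the most events."""
    online = [r for r in replicas if r.online]
    if not online:
        return None
    return max(online, key=lambda r: (r.stored_seqno, r.applied_seqno))


def unapplied_events(replica: Replica) -> int:
    """Events stored on the replica but not yet applied; drain these before promotion."""
    return replica.stored_seqno - replica.applied_seqno


replicas = [
    Replica("db2", online=True, latency_s=0.4, stored_seqno=1200, applied_seqno=1200),
    Replica("db3", online=True, latency_s=7.5, stored_seqno=1200, applied_seqno=1150),
    Replica("db4", online=False, latency_s=0.1, stored_seqno=1100, applied_seqno=1100),
]

print(choose_read_target("db1", replicas, max_latency_s=1.0))  # db2
print(choose_read_target("db1", replicas, max_latency_s=0.1))  # db1 (falls back to Primary)
print(choose_promotion_candidate(replicas).name)               # db2
print(unapplied_events(replicas[1]))                           # 50
```

In both functions the decision is made from state the cluster has measured, so a replica that is offline or behind is excluded by rule rather than by assumption.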
For the customer, the trade is nearly the inverse of Galera’s: more explicit control over write authority, read freshness, and recovery safety, in exchange for a more structured operating model.
Where Galera Has Practical Advantages
Jepsen’s findings raise serious questions about how Galera behaves under concurrency and failure, but they do not erase the reasons it became widely adopted in the first place. Galera’s design prioritizes flexibility and ease of use, and that has made it attractive to teams that want a clustering solution they can understand quickly, deploy with a simple operational model, and operate without building a large amount of custom routing and failover logic around it.
- Operational simplicity for small teams: Any node can accept writes, which removes the need for a separate write routing layer and simplifies deployment. For smaller teams, or for organizations without deep in house database specialization, that can make Galera easier to adopt and easier to explain. The topology is straightforward, and the path from evaluation to production is often shorter.
- Open source and zero licensing cost: Galera is free to use, which makes it accessible to cost sensitive environments and lowers the barrier to entry for teams that are not ready to commit to a commercial platform. That matters in practice. Many organizations will accept more architectural risk when the upfront economic case is simple and immediate.
- Broader community familiarity: Galera is widely adopted, well documented, and familiar to a large portion of the MySQL and MariaDB ecosystem. That familiarity reduces friction. It is easier to hire for, easier to support internally, and easier to evaluate because many teams already know roughly what they are getting.
- Schema change tooling ecosystem: Galera also benefits from fitting into an ecosystem many MySQL teams already know how to use. Strong familiarity with tools such as pt-online-schema-change and gh-ost adds practical value for day to day operations, especially in environments where online schema change workflows are already part of established practice.
These advantages come from the same active/active model that distributes writes across nodes. That model is a large part of what makes Galera feel flexible and easy to adopt. It is also the same design choice that Jepsen put under pressure. In other words, Galera’s strengths and its exposed risks are closely related. The convenience is real. So is the tradeoff.
Even with the risks identified in the Jepsen report, Galera remains a reasonable choice for teams that prioritize open source cost, fast adoption, operational simplicity, and broad ecosystem familiarity over stricter guarantees around write durability, read visibility, and ordered execution. For lower risk workloads, internal systems, or environments where the business impact of occasional anomalies is limited, some teams may still find that trade acceptable.
Why Tungsten Cluster for MySQL Stands Out
Continuent Tungsten Cluster for MySQL is designed for organizations that need a database platform to behave predictably when the stakes are high. Its strengths come from a more controlled architecture, one that prioritizes durability, ordered execution, and operational clarity under both normal conditions and failure.
- Reliable transaction durability: Tungsten is built around a durable, ordered write path, so commit acknowledgement reflects a clearer durability boundary. For teams running critical systems, that matters because it reduces the risk that a successful transaction later turns into a recovery and reconciliation problem.
- Ordered and predictable write execution: Tungsten’s default single writer architecture serializes writes at the source. That gives applications a more predictable execution model and makes the resulting system easier to reason about under concurrency, especially where business logic depends on consistent ordering.
- Controlled read visibility: Tungsten manages read behavior through routing, giving teams explicit control over how reads are served and how freshness is handled. That creates a more transparent operational model, where visibility is engineered deliberately rather than assumed.
- Flexible deployment for HA and DR: Tungsten supports multiple topologies and deployment patterns, including broader HA and disaster recovery designs, without forcing a more permissive write model at the center of the architecture. That means teams can gain deployment flexibility without weakening control over write ordering and recovery behavior.
- Responsive support from experienced engineers: Continuent pairs the product with support from engineers who know the system deeply and can respond quickly when issues arise. For organizations running business critical workloads, that direct access to expertise is not a nice extra. It is part of the value of the platform.
Taken together, these strengths reflect a more disciplined approach to clustering. Tungsten asks teams to be more deliberate about how writes are handled, how reads are routed, and how failover is managed. In return, it offers a platform better suited to environments where the database must remain a dependable source of truth at all times.
Its long production history and clear operational documentation reinforce that story. Tungsten is not presented as an idealized architecture that only looks good in design diagrams. It is a mature platform built for teams that need durability, predictability, and control to hold up in production environments where failure is not an option.
Summary Comparison Table
| Dimension | Continuent Tungsten Cluster for MySQL | MariaDB Galera Cluster |
|---|---|---|
| Write durability | ✅ Controlled around a Primary and explicit topology | ❌ Jepsen documented loss of acknowledged commits in specific crash and partition scenarios |
| Lost Update (P4) | ✅ Default write model avoids Galera’s multi writer certification window | ❌ Jepsen observed Lost Update in healthy clusters |
| Stale reads | ✅ Read behavior is governed through explicit routing and QoS choices | ❌ Jepsen observed stale reads after commit during normal operation |
| Operational model | ✅ Primary/Replica design that is easier to reason about under pressure | ⚠️ Flexible multi primary design, but with more exposed correctness risk under concurrency and failure |
| Support posture | ✅ 24/7 support, with Continuent publicly citing urgent response times of under 3 minutes | ⚠️ Support experience varies by deployment model and vendor path |
| Licensing cost | ❌ Commercial subscription | ✅ Open source, no license fee |
| Community familiarity | ⚠️ More specialized ecosystem | ✅ Broad familiarity in MySQL and MariaDB circles |
| Independent Jepsen result | ⚠️ No public Jepsen analysis located | ❌ Public Jepsen analysis documents unresolved durability and consistency anomalies in the tested line |
| Documentation posture | ✅ Clear emphasis on routing, role selection, and operational behavior | ⚠️ Strong consistency language in vendor docs, but Jepsen documented meaningful gaps in practice |
Takeaway
The Jepsen report does more than raise concerns about Galera. It changes how clustering systems should be evaluated in the first place. Before Jepsen, it was easier to treat Galera and Tungsten as different routes to the same goal: keep the database available, preserve data, and survive failure. After Jepsen, that framing is harder to sustain. The report shows that clustering architecture shapes not only availability but also whether a successful commit can still be trusted when the system is under stress.
That is the real divide between Galera and Tungsten. Galera makes a compelling trade for teams that value a free, open source clustering model, broad ecosystem familiarity, and the convenience of a write anywhere model. For some workloads, that may still be a reasonable choice. But Jepsen makes clear that this convenience comes with a sharper durability and consistency tradeoff than many teams may have assumed.
Tungsten makes a different trade. It gives up the zero license cost of an open source model and the symmetry of multi writer convenience in favor of clearer write authority, more explicit control over read freshness, and failover decisions tied to tracked replication state. That structure is more deliberate by design, and better aligned with environments where the database must remain a dependable source of truth under pressure.
In that sense, Jepsen did not simply expose problems in one product. It clarified what buyers should be asking of any clustering platform: when the system is stressed, where does the risk go? Galera is a strong fit for leaner teams and open source first environments that are comfortable with a wider risk envelope under failure. For business critical systems and larger enterprise deployments where trust in the data matters as much as uptime, Tungsten is the stronger choice.
The value of the Jepsen report is that it turns the choice of clustering architecture from a feature comparison into a risk allocation question.