Downtime with Coffee, not MySQL

We like to think of downtime as a nice cup of coffee, or, as my CEO would prefer, an excellent glass of Cabernet Sauvignon, not a hectic event spent dealing with a failed database. At least, that’s the case if you’re in Operations or an Architect working on SaaS, eCommerce, Financial Services, Gaming, Healthcare or Telco apps.

In this post, though, we’re focusing on the end-users of these apps.

Everyone has had the unsatisfying experience of a website or application that messes up. Problems can come from anywhere in the software stack, including:

  • User error
  • Bugs in the application software
  • Device issues
  • Network issues
  • Database issues
  • Infrastructure and resource issues

We’re focusing on the last two, because they make up the foundation of a tech stack.

Being in California, we take extra care with foundations - and faults. ;)

“If you fail to plan, you’re planning to fail.”

Often attributed to Benjamin Franklin

You must plan your database and infrastructure for both planned events, such as:

  • Maintenance activities
    • Upgrading the OS or the database
    • Replacing a physical server or a cloud compute instance

...and unplanned events, such as:

  • Database overload due to users, for example:
    • Too many requests, for example from a sudden spike in traffic
    • A stuck or long-running query, perhaps caused by a missing index
  • Database overload due to resource issues, for example:
    • Database server out of memory or pushing CPU limit
  • Loss of network connectivity, for example a data center power outage
  • Cloud vendor-specific issues and mismanagement (hence the best practice for hybrid-cloud and multi-cloud deployments)
  • Cyberattacks, which are increasingly likely
    • Ransomware, or attacks that block access to your systems
  • Natural disasters, which are ever more common
    • Fire, flood, earthquake or other events that affect infrastructure

[Graph: destructive natural disasters, and their costs, are trending upward. Source: climate.gov]

Your application must not only be always-on; you must also architect it for an immediately responsive and accurate user experience.

End-users may feel “database downtime” as:

  • An unsuccessful Write
    • You sign up for a new SaaS platform, enter all your contact information, and hit submit, but the page stalls. You have to start over, or you wait so long that you give up.
  • An unsuccessful Read
    • You are unable to retrieve your information, such as your checking account balance on your bank’s website.

The common way to solve this problem is to seek out a MySQL high availability (HA) solution:

  • Synchronous and Group Replication clustering: solutions that must be assembled from several components and that can suffer from locking or lag, especially over a WAN:
    • Oracle’s MySQL InnoDB Cluster
    • Codership Galera Cluster and its forks:
      • MariaDB Cluster
      • Percona XtraDB Cluster
  • DIY, for example with ProxySQL and Orchestrator: this can work over a WAN, but it provides little automation, leaves you without documentation for the combined setup, and is expensive to test and maintain.

Depending on your deployment, these solutions often create a whole new set of problems. End-users might experience MySQL HA issues as:

  • Unsuccessful replication of an Update between database servers
    • You change your account information on your bank’s website, for example, and although the app acknowledges the update, you do not see the change. Perhaps a lock or a significant delay held up replication of the Update to the Replica databases, which is why the Update is missing when the app Reads from a Replica.
  • Unsuccessful Read due to replication lag or database proxy misconfiguration
    • You submit your username and password, but you are unable to log into your account: the database is unavailable, so the application is unresponsive.
  • Data loss or inconsistent data
    • You submit your username and password, but you are unable to log into your account: the data does not match between database servers, and the application cannot tell which server has the latest or most accurate data.

So let’s take a step back. What are Architects really looking for when they seek a MySQL HA solution?

Priorities might include (but are not limited to):

  • Redundancy (real-time replication) for fault-tolerance
  • Automated failure detection, monitoring, and alerting
  • Automated seamless failover between database servers
  • Performance - Load balancing and read/write splitting
  • Intelligent, graceful query management and authentication
  • Scalability and long-term durability, including predictable costs
  • Limited manual labor, easy deployment and management
  • No data loss

That’s a lot to consider! And this doesn’t even touch on the need for Disaster Recovery (DR), which is increasingly key to business continuity planning.

That’s why Architects feel pressure to use a proprietary database: they give up ownership, control, and flexibility in exchange for the simplicity, ease, and comfort of a “Managed DBaaS.” You’re probably already an AWS customer anyway, right? Wrong! Don’t be fooled by the ‘managed’ part; it is far less ‘managed’ than you might hope.

If you can give up control over your data operations and rely on DBaaS, there are a few major options:

  • Amazon Aurora MySQL-Compatible Edition (AWS)
  • Microsoft Azure Database for MySQL
  • Google Cloud SQL for MySQL (GCP)

To many, DBaaS feels like a black box that’s fairly limited and expensive to scale. You’re locked into a cloud vendor that still performs scheduled maintenance with downtime, and there are other risks and limitations to availability, including the inability to follow de facto best practices such as cross-region, hybrid-cloud, or cross-cloud deployments. This puts your app’s availability at risk, not to mention its geo-scalability.

The good news is that there is a Goldilocks zone: a way to achieve continuous operations without compromising on availability, ease or capability, and “it just works”... “out of the box.”

If you’re looking for a multi-site, multi-region, or global deployment, Tungsten Clustering is the only way to achieve that with any level of ease, availability and performance.

With Tungsten, you can achieve cross-region, hybrid-cloud, and multi-cloud deployments with your favorite MySQL vendor and version.

Continuent’s goal is not just to provide MySQL HA, but to enable continuous MySQL operations, no matter where your end-users are. That means Tungsten is best suited for organizations that require the highest availability, both now and through all the changes that are inevitable in the modern world.

That’s why Continuent has a 100% customer satisfaction rate - and has maintained this reputation for over a decade.

We hope you enjoyed this coffee break. Look forward to more!

Reach out to learn more about Tungsten. (BTW, it’s named after the element tungsten, which has the highest tensile strength of any pure metal.) That’s important for any solid foundation!

About the Author

Sara Captain
Director of Product Marketing

Sara has worn various hats at Continuent since 2014. Listening to Continuent customers over the years, Sara fell in love with the Continuent Tungsten suite of products. She started learning Linux and MySQL administration with the support of Continuent’s amazing team, so that she could help keep customers happy. Before joining Continuent, she worked in consulting with a focus on leveraging data.
