Project Always On

By | September 6, 2013

One of the projects we are working on at Zimbra is project “Always On”. The goal is very simple: Email and collaboration should be “always on” for end users. It’s no secret that email is the number one tool used for business collaboration. Project Always On solves key issues related to operating a platform built to be the central hub of collaboration delivered as a cloud service.

Our Design Goals

We started with some very basic design goals based on what we believe are the key attributes of a cloud service.

  1. Inherently resilient to failure
  2. Scaling should be elastic based on workload demands
  3. The software can be enhanced without service disruption
  4. Efficient usage of commodity hardware resources

As we thought about these design goals, we arrived at a few technical capabilities and basic architectural requirements to achieve them.

  1. No single points of failure in the application components
  2. Separating the application code from the data
  3. Distributing state information across commodity storage
  4. Automatic failover of application and data storage components
  5. Automatic load balancing of client requests across the application and data layers
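
To make requirements 4 and 5 a little more concrete, here is a minimal sketch of round-robin request distribution with automatic failover. It is an illustration only, not Zimbra code: the `Backend` and `Balancer` classes are hypothetical, and the `healthy` flag stands in for a real health check.

```python
from itertools import cycle


class Backend:
    """A hypothetical application node; `healthy` stands in for a real health check."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


class Balancer:
    """Round-robin across backends, failing over past any node that errors."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._ring = cycle(self.backends)

    def dispatch(self, request):
        # Try each backend at most once per request before giving up.
        for _ in range(len(self.backends)):
            backend = next(self._ring)
            try:
                return backend.handle(request)
            except ConnectionError:
                continue  # fail over to the next node in the ring
        raise RuntimeError("no healthy backend available")
```

With three nodes where the second is down, successive requests would be served by the first and third nodes in turn, and the caller never sees the failure.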

How Zimbra is Changing

Re-architecting software takes a great deal of work, so we are doing it in several phases. Our MTA and Proxy components already have load balancing and failover capabilities, and we introduced multi-master replication for our LDAP directory in Zimbra 8.0 last year. Our focus has now shifted to the mailbox server, where we have spread the work across two phases.

Phase 1

Phase 1 work is centered on breaking up the application and data components of the mailbox server.

  • Splitting the web client code from the server logic. Today all of our static HTML, JavaScript and Java code runs in the same Jetty instance. We are splitting it apart for two reasons: to enable UI customizations and code changes in real time, and to enable the web app and mailbox services to be optionally deployed on separate servers. In larger environments, deploying and scaling these components on separate servers will improve overall user density.
  • Separating our application code from the data stores. Today our Java application code running in a Jetty instance has an affinity to the instance of MySQL (soon to be MariaDB) running on the same server. We have been refactoring and enhancing our application code to remove this dependency for two reasons: to enable MariaDB to be optionally deployed and scaled in a separate data services layer, and to enable distributed metadata through a MariaDB Galera Cluster.
  • Switching from MySQL to MariaDB. We have been following the MariaDB community for a while. The improvements in performance and the speed at which bugs and security issues are resolved are compelling reasons alone. Now that SkySQL is providing commercial support, we felt it was the right time to make a switch. And switch we did. It took us about 5 minutes to replace MySQL with MariaDB and about 3 minutes to replace ConnectorJ with the MariaDB Java Client and 1 bug fix.
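
As a rough illustration of what removing the local-database affinity means, the sketch below resolves the metadata store endpoint from deployment settings instead of assuming a database on the same server. The `settings` keys and the `DataStoreEndpoint` type are hypothetical names chosen for this example, not Zimbra configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataStoreEndpoint:
    host: str
    port: int = 3306  # default MySQL/MariaDB port


def resolve_endpoint(settings):
    """Return the metadata store endpoint for this application node.

    `settings` is a hypothetical dict of deployment settings. With no
    explicit entry the node falls back to a co-located database, which
    matches the single-server layout described above.
    """
    host = settings.get("db_host", "localhost")
    port = int(settings.get("db_port", 3306))
    return DataStoreEndpoint(host, port)
```

An empty settings dict keeps today's behavior (a database on the same host), while setting `db_host` points the same application code at a separate data services layer.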

Phase 2

Phase 2 work is centered on distributed data and install/upgrade orchestration.

  • Mailbox Metadata. As I mentioned earlier, we will be using MariaDB Galera Cluster to implement a scale-out, shared-nothing, active-active design for our mailbox metadata.
  • Search indexes. We are looking at using Apache Solr for distributed indexing, building on our current use of Apache Lucene.
  • Distributed Blobs. We plan to continue to enhance our StoreManager API and implement an S3-compatible interface for use with cloud storage solutions like Scality and Amazon. We have also changed the pathing of blob data on the file system to include a “node name”. This will enable the use of clustered/distributed file systems like NFS and Ceph using their POSIX file system interface.
  • Rolling updates and upgrades. We want to enable software updates and even version upgrades to coexist with previous versions and to be installed across each application component layer without user disruption.
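
The “node name” idea from the Distributed Blobs item can be sketched as follows. The directory layout, the function name and its parameters are assumptions made for illustration; the actual on-disk layout is not described in this post.

```python
from pathlib import PurePosixPath


def blob_path(root, node_name, mailbox_id, item_id):
    """Build a blob path that includes the owning node's name.

    The layout (root/node/mailbox/msg/item.msg) is hypothetical. Putting
    the node name in the path lets several mailbox nodes share one
    POSIX mount without their files colliding.
    """
    if not node_name or "/" in node_name or node_name in (".", ".."):
        raise ValueError("node name must be a single, non-empty path component")
    return PurePosixPath(root) / node_name / str(mailbox_id) / "msg" / f"{item_id}.msg"
```

For example, `blob_path("/mnt/store", "mbox01", 42, 1001)` yields `/mnt/store/mbox01/42/msg/1001.msg`, so a second node writing under `mbox02` on the same shared file system would never touch the same files.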

There is much more work going into the Always On project than I’ve covered here. I’ll post some demos of our Always On lab soon, as well as some of the other features we are working on for our future releases.


Comments

  • This sounds great!

    Hopefully some of these will make it to the OSS version.

    Glad to see a switch to MariaDB. I’ve really come to dislike Oracle recently and will be glad to see a switch from MySQL to MariaDB. I performed the same thing on a couple of my own servers (non-email) and can say I’ve been pleased with MariaDB so far. No issues as of yet (5.5.x versions).

    But yeah, it’s great to see Zimbra is still moving along under its new management, keep it up, and good luck :-)

    Commented on September 10, 2013 at 12:54 am
  • I wonder why you are not switching to postgresql instead?

    Commented on October 21, 2013 at 11:11 am
  • >>It took us about 5 minutes to replace MySQL with MariaDB and about 3 minutes to replace ConnectorJ with the MariaDB Java Client and 1 bug fix<<
    .. could this be main reason ? :)

    Commented on October 24, 2013 at 2:59 am
  • As I read this I sighed with relief. Looks like Zimbra is finally getting the attention it deserves.

    Commented on November 27, 2013 at 10:35 am
