One of the projects we are working on at Zimbra is project “Always On”. The goal is very simple: Email and collaboration should be “always on” for end users. It’s no secret that email is the number one tool used for business collaboration. Project Always On solves key issues related to operating a platform built to be the central hub of collaboration delivered as a cloud service.
Our Design Goals
We started with some very basic design goals based on what we believe are the key attributes of a cloud service.
- Inherent resilience to failure
- Elastic scaling based on workload demands
- Software enhancements without service disruption
- Efficient use of commodity hardware resources
As we thought about these design goals, we arrived at a few technical capabilities and basic architectural requirements to achieve them.
- No single points of failure in the application components
- Separating the application code from the data
- Distributing state information across commodity storage
- Automatic failover of application and data storage components
- Automatic load balancing of client requests across the application and data layers
How Zimbra is Changing
A great deal of work goes into re-architecting your software, so we have been doing this over several phases. Our MTA and Proxy components already have load balancing and failover capabilities, and we introduced multi-master replication for our LDAP directory in Zimbra 8.0 last year. Our focus has now shifted to our mailbox server. We have spread this work across two phases.
Phase 1 work is centered on breaking up the application and data components of the mailbox server.
- Separating our application code from the data stores. Today our Java application code running in a Jetty instance has an affinity to the instance of MySQL (soon to be MariaDB) running on the same server. We have been refactoring and enhancing our application code to remove this dependency for two reasons: to enable MariaDB to be optionally deployed and scaled in a separate data services layer, and to enable distributed metadata through a MariaDB Galera Cluster.
- Switching from MySQL to MariaDB. We have been following the MariaDB community for a while. The improvements in performance and the speed at which bugs and security issues are resolved are compelling reasons on their own. Now that SkySQL is providing commercial support, we felt it was the right time to make the switch. And switch we did: it took us about 5 minutes to replace MySQL with MariaDB, and about 3 minutes plus one bug fix to replace Connector/J with the MariaDB Java Client.
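The decoupling described above can be sketched at the code level: if the mailbox logic depends only on a narrow metadata-store interface rather than on a co-located MySQL instance, the backing store can later be a remote MariaDB data services layer or a Galera cluster. The interface and class names below are hypothetical, not Zimbra's actual API, and the in-memory implementation stands in for a real JDBC-backed one purely to keep the sketch runnable.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: mailbox logic depends only on this interface, so the
// backing store can be swapped (local database, separate data services
// layer, or a Galera cluster) without touching application code.
interface MetadataStore {
    void put(String mailboxId, String key, String value);
    Optional<String> get(String mailboxId, String key);
}

// In-memory stand-in used here only so the sketch is runnable; a real
// implementation would issue SQL over JDBC to whichever database node the
// load balancer selects.
class InMemoryMetadataStore implements MetadataStore {
    private final Map<String, String> rows = new HashMap<>();

    public void put(String mailboxId, String key, String value) {
        rows.put(mailboxId + "/" + key, value);
    }

    public Optional<String> get(String mailboxId, String key) {
        return Optional.ofNullable(rows.get(mailboxId + "/" + key));
    }
}
```

A production implementation behind the same interface would talk to the database through JDBC (e.g. the MariaDB Java Client), which is what makes the driver swap described above an isolated change.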
Phase 2 work is centered on distributed data and install/upgrade orchestration.
- Mailbox Metadata. As I mentioned earlier, we will be using MariaDB Galera Cluster to implement a scale out, shared nothing, active-active design for our mailbox metadata.
- Search indexes. We are looking at using Apache Solr for distributed indexing, building on our current use of Apache Lucene.
- Distributed Blobs. We plan to continue enhancing our StoreManager API and to implement an S3-compatible interface for use with cloud storage solutions like Scality and Amazon S3. We have also changed the pathing of blob data on the file system to include a “node name”. This will enable the use of clustered/distributed file systems like NFS and Ceph through their POSIX file system interfaces.
- Rolling updates and upgrades. We want to enable software updates and even version upgrades to coexist with previous versions and to be installed across each application component layer without user disruption.
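The node-name pathing mentioned above can be illustrated in a few lines. The directory layout and names here are assumptions for illustration, not Zimbra's actual on-disk format; the point is simply that prefixing each blob path with the owning node's name lets many mailbox nodes share one clustered or distributed file system mount without path collisions.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch of blob pathing that includes a "node name" segment.
// Because every node writes under its own prefix, multiple mailbox nodes
// can safely share a single NFS or Ceph (POSIX) mount.
class BlobPath {
    static Path forBlob(String storeRoot, String nodeName, int mailboxId, int itemId) {
        // e.g. <storeRoot>/<nodeName>/<mailboxId>/<itemId>.msg
        return Paths.get(storeRoot, nodeName, String.valueOf(mailboxId), itemId + ".msg");
    }
}
```

With a layout like this, moving a mailbox between nodes or recovering a failed node becomes a matter of re-pointing metadata at paths under a different node-name prefix, rather than rewriting blob locations on local disks.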
There is much more work going into the Always On project than I’ve covered here. I’ll post some demos of our Always On lab soon, as well as some of the other features we are working on for future releases.