We are now working with some of the Zimbra early adopters on larger-scale deployments, including enduser deployments upwards of 100,000 mailboxes and hosted/Internet service provider deployments north of 1 million mailboxes. In general, the individual (per-mailbox) profiles tend to be on the smaller side (typically <100 messages/day and <200MB average mailbox size), but the aggregate workloads are nevertheless substantial, and are helping prove out the Zimbra Collaboration Suite (ZCS) for very large-scale distributed deployments.
While it’s fair to say that we are still working on hardening, Zimbra was designed from the ground up for such scale. The Zimbra architecture inherits from distributed-systems expertise gleaned from building messaging systems that today host many millions of mailboxes worldwide, and from Java systems that run thousands of production server CPUs within single large telco deployments.
Of paramount importance to scaling is partitioning. Partitioning leverages “locality of reference” for both processing and data—if certain servers can be specialized to solve some subset of the bigger problem, then the essential code and data are more likely to already be in memory or close at hand on fast disk. Partitioning techniques include the “vertical” partitioning of functional tasks and the “horizontal” partitioning of data and its associated processing (more below). Partitioning is augmented by other well-honed distributed-systems techniques like automated replication, data-dependent routing, load balancing, and failover. Overall, these techniques have proven (e.g., at Google, Yahoo!, Amazon, etc.) to scale well beyond the reach of more centralized architectures that, say, rely on stateless processing and a single very large database.
Vertical partitioning allows complex processing tasks to be divided into subtasks that can be independently optimized, managed, and debugged. Within Zimbra, vertical partitioning primarily consists of off-loading the computationally expensive security tier, which interfaces between Zimbra and the greater Internet, from the mailbox servers, which manage user data—messages, appointments, contacts, etc. This security tier includes Postfix (the Mail Transfer Agent/MTA included within Zimbra for mail routing, policy, etc.) as well as any on-premises anti-spam and anti-virus (Zimbra includes the leading open source technologies, SpamAssassin and ClamAV, but is also compatible with commercial AS/AV). What is more, Zimbra’s security tier is “effectively stateless”: the SMTP protocol provides for the automatic redelivery of unacknowledged messages, and Zimbra doesn’t acknowledge a message until it is transactionally stored within the user’s mailbox. This allows you to size your ZCS MTA server farm independently, based on aggregate security workload, while Zimbra still automatically manages all the distributed subcomponents as well as the routing of communications to and from the ZCS mailbox servers (via SMTP and LMTP).
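To make the “effectively stateless” point concrete, here is a minimal sketch (hypothetical names, not the actual ZCS delivery code) of the acknowledge-only-after-commit pattern: the delivery path returns success only once the mailbox store has durably committed the message, so an unacknowledged delivery is simply retried by the upstream SMTP server under standard redelivery semantics.

```python
# Hypothetical sketch: ack a delivery only after a transactional store,
# so the security/MTA tier can stay stateless and rely on SMTP retries.

class MailboxStore:
    """Toy in-memory store standing in for a real ZCS mailbox server."""
    def __init__(self):
        self.messages = []

    def begin(self):
        return _Tx(self)


class _Tx:
    """Trivial transaction: buffered writes applied only on commit."""
    def __init__(self, store):
        self.store, self.pending = store, []

    def append(self, message):
        self.pending.append(message)

    def commit(self):
        # Durable write of message + meta-data (just a list append here).
        self.store.messages.extend(self.pending)

    def rollback(self):
        self.pending.clear()


def deliver_to_mailbox(message, mailbox_store):
    """Return True (ack) only once the message is transactionally stored."""
    tx = mailbox_store.begin()
    try:
        tx.append(message)
        tx.commit()
        return True   # safe to ack: the message cannot be lost now
    except Exception:
        tx.rollback()
        return False  # no ack -> the upstream MTA will redeliver
```

The point of the pattern is that no MTA-tier server holds irreplaceable state: any failure before the commit simply leaves the message queued upstream.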
Horizontal partitioning is far more critical for very large-scale deployments, since there is generally far more data than there are tasks. Large Zimbra deployments are horizontally partitioned across servers (and their attached storage) by enduser mailbox. An enduser’s mailbox includes his or her messages, calendar, contacts, notes, and so on, all collocated for efficient user context switching. ZCS servers are therefore inherently stateful—each serves as the “primary” location for a subset of the aggregate mailboxes. This requires that each ZCS server have the smarts to reroute a protocol request (via XML/SOAP, IMAP, POP, …) to the appropriate primary server in the event that an in-bound load balancer makes the wrong decision.
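The rerouting behavior can be sketched as a small data-dependent routing step (a simplified illustration with hypothetical names; in ZCS the mailbox-to-server mapping lives in LDAP, not a local table):

```python
# Hypothetical sketch of data-dependent routing: every server can look up
# which host is primary for a mailbox and reroute requests that an
# in-bound load balancer landed on the wrong host.

MAILBOX_DIRECTORY = {                 # stands in for the LDAP directory
    "alice@example.com": "mbox1.example.com",
    "bob@example.com":   "mbox2.example.com",
}

def handle_request(local_host, mailbox, request):
    primary = MAILBOX_DIRECTORY[mailbox]
    if primary == local_host:
        # We are the primary for this mailbox: serve it locally.
        return f"served {request} for {mailbox} on {local_host}"
    # Wrong host: reroute to the primary rather than failing the request.
    return f"proxied {request} for {mailbox} to {primary}"
```

Because every server can resolve the primary for any mailbox, the load balancer does not need to be mailbox-aware; a wrong guess costs one extra hop, not a failed request.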
Automated replication and failover are also essential. For example, LDAP configuration data (which includes enduser mailbox locations) is fully replicated to as many replica hosts as are required to meet performance and availability requirements. LDAP replica hosts may be collocated with other ZCS servers or “vertically partitioned” onto dedicated servers. Mailbox data, on the other hand, can be transparently replicated within the storage system (for example, via RAID or mirroring), but for availability only. (It does not make sense to replicate mailboxes for scalability, given how frequently mailbox state is updated.) In the future, Zimbra will also support mailbox replication on “vanilla” storage by replaying the ZCS transaction or change log (used to guarantee consistency between the message and meta-data stores) on a secondary server. In either case, clustering technology is used to automatically fail over from the primary server to a preconfigured secondary (or tertiary) server that assumes the role of primary for that mailbox, while assuring that the former primary no longer has “write” access to the mailbox in order to avoid “split-brain” syndrome.
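The fencing requirement during failover can be illustrated with a small sketch (hypothetical names, not the actual clustering implementation): the failed primary loses write access *before* the secondary is promoted, so at no point are two hosts writable for the same mailbox partition.

```python
# Hypothetical sketch: fail over a mailbox partition while fencing the
# old primary first, so two writable primaries ("split-brain") can
# never coexist.

class MailboxPartition:
    def __init__(self, primary, secondary):
        self.primary, self.secondary = primary, secondary
        self.writable = {primary}          # only the primary may write

    def fail_over(self):
        # Step 1: fence the failed primary (revoke its write access).
        self.writable.discard(self.primary)
        # Step 2: only then promote the secondary to primary.
        self.primary, self.secondary = self.secondary, self.primary
        self.writable.add(self.primary)

    def can_write(self, host):
        return host in self.writable
```

The ordering is the whole point: promotion before fencing would open a window in which both hosts could accept writes to the same mailbox.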
Meta-data optimization and partitioning is one of the less frequently discussed scaling techniques employed within the Zimbra architecture. The meta-data for a mailbox is all of the data required for “navigating” to the appropriate message or meeting. Zimbra meta-data includes ZCS’s very efficient, Lucene-based index into all the text contained in every message, meeting, contact, attached document, and so on. It also includes the structured meta-data that captures folders, tags, dates, saved searches, etc. Zimbra uses an off-the-shelf SQL relational database to optimize structured meta-data queries and updates, and meta-data is, of course, horizontally partitioned by user. The key insight is that this meta-data should also be partitioned from the target data (message, meeting, etc.) to ensure very efficient processing. This allows sophisticated navigation to be nearly instantaneous even across very large mailboxes (mine is between 2 and 3GB now). Latency in access to the message body itself (which could, for example, reside in an HSM system or ultimately even be split out into another tier) is not nearly so problematic to the user experience as latency or inefficiency in accessing the meta-data. Partitioned meta-data also allows potentially expensive operations such as compliance-related cross-mailbox discovery to be handled efficiently, by simply composing the appropriate horizontally-partitioned search results.
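The compose-the-partitioned-results idea can be sketched in a few lines (a toy illustration with hypothetical names; the real ZCS indexes are Lucene-based, not dictionaries): each partition searches only its own meta-data index, and a coordinator merely merges the per-partition hits.

```python
# Hypothetical sketch: cross-mailbox discovery composed from searches
# over horizontally-partitioned meta-data indexes. Each partition scans
# only its own small, hot index; the coordinator merges the results.

def search_partition(index, term):
    """Search one partition's meta-data index (a toy inverted index)."""
    return index.get(term, [])

def cross_mailbox_search(partitions, term):
    """Fan out to every partition, then compose the partial results."""
    hits = []
    for index in partitions:
        hits.extend(search_partition(index, term))
    return sorted(hits)
```

Because the per-partition searches are independent, they can run in parallel across mailbox servers, and the message bodies themselves are never touched during the search.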
Well, hopefully that 10,000-foot view gives you some more insight into how the Zimbra team has been innovating in scalability. (To dig in deeper, please check out the ZCS Multi-Server Configuration Guide or the Zimbra Architectural Overview.) While the caveat that we still have more hardening to do remains, we are convinced that the Zimbra architecture is uniquely designed to marry rich enduser functionality with unprecedented scalability. We’ll even wager that some of the messaging servers that pre-date Zimbra are going to attempt to rework their internal architectures to comply with more of these principles.