From C/C++ to Java

In 1999, when we were architecting and implementing Onebox.com, we had to select an implementation language. Given that all the principal architects at Onebox were from JavaSoft, one would expect Java to be the natural choice. It was not.

Before going further, I should perhaps write a little about Onebox.com. Onebox.com was a unified messaging startup back in the Dot Com “gogo” days. The idea was to provide subscribers with a phone number and a web-based “inbox” from which they could get their email, voicemails, and faxes. So when a subscriber registered, they could pick a phone number in an area code, pick a user name, and they were in business. The basic service was free, and the premium services (such as phone numbers without extensions, more disk, multiple phone numbers) were add-ons. The service did quite well, and we were servicing over five million subscribers by the time we got acquired in 2000.


Given the requirements of Onebox (millions of users), we had to ensure that our architecture and its implementation would scale. When we balanced out all the requirements for an implementation language, we ultimately selected C/C++. Quite frankly, at the time it was not a difficult choice. We knew then that Java was simply neither mature nor robust enough to support the kind of RAS (reliability, availability, scalability) requirements that we had before us. Back then, the JVM performed relatively poorly, consumed lots of system resources, and required frequent restarts; moreover, stability was not its hallmark, and it was not uncommon for the JVM to crash.

Much has changed in the past six years. Today many of the same engineers who were members of the Onebox team are a part of Zimbra (including Anand Palaniswamy, who worked on Java VMs at JavaSoft and BEA). This time around we did elect to go with Java. Given Java’s maturing over the years, we think its benefits far outweigh its costs. For example, the nature of the problem that we are solving is not sensitive to the small latencies introduced by garbage collection cycles (unlike, say, a voice gateway application), nor are we dealing with a compute-bound problem (e.g. an image processing application). Given these parameters and the fact that the problem we are solving is typically I/O bound, we chose Java.

Our performance tests have shown us that the JVM is in fact not in the critical path at all, and on the whole we feel that our choice is validated. I will add that careful tuning of the JVM parameters is very important to ensuring the right runtime characteristics for any given application. I will be writing more about this later.

One of the key benefits of using Java is that it really cuts down development time. There is no way Zimbra would be where it is today if we had developed using C/C++ (and I am a big fan of C!). For example, there are so many Java packages available to help solve many of the tasks that we seem to write code for over and over again each time we embark on a new project, and there are even good packages available for more esoteric tasks. Another nice benefit of using Java is that nasty buffer overflow and memory corruption problems go away; however, memory leaks do not!

Of course, like everything else, Java is not perfect. Like any tool it has its weaknesses, and like any tool it has its appropriate and inappropriate uses (never use a hammer when the job calls for a screwdriver). Here is some of the bad:

As I said previously, there are packages available to solve just about any problem. It is really easy to grab them and introduce them into a product. The problem is that some of these packages may be poorly implemented. This means you have to do your homework. Review the code (if available), and definitely benchmark it: how much memory does it use, how many disk cycles does it use (if it does I/O), and what kind of CPU usage characteristics does it have? If you fail to take the time to do this work, you will pay a big price, and you will often learn of the price way too late in the development cycle.

It is really quite easy to write inefficient code in Java. Because Java provides some nice abstractions, it is easy to lose sight of what is going on under the covers. This topic has been discussed extensively on the web and in books.

Let me give you one specific example of how we were surprised to find some inefficient code, and we consider ourselves very Java savvy! Our hosted demo code, which provisions and expires users on the fly, was leaking memory. While we were heap profiling it to find the culprit, we noticed that an extraordinary number of int[], Matcher and Pattern objects were being allocated very rapidly. These were not leaking, but still seemed strange. We found and fixed the memory leak. But we also made sure to optimize places where the demo code called String.replaceAll() and String.split(); these methods create and compile regular expressions on the fly. The simple lesson here is to ensure that you compile regular expressions once and use them over and over again, but the larger lesson is that something that looks innocuous during a long night of coding, such as String.split(), might cause you problems later. Know your abstractions.
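As a rough illustration of "compile once, reuse many times", here is a minimal sketch. The class name RegexReuse and its method names are made up for this example and are not taken from the Zimbra code; only Pattern.compile(), Pattern.split() and Matcher.replaceAll() are standard java.util.regex calls.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only (not part of Zimbra).
class RegexReuse {
    // Compiled once when the class is loaded, then shared by every call.
    // Pattern instances are immutable and safe to share across threads.
    private static final Pattern AT_SIGN = Pattern.compile("@");

    // Same result as s.replaceAll("@", ":"), without recompiling the pattern.
    static String replaceAt(String s) {
        return AT_SIGN.matcher(s).replaceAll(":");
    }

    // Same result as s.split("@"), without recompiling the pattern.
    static String[] splitAt(String s) {
        return AT_SIGN.split(s);
    }

    public static void main(String[] args) {
        System.out.println(replaceAt("haha@haha.com"));      // haha:haha.com
        System.out.println(splitAt("haha@haha.com").length); // 2
    }
}
```

A Matcher is still allocated on each call to replaceAt(), so this only removes the repeated compilation, not every allocation.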

We also found a few places through code review (these were not in tight loops or critical paths, or we would have found them earlier because we profile the core server code all the time) where we were using String.split() when a simple String.indexOf() would have been more than sufficient. For kicks, we wrote a small test (yeah, yeah, I know micro benchmarks suck, but I couldn’t resist the temptation), and it shows how your code can end up roughly 15x as expensive (16x to 17x in the run below) without you knowing about it.

$ cat Test.java
import com.zimbra.cs.util.EmailUtil;

class Test {
    public static void main(String[] args) {
        final int N = 1000000;

        long start = System.currentTimeMillis();
        for (int i = 0; i < N; i++) {
            String s = "haha@haha.com".replaceAll("@", ":");
        }
        System.out.println("replaceAll: " + (System.currentTimeMillis() - start));

        start = System.currentTimeMillis();
        for (int i = 0; i < N; i++) {
            String s = "haha@haha.com".replace('@', ':');
        }
        System.out.println("replace: " + (System.currentTimeMillis() - start));

        start = System.currentTimeMillis();
        for (int i = 0; i < N; i++) {
            String s[] = "haha@haha.com".split("@");
        }
        System.out.println("split: " + (System.currentTimeMillis() - start));

        start = System.currentTimeMillis();
        for (int i = 0; i < N; i++) {
            String s[] = EmailUtil.getLocalPartAndDomain("haha@haha.com");
        }
        System.out.println("indexOf/substr: " + (System.currentTimeMillis() - start));
    }
}

$ zmjava Test

replaceAll: 3529
replace: 211
split: 3398
indexOf/substr: 208
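
The source of EmailUtil.getLocalPartAndDomain() is not shown in this post. As a sketch of what an indexOf/substring helper of that kind might look like, here is a hypothetical stand-in; the class name EmailSplit, the method name, and the choice to return null when there is no '@' are all assumptions made for illustration.

```java
// Hypothetical stand-in for an indexOf/substring based helper.
// This is NOT the actual Zimbra EmailUtil implementation.
class EmailSplit {
    // Splits "local@domain" at the first '@' without using a regular expression.
    // Returns null when the address contains no '@' (an assumption of this sketch).
    static String[] localPartAndDomain(String address) {
        int at = address.indexOf('@');
        if (at < 0) {
            return null;
        }
        return new String[] { address.substring(0, at), address.substring(at + 1) };
    }

    public static void main(String[] args) {
        String[] parts = localPartAndDomain("haha@haha.com");
        System.out.println(parts[0] + " / " + parts[1]); // haha / haha.com
    }
}
```

Note that this splits only at the first '@', so it is not a drop-in replacement for split("@") on input that contains more than one.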

This is not to say that regular expressions are bad or that all object allocations are bad. As they say, everything in moderation and use it right.

There is a common perception that Java does away with the need to think about memory management. WRONG! While you cannot corrupt memory via such things as out-of-bounds memory writes, you can still cause the system to leak memory like a sieve. So developers need to think carefully when they are writing their code. Are you holding on to object references when you no longer need them (a classic example is keeping around hashes and vectors that have hundreds of object references contained within them)?
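One common way to keep a long-lived map from holding references forever is to put a bound on it. The sketch below is one possible approach, not something taken from the Zimbra code: a small least-recently-used cache built on the standard LinkedHashMap.removeEldestEntry() hook. The class name BoundedCache and the maxEntries parameter are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded cache, for illustration only.
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently used.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts the
    // eldest (least recently used) entry, so its key and value can be collected.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

With a limit of two entries, putting "a" and "b", reading "a", and then putting "c" evicts "b". An unbounded HashMap in the same position would simply keep growing.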

The bottom line is that Java is a tool like any other language. It has its strengths and it has its weaknesses. We felt that its performance characteristics, along with its ability to accelerate development, made it the right tool for developing the Zimbra Collaboration Suite. On a final point, I would recommend that anyone doing serious Java development invest in a copy of Josh Bloch's "Effective Java Programming Language Guide".

(Thanks to Anand Palaniswamy for contributing to this blog!)
