Wednesday, September 30, 2009

String.format vs. MessageFormat.format vs. String.+ vs. StringBuilder.append

Usually I'm not a performance hacker, i.e. I seldom use the final keyword, and I dare to use List.size() in loop expressions. But I'm a little bit sensitive when it comes to string operations, as they can cause severe performance problems, and sometimes security issues as well, e.g., if an SQL query is to be created (ok, bad example, one can use PreparedStatement instead of hacking an SQL string). For logging I often have to concatenate strings, such as
if (log.isLoggable(Level.INFO)) {
    log.info("Command created - cmd=" + cmd); //$NON-NLS-1$
}
Actually I'm not writing lines like this myself, I let log4e do the job -- I really love that tool! But sometimes these string concatenations are hard to read; using String.format or MessageFormat.format would make them much easier to read. Here are some examples:
foo(String.format("Iteration %d, result '%s'", i, value[i%2]));

foo(MessageFormat.format("Iteration {0}, result ''{1}''", i, value[i%2]));

// single line:
foo( "Iteration " + i + ", result '" + value[i%2] + "'");

// multi line:
String s = "Iteration ";
foo(s); s += i;
foo(s); s += ",\nresult '";
foo(s); s += largeString(i);
foo(s); s += "', '";
foo(s); s += largeString(i + 1);
foo(s); s += "'";

foo(new StringBuilder("Iteration ")
    .append(i).append(", result '").append(value[i%2]).append("'")
    .toString());
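The single-line and multi-line "+" variants look alike but are compiled quite differently. The sketch below assumes a javac from the Java 5 to 8 era, which turns one "+" expression into a single StringBuilder chain; newer compilers (Java 9 and later) use a different mechanism for the same expression. The class and method names are made up for illustration.

```java
public class ConcatDesugar {
    // Single-expression concatenation, as in the "single line" variant.
    static String plus(int i, String value) {
        return "Iteration " + i + ", result '" + value + "'";
    }

    // Roughly what a Java 5 to 8 javac generates for plus():
    // one builder, several appends, one final toString().
    static String desugared(int i, String value) {
        return new StringBuilder()
                .append("Iteration ").append(i)
                .append(", result '").append(value).append("'")
                .toString();
    }

    // Statement-by-statement "+=": every statement creates a new String
    // and copies everything accumulated so far.
    static String multiLine(int i, String value) {
        String s = "Iteration ";
        s += i;
        s += ", result '";
        s += value;
        s += "'";
        return s;
    }

    public static void main(String[] args) {
        System.out.println(plus(3, "abc"));
        System.out.println(desugared(3, "abc").equals(plus(3, "abc")));
        System.out.println(multiLine(3, "abc").equals(plus(3, "abc")));
    }
}
```

All three methods return the same text; they differ only in how many intermediate objects are created along the way.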
(Update: the multi line test was added after daObi and Laurent commented on the first version.) Hmm... maybe the "+" isn't that bad ;-) I do not understand why String.format and MessageFormat.format have these different notations, and frankly I seldom get the format string right on the first try.

So, besides the aesthetics, what about performance? What would you guess? My guess was that StringBuilder would be the fastest, followed by the format methods, and that the String operator "+" would be the slowest, as new Strings have to be created all along. I ran a little test (calling these methods 100000 times; foo() does nothing and was, besides other things, only added in order to avoid too many compiler optimizations, although I have no idea if that worked ;-) ). Note that in the test only a few strings are concatenated, but I think that's a typical example, and it is the setting in which I was thinking about using format methods instead of string concatenation (it's also quite similar to many toString methods). I was surprised by the result:
String.format: 757 ms
MessageFormat.format: 1078 ms
String +: 61 ms
String + (multi line): 162 ms
StringBuilder: 74 ms
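For readers who want to try this themselves, a timing loop of the kind described above might look roughly like the following. This is a minimal sketch, not the original test code: the class name, the VALUE array, the sink field and the timeMillis helper are assumptions, and it uses Java 8 syntax. There is no JIT warm-up, so the printed numbers are only a rough indication.

```java
import java.text.MessageFormat;
import java.util.function.IntFunction;

public class FormatTiming {
    // Hypothetical test data; the post only shows value[i%2].
    static final String[] VALUE = { "first", "second" };

    // Accumulates lengths so that foo() has an observable effect. This is
    // only a hint to the optimizer, not a guarantee.
    static long sink;

    static void foo(String s) {
        sink += s.length();
    }

    static String viaFormat(int i) {
        return String.format("Iteration %d, result '%s'", i, VALUE[i % 2]);
    }

    static String viaMessageFormat(int i) {
        // {0,number,#} avoids locale-specific grouping such as 1,000.
        return MessageFormat.format("Iteration {0,number,#}, result ''{1}''", i, VALUE[i % 2]);
    }

    static String viaPlus(int i) {
        return "Iteration " + i + ", result '" + VALUE[i % 2] + "'";
    }

    static String viaBuilder(int i) {
        return new StringBuilder("Iteration ")
                .append(i).append(", result '").append(VALUE[i % 2]).append("'")
                .toString();
    }

    // Calls the given variant n times and returns the elapsed time in ms.
    static long timeMillis(int n, IntFunction<String> variant) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            foo(variant.apply(i));
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        int n = 100000;
        System.out.println("String.format: " + timeMillis(n, FormatTiming::viaFormat) + " ms");
        System.out.println("MessageFormat.format: " + timeMillis(n, FormatTiming::viaMessageFormat) + " ms");
        System.out.println("String +: " + timeMillis(n, FormatTiming::viaPlus) + " ms");
        System.out.println("StringBuilder: " + timeMillis(n, FormatTiming::viaBuilder) + " ms");
    }
}
```

The absolute numbers will differ from machine to machine and between JVM versions; only the relative order is of interest here.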
Gosh! The "+" operator clearly is the winner! It is more than 10 (ten!) times faster than the format methods. 10 times, can you believe that? I was really surprised, and frankly, I couldn't believe it. (Update: daObi and Laurent explain it, see their comments below.) Thus I changed the test, assuming the compiler had tricked me, but I always got more or less the same results. I figured it might be the size of the strings, so I also changed that. That is, I used large strings (with 70,000 characters; the method returns one of two strings in order to avoid compiler optimization) in the concatenation:
for (int i = 0; i < n; i++) {
    s = MessageFormat.format("Iteration {0,number,#},\nresult ''{1}'', ''{2}''", i,
            largeString(i), largeString(i + 1));
}

for (int i = 0; i < n; i++) {
    s = "Iteration " + i + ",\nresult '" + largeString(i) + "', '" + largeString(i + 1) + "'";
}
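The largeString method itself is not shown here. A hypothetical version, assuming it simply alternates between two pre-built strings of 70,000 characters each, could look like this (the class name, the constants and the build helper are made up for this sketch):

```java
import java.util.Arrays;

public class LargeStrings {
    static final int LENGTH = 70000;

    // Two different strings, built once, so the loop measures concatenation
    // and not the creation of the inputs.
    static final String A = repeat('a', LENGTH);
    static final String B = repeat('b', LENGTH);

    static String repeat(char c, int count) {
        char[] chars = new char[count];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    // Returns one of the two strings depending on i, so the argument is not
    // a compile-time constant.
    static String largeString(int i) {
        return (i % 2 == 0) ? A : B;
    }

    // The same concatenation as in the "+" loop above.
    static String build(int i) {
        return "Iteration " + i + ",\nresult '" + largeString(i) + "', '" + largeString(i + 1) + "'";
    }

    public static void main(String[] args) {
        System.out.println(build(42).length()); // 140027 for a two-digit i
    }
}
```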
Actually, the test is a little bit unfair, as in all cases at least the result string has to be created (i.e. the format methods have to create strings too), but at least the string is rather long. The result string s has a length of 140027 characters, so this is a pretty long string, isn't it? Here are the results of the long-string test:
String.format: 584 ms
MessageFormat.format: 816 ms
String +: 621 ms
String + (multi line): 867 ms
StringBuilder: 555 ms
Yeah! Now we have the order I expected in the first place: StringBuilder.append is the winner, followed by String.format. MessageFormat.format has a little problem and is even slower than the single-line String concatenation. But compared to the results of the first test, all methods run at roughly the same speed.

So, what did I learn? First, I will use plain and simple string concatenation more often in the future, as in many cases it is easier to write than some weird format string. I will use String.format only if I expect that my code needs to be translated, since it is easier to translate a format string than to change the code (if the order of the sentence components changes in another language). And, of course, I will use StringBuilder if large strings have to be concatenated, but I think I will use it less in toString methods. And maybe I'm going to read a book about performance hacking -- do you have any recommendations? Anyway, this test also teaches me (again) that good performance is not (only) a question of single statements, but a question of good algorithms and good design in general.

Note: Actually this is not the first time I was surprised by the result of a performance test. I once did a test comparing the performance of different class library designs for a math library for matrices and vectors. I have documented the test in the LWJGL forum, and its results influenced the design of the GEF3D geometry package (see package documentation).
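On the translation point above: both format methods let a translated pattern reorder the arguments without touching the calling code, which plain concatenation cannot do. A small sketch, with made-up "translated" patterns and hypothetical class and method names:

```java
import java.text.MessageFormat;

public class ReorderDemo {
    // Original order: iteration first, result second.
    static String originalFormat(int i, String value) {
        return String.format("Iteration %1$d, result '%2$s'", i, value);
    }

    // A pattern that mentions the result first. Only the pattern changes;
    // the argument list stays the same. %1$d and %2$s are explicit indexes.
    static String swappedFormat(int i, String value) {
        return String.format("Result '%2$s' in iteration %1$d", i, value);
    }

    // The same with MessageFormat: {0} and {1} are indexes by design, and a
    // literal single quote has to be doubled.
    static String swappedMessageFormat(int i, String value) {
        return MessageFormat.format("Result ''{1}'' in iteration {0,number,#}", i, value);
    }

    public static void main(String[] args) {
        System.out.println(originalFormat(3, "abc"));       // Iteration 3, result 'abc'
        System.out.println(swappedFormat(3, "abc"));        // Result 'abc' in iteration 3
        System.out.println(swappedMessageFormat(3, "abc")); // Result 'abc' in iteration 3
    }
}
```

In a real application the pattern would typically come from a resource bundle rather than a string literal.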