Saturday, June 5, 2010

Java vs. Python: fetching URLs

Here we go, another Java vs. Python comparison (I just can't help myself). This time it's about how useful the standard library is for everyday tasks. Fetching the contents of a URL should be trivial, but in Java it's not, especially if the contents of that URL are gzipped and use a nice charset such as UTF-8.

Java:

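import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.zip.GZIPInputStream;

URL url = new URL("http://www.example.com/");
URLConnection conn = url.openConnection();
conn.connect();

// getContentEncoding() returns null when the header is missing,
// so compare from the constant's side to avoid a NullPointerException
InputStream in;
if ("gzip".equals(conn.getContentEncoding())) {
    in = new GZIPInputStream(conn.getInputStream());
} else {
    in = conn.getInputStream();
}

// Pull the charset out of a header such as "text/html; charset=UTF-8"
String charset = conn.getContentType();
BufferedReader reader;
if (charset != null && charset.indexOf("charset=") != -1) {
    charset = charset.substring(charset.indexOf("charset=") + 8);
    reader = new BufferedReader(new InputStreamReader(in, charset));
} else {
    charset = null;
    reader = new BufferedReader(new InputStreamReader(in));
}

// Read line by line, putting the newlines back
StringBuilder builder = new StringBuilder();
String line = reader.readLine();
while (line != null) {
    builder.append(line).append('\n');
    line = reader.readLine();
}
String content = builder.toString(); // FINALLY!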

Python:

import urllib2

content = urllib2.urlopen("http://www.example.com/").read()

At first I thought: yeah, well, Java is probably older and wasn't designed to do such things very often. I was wrong. Java appeared in 1995, Python in 1991.
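
One caveat on the comparison: the Python one-liner above returns the body exactly as the server sent it. It does not ask for gzip, and it does not decode the charset. A closer equivalent of the Java version might look like the sketch below. It is written for Python 3, where urllib2 became urllib.request, and the names decode_body and fetch_text are made up for this example. It assumes the charset is sent as a parameter of the Content-Type header and falls back to UTF-8 when none is given.

```python
import gzip
import urllib.request


def decode_body(raw, content_encoding=None, content_type=None,
                default_charset="utf-8"):
    """Undo gzip if it was declared, then decode the bytes to text."""
    if content_encoding and content_encoding.strip().lower() == "gzip":
        raw = gzip.decompress(raw)

    charset = default_charset  # assumption: UTF-8 when no charset is sent
    if content_type:
        # Content-Type looks like: text/html; charset=UTF-8
        for param in content_type.split(";")[1:]:
            name, _, value = param.strip().partition("=")
            if name.lower() == "charset" and value:
                charset = value.strip('"')
    return raw.decode(charset)


def fetch_text(url):
    """Fetch url, offering gzip to the server, and return the body as text."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request) as response:
        return decode_body(
            response.read(),
            response.headers.get("Content-Encoding"),
            response.headers.get("Content-Type"),
        )
```

With that in place, content = fetch_text("http://www.example.com/") does roughly what the Java snippet does. It is still a good deal shorter, though no longer a single line.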

Tuesday, June 1, 2010

Java "Hello, World!" 6x slower than Python

Yup. I know, micro-benchmarks, but I find this one quite interesting. Take two small programs, one Java, one Python:

// TestJava.java
package com.test;

public class TestJava {
    public static void main(String[] args) throws Exception {
        System.out.println("Hello, World!");
    }
}

# test.py
print "Hello, World!"

Running these two through time clearly shows that Java is 6 (six) times slower, at roughly 0.14 seconds of real time per run against 0.025 for Python:

[felix@the-machine bin]$ time java com.test.TestJava
Hello, World!

real 0m0.142s
user 0m0.080s
sys 0m0.017s

[felix@the-machine bin]$ time java com.test.TestJava
Hello, World!

real 0m0.142s
user 0m0.077s
sys 0m0.020s

[felix@the-machine bin]$ time java com.test.TestJava
Hello, World!

real 0m0.154s
user 0m0.070s
sys 0m0.023s

[felix@the-machine python]$ time python test.py
Hello, World!

real 0m0.024s
user 0m0.013s
sys 0m0.007s

[felix@the-machine python]$ time python test.py
Hello, World!

real 0m0.025s
user 0m0.020s
sys 0m0.003s

[felix@the-machine python]$ time python test.py
Hello, World!

real 0m0.026s
user 0m0.013s
sys 0m0.007s
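
To repeat the measurement without reading the numbers off by hand, something like the following Python 3 sketch could be used. The helper names time_command and mean are invented for this example. It measures wall-clock time around the whole child process, so it is closest to the "real" figure that time reports. The ratio at the end is computed from the "real" values in the transcript above.

```python
import subprocess
import sys
import time


def time_command(argv, runs=3):
    """Run argv `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


def mean(values):
    return sum(values) / len(values)


# The "real" figures from the transcript above, in seconds.
java_real = [0.142, 0.142, 0.154]
python_real = [0.024, 0.025, 0.026]
ratio = mean(java_real) / mean(python_real)
print("Java %.3fs, Python %.3fs, ratio %.1fx"
      % (mean(java_real), mean(python_real), ratio))
# prints: Java 0.146s, Python 0.025s, ratio 5.8x

# Live measurement of the interpreter running this script (numbers will vary).
print(time_command([sys.executable, "-c", "print('Hello, World!')"]))
```

For the Java side, time_command(["java", "com.test.TestJava"]) run from the bin directory would be the matching call, assuming java is on the PATH.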

Not only is Java slower, it's also a lot more code. And I'm not only talking about lines of code here (although there are 6 times as many: 1 line of Python against 6 of Java, if you remove the whitespace), but compiled code, too. Look:

[felix@the-machine bin]$ ls -lh com/test/TestJava.class
-rw-r--r-- 1 felix felix 595 Jun 1 21:50 com/test/TestJava.class

[felix@the-machine python]$ ls -lh test.pyc
-rw-r--r-- 1 felix felix 117 Jun 1 21:59 test.pyc

The compiled Java code is 595 bytes, while the compiled Python code is 117 bytes. Five times bigger. Say it with me: Java == BLOAT !