Performance of Programming Languages November 28
Some of the research I do, particularly Networks/P2P research, requires developing simulation, modeling, or experimentation software. Over the years I have become a fan of a growing set of programming languages. In my experience, educating new programmers on how to program safely and efficiently in C/C++ takes time. As a result, I have been looking more and more towards languages with garbage collection such as Java, C# (Mono), Python, and Ruby. Given the current popularity of Ruby on Rails, I thought it might be interesting to compare Ruby and other languages in terms of program running time.
After spending time with so-called “scripting languages” (a term that I feel is neither well defined nor terribly useful), I have found that I enjoy programming in languages like Ruby, Python, and Boo more than languages like C, C++, Java, or C# (though C# is getting better and better as they improve the language). My question is: how much performance do I lose by adopting one of these languages? To answer this question I took a look at the Language Shootout site.
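One can always argue about benchmarks, but at least they give us some starting point. Comparing the above languages (except Boo) in terms of running times, we see the following, where A < B means A ran faster than B and the range in parentheses is roughly how many times slower B was across the tests: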
- C < C++ (1.0 - 1.4)
- C++ < Java (1.1 - 4.1); Java's memory use is usually around 10x higher
- Java < C# (Mono) (1.0 - 4.0), though Mono uses less memory
- C# < Python (this one is noisy, a few Python tests do better, but many are 10-50 times slower than C#)
- Python < Ruby (1.5 - 9.2)
The above is by no means scientific. I always excluded start-up time since I am more interested in long-running code (simulations are long-running experiments). Clearly these languages are not all equal in performance. Comparing Ruby to C, we see that Ruby is between 14 and 600 times slower (several tests are more than 100 times slower).
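To make "excluding start-up time" concrete, here is a minimal sketch of timing a workload from inside an already-running process, so interpreter or VM start-up never enters the measurement. It is not how the Shootout site measures anything; the helper `time_workload` and the toy workload `sum_of_squares` are hypothetical names made up for illustration.

```python
import time


def time_workload(workload, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` runs.

    The clock starts after the interpreter is already up, so process
    start-up cost is not included in the result.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


def sum_of_squares(n=200_000):
    """Toy CPU-bound workload (an assumption, standing in for a real benchmark)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    seconds = time_workload(sum_of_squares)
    print(f"best of 5 runs: {seconds:.4f} s")
```

Taking the best of several runs is one common way to reduce noise from other processes; a mean or median would be an equally defensible choice.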
It is easy for such comments to spiral out of control and provoke flames from fans of various languages. However, it is good to have at least some means (though noisy and inaccurate) of comparing these languages. There are many rebuttals to the above:
- Optimization: Clearly gcc has been optimized for longer than the implementations of the other languages in the list above, so we would expect it to perform well. Still, JIT-compiled languages could in principle be faster, since the compiler has access to the actual data at compile time (which is also run time). That may be true, and I can’t wait for it to happen, but it looks as though it has not happened yet.
- Usability/Productivity/Library Availability: As stated above, programming in C or C++ is generally considered harder to do right than in the other languages. So, while the program may run slower, more (usable) code will be written in the other languages due to productivity boosts. While this is true, in some cases there is a certain minimum performance that must be reached. Given that constraint, I would like to see if I can use a nice language of my choice. In many cases, it appears that the difference between the slowest and fastest will be profound.
So, given that Ruby is a nice language, how practical is it to use for performance-critical applications? It appears that the cost of choosing Ruby over C or C++ is very great in terms of performance. Depending on the size of your problem or the size of your computer, Ruby may be a very bad choice.
Others have given a different answer to this question. For instance, in an article titled “It’s boring to scale with Ruby on Rails”, the author argues that scaling for Ruby is no problem, since the major cost is not hardware, but programmers. This may be true for some, but not all. For instance, for individuals hosting their own site, labor is fairly cheap but hardware is relatively expensive. In the case of research, we are often interested in tackling problems that are bounded by our computational resources. In this case, using a better performing system translates into solving a bigger problem, i.e. one that may actually be interesting.
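As a rough back-of-the-envelope illustration of that last point, the sketch below estimates how much smaller a problem you can solve in the same time budget when the code runs slower by a constant factor. It assumes running time grows as a power of the problem size; the function name `max_problem_size`, the baseline of one million nodes, and the exponents are all assumptions for illustration. Only the 100x slowdown echoes the figures above.

```python
def max_problem_size(baseline_size, slowdown, exponent):
    """Largest size solvable in the same time budget.

    Assumes running time is proportional to size**exponent, and that the
    slower implementation takes `slowdown` times longer at every size.
    """
    return baseline_size / slowdown ** (1.0 / exponent)


if __name__ == "__main__":
    baseline = 1_000_000  # hypothetical network size the fast code can handle
    for exponent in (1, 2):
        size = max_problem_size(baseline, slowdown=100, exponent=exponent)
        print(f"time ~ n^{exponent}: about {size:,.0f} nodes at 100x slower")
```

Under these assumptions, a 100x slowdown cuts a linear-time workload from a million nodes to ten thousand, and a quadratic-time one to a hundred thousand.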
Finally, with Netmodeler, we have taken the approach of using C++ for the core library (which allows for efficiently storing large networks in memory and quickly running graph algorithms), and using SWIG to generate Python wrappers, which allow us to script the use of Netmodeler to build specific tools or simulators. This gives us the performance of C++ together with the ease of building a new simulation program in Python.
As always, one size does not fit all.