README for Lasu's Language Implementation Benchmark
===================================================

NOTE: This is work in progress.

I wish to find a new programming language for my hobby projects. One of
the criteria I have is execution speed. I currently use Python as my
default language and I'm not altogether happy with the speed. Mostly,
the tasks are small and simple enough that speed is irrelevant, but
occasionally I want to do something compute intensive. In these cases,
I now resort to C, but that is more tedious than I prefer. I would like
to find a nice high level language which also has a high execution
speed.

Therefore, this benchmark suite. I will come up with some benchmarks
that model things I want to do. They are not necessarily applicable to
other people, but perhaps they are. As with all benchmarks, you need to
decide for yourself if the benchmark measures things you care about.

Note that this benchmark is concerned only with execution speed. Good
programming style, productivity, or other important factors for
software engineering are not considered.

The benchmarks are meant to be implemented in many different languages.
Thus I do not want to measure small scale things like "function call
overhead": that benchmark is meaningless in a language without
functions. Instead, I provide a set of specifications for tasks and
they can be implemented in any manner deemed suitable.

The goal is to measure language implementations. Thus benchmark
implementations should stick to the language in question. It is not
fair, for example, for a Python program to call a custom C module.
Calling standard library modules for the language is allowed, even if
they are not implemented in the language itself. Non-standard modules
are not allowed.

You do not get to choose compiler options for compiled languages. I may
run programs with several sets of compiler options, if it is relevant.
If there are several implementations of a language, I may run all of
them.

Any reasonable language should be able to implement all these
benchmarks. It is not, however, necessary to implement all benchmarks
in every language. Several implementations in the same language are
also possible, for comparing programming styles.

I do not claim that the implementations included here are the fastest
possible ones for each language. If you have a better one, please send
it in to be included here.

The benchmarks will be run on a PC running a testing or unstable
version of the Debian GNU/Linux operating system. The machine may or
may not be modern, and may or may not have excessive amounts of memory.
I will only install packages from the Debian "main" section: no
non-free implementations and no implementations not distributed by
Debian. I wouldn't use such implementations for my hobby programming
anyway.


Results
=======

The following are the results of the first run I made, on April 4,
2004. They are probably not representative.

                                 simple    longline  huge       binary
ascii-wordfreq-liw.gcc-2.95      0.0/0.0   0.6/0.0   54.8/0.8   0.0/0.0
ascii-wordfreq-liw.gcc-3.2       0.0/0.0   0.6/0.0   55.3/0.8   0.0/0.0
ascii-wordfreq-liw.gcc-3.3       0.0/0.0   0.6/0.0   54.4/0.9   0.0/0.0
ascii-wordfreq-liw.py.python2.2  0.1/0.0   0.1/0.0   106.8/0.4  0.0/0.0
ascii-wordfreq-liw.py.python2.3  0.1/0.0   0.2/0.1   103.6/1.6  0.1/0.0
ascii-wordfreq-liw.sh.bash       0.1/0.0   0.1/0.1   639.6/6.6  0.0/0.0


Benchmark specifications
========================

The benchmark specifications are meant to be unambiguous as far as the
expected output is concerned. In other words, anyone reasonably
competent is expected to be able to produce a program that produces
exactly the same output as someone else's program. This is necessary so
that the correctness of the programs can be easily checked
automatically.


ascii-wordfreq
--------------

Read text from the standard input and count the number of times each
word occurs. Convert letters to lower case.
Order the words according to frequency; words with the same frequency
should be ordered in ascending lexicographic order according to
character code. Print out the top N words, where N is a decimal number
given on the command line.

Each output line must contain the count, a space, and the word (in
lower case), and end in an ASCII LINE FEED character. Output must
contain exactly N such output lines and no other output lines.

A word contains only ASCII letters A through Z and a through z (convert
upper case to lower case) and ASCII digits 0 through 9 and is not
empty. All other characters separate words and are ignored except to
notice word boundaries. Word boundaries only occur at the beginning and
end of the file and at non-word characters.

You may not assume a maximum length for the word, line, or input file.
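
As an illustration of the ascii-wordfreq specification, here is a
minimal sketch in Python. It is not one of the benchmarked programs and
makes no claim to be fast. The names count_words, top_words and
CHUNK_SIZE are hypothetical. It assumes that "top N" means the most
frequent words come first, and that when the input has fewer than N
distinct words, all of them are printed. Input is read as bytes in
fixed-size chunks so that no maximum line length is assumed; a word
that straddles a chunk boundary is carried over to the next chunk.

```python
import re
import sys
from collections import Counter

# A word is a non-empty run of ASCII letters and digits.
WORD = re.compile(rb"[A-Za-z0-9]+")
CHUNK_SIZE = 1 << 16  # arbitrary read size, not part of the specification


def count_words(stream, chunk_size=CHUNK_SIZE):
    """Count lower-cased words in a binary stream, reading it in chunks."""
    counts = Counter()
    tail = b""  # word that may continue in the next chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        words = WORD.findall(tail + chunk)
        if WORD.fullmatch(chunk[-1:]):
            # The chunk ends inside a word: hold the last word back.
            tail = words.pop()
        else:
            tail = b""
        counts.update(word.lower() for word in words)
    if tail:
        # End of input is a word boundary too.
        counts[tail.lower()] += 1
    return counts


def top_words(counts, n):
    """Return up to n (word, count) pairs.

    Assumption: highest count first; ties in ascending byte order.
    """
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:n]


def main(argv):
    n = int(argv[1])  # N, a decimal number given on the command line
    counts = count_words(sys.stdin.buffer)
    out = sys.stdout.buffer
    for word, count in top_words(counts, n):
        out.write(b"%d %s\n" % (count, word))


# Only run as a command when exactly one argument (N) was given.
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv)
```

Saved under a hypothetical name such as wordfreq.py, the sketch would be
run as "python3 wordfreq.py 10 < input.txt". Working on bytes rather
than decoded text means that input which is not valid text is still
handled: every byte that is not an ASCII letter or digit simply
separates words.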