Lars Wirzenius: April, 2004


Thursday, April 29, 2004

Debian: Hostility pattern

Debian seems to be locked into a pattern of hostility. Like a schoolyard bully and his victim, we seem unable to break the pattern, even though we realize that it is wrong and destructive to us all. For school bullying, an intervention sometimes works. Perhaps Debian isn't too big an organization for that, though I can't imagine who would do it.

I realize I've said similar things before. This time, I'm writing it in my log. I don't want to discuss anything on the mailing lists anymore.

Wednesday, April 28, 2004

Personal life: No GUADEC

I don't seem to be able to afford a trip to Norway to this year's GUADEC. Damn. My only chance at seeing fellow geeks en masse this year will be if they come to the Debian sauna gathering a few days later.


Random thought: Flame wars are attractive

I seem to be attracted to flame wars. When I browse Usenet and enter groups I don't regularly read, such as comp.lang.c or comp.lang.lisp, it is the large threads I start reading. These are typically flame wars.

Flame wars can be quite fun to read, if they're waged by intelligent people. The various sides are clearly polarized, and by choosing one side, you can jump up and down in your chair whenever it scores a point. If your chosen side loses a point, you get to growl and shake your fist at the screen. A bit like watching sports or soap operas, in other words.

When the debate is unintelligent or the issue matters to you, flame wars quickly lose their fascination. It is a bit like a candle flame: as long as it is small, playing with it can be fun. Toying with danger is exciting. If the flame escapes the candle and burns your clothes, the fun stops.

Monday, April 26, 2004

Rant: "Attachment: No Virus found" considered harmful

For various reasons, I am not currently running a filter on my e-mail. This gives me the pleasure of seeing the kinds of things spammers and virus writers do these days. This morning, I have seen a virus message with the following signature (it was probably not the first such virus message):

+++ Attachment: No Virus found
+++ Panda AntiVirus - www.pandasoftware.com

It is good to scan outgoing e-mail for viruses, of course. If your scanner finds a virus, it should stop the mail from going out. If it does not find a virus, however, it should not add a note saying so to the mail. Such notes serve no useful purpose for the sender or recipient of the message, and only function as advertising for the virus checker. If the recipient is naive or ignorant and trusts the note, they will eventually be harmed by a virus.

The notes make the sender of the mail look stupid. Anyone with a clue understands that viruses will add such notes to the mail they send. Since the recipient can't trust the notes at all, anyone whose filter adds them is made to look naive at best and incompetent at worst. Is that what you want to tell the world?

I don't want to have this kind of advertising in my mailbox. It wastes my bandwidth, my disk space, and my time. It differs from ordinary spam only in that it is sometimes attached to real mail.

Sunday, April 25, 2004

Random thought: Art of computer naming, part 2

I received some feedback on my log entry on the Art of computer naming. The entry was mentioned in Debian Weekly News, and at least Wouter Verhelst, Joshua Kwan, Scott James Remnant, Tollef Fog Heen, Martin Schultze, Mauri Sahlberg (in Finnish), and Jesus Climent commented on this theme in their own logs; in some of those logs, other people joined the discussion. Some people mailed me privately. Below is a summary of the themes suggested by others.

A couple of particularly bad ones were also mentioned:

Interestingly, there is an RFC (RFC 1178 a.k.a. FYI5) with advice for naming computers. It is short, and recommended reading.


Random hacks: Of what use is Dijkstra?

The other night I was having trouble sleeping. To make work-related thoughts go away, I spent a while writing a command line tool that implements Dijkstra's shortest path algorithm. It reads in a text file giving node names and the distances between them, and prints out the shortest path between two nodes given on the command line.

I haven't actually found any use for it yet. The most obvious one would be to optimize travel, but getting reliable data for that is expensive and, anyway, there is a "pathfinder" service for the greater Helsinki area on the web already.
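The tool itself isn't published here, so the following is only a minimal Python sketch of the same idea, not the original program. It assumes a hypothetical input format of one undirected edge per line, written as `name1 name2 distance`, and a hypothetical invocation `dijkstra.py FILE START GOAL`. It uses a binary heap (`heapq`) as the priority queue and assumes all distances are non-negative, as Dijkstra's algorithm requires.

```python
import heapq
import sys


def read_graph(lines):
    """Parse 'name1 name2 distance' lines into an undirected adjacency map.

    The line format is a hypothetical one chosen for this sketch.
    """
    graph = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        a, b, dist = fields[0], fields[1], float(fields[2])
        graph.setdefault(a, []).append((b, dist))
        graph.setdefault(b, []).append((a, dist))
    return graph


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm: return (distance, [start, ..., goal]) or None."""
    queue = [(0.0, start)]  # heap of (distance so far, node)
    best = {start: 0.0}     # shortest known distance to each node
    previous = {}           # predecessor on the best known path
    while queue:
        dist, node = heapq.heappop(queue)
        if node == goal:
            # Walk the predecessor links back to the start.
            path = [node]
            while node in previous:
                node = previous[node]
                path.append(node)
            return dist, path[::-1]
        if dist > best[node]:
            continue  # stale entry; a shorter route was already found
        for neighbour, weight in graph.get(node, []):
            candidate = dist + weight
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                previous[neighbour] = node
                heapq.heappush(queue, (candidate, neighbour))
    return None


def main(argv):
    """Command line entry point: FILE START GOAL (hypothetical usage)."""
    if len(argv) != 4:
        sys.stderr.write("usage: %s FILE START GOAL\n" % argv[0])
        return 2
    filename, start, goal = argv[1:]
    with open(filename) as f:
        graph = read_graph(f)
    result = shortest_path(graph, start, goal)
    if result is None:
        print("no path from %s to %s" % (start, goal))
        return 1
    dist, path = result
    print("%g %s" % (dist, " ".join(path)))
    return 0


# Only act as a command line tool when arguments are given.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Entries made stale by a later, shorter route are simply skipped when popped, which is simpler than updating priorities inside the heap.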


Random hacks: Simple scripts

Joey Hess wrote in his log about a simple script he wrote to select songs and play them. He prefers it over the song selector in XMMS, which is hardly surprising, since the XMMS selector is quite primitive. (In fact, it is the primary reason I prefer Rhythmbox.)

Inspired by this, I'll mention a script I wrote a couple of years ago: ~/bin/movies. I have a web page that lists all the movies I have, partly to make it easier for friends to borrow them and partly to make it easier for me to avoid buying duplicates. I also record when I've watched a movie, and I wrote the script to randomly pick a movie to watch next. It can choose randomly from all of them or only from those I haven't watched in a year, or it can list all the ones I haven't yet seen at all.

I find the script useful, since otherwise I spend an hour in front of my movie collection agonizing over which one to choose. I am the donkey between haystacks.
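For the curious, here is a minimal Python sketch of the selection logic described above. It is not ~/bin/movies itself: it assumes a hypothetical mapping from movie title to the date the movie was last watched (`None` for never), treats "a year" as 365 days, and counts never-watched movies as eligible for the "not watched in a year" choice, all of which are my assumptions.

```python
import datetime
import random


def not_watched_in_a_year(movies, today):
    """Titles never watched, or last watched more than 365 days before today."""
    cutoff = today - datetime.timedelta(days=365)
    return [title for title, last in movies.items()
            if last is None or last < cutoff]


def never_seen(movies):
    """Titles with no recorded viewing at all, sorted for listing."""
    return sorted(title for title, last in movies.items() if last is None)


def pick(candidates, rng=random):
    """Pick one title at random, or None if there is nothing to choose from."""
    return rng.choice(candidates) if candidates else None
```

Choosing from the whole collection is then just `pick(list(movies))`; passing a seeded `random.Random` as `rng` makes the choice repeatable for testing.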

Wednesday, April 21, 2004

Personal life: 20

Twenty years ago my father bought the first computer for his company, and very soon after that I started to learn programming. The computer was a Luxor ABC-802, a Swedish office computer with a Z80 processor and 64 kilobytes of memory, half of which was a sort of RAM disk. It had a variant of BASIC included in its ROM, and that's the language I first learned to program in. Unlike many other BASIC variants, it was actually a fairly good language, with functions and local variables. Programs were forcibly indented: the program was stored in memory in a tokenized form, and the LIST command would format it as text on the fly and indent it strictly according to a simple set of rules. (No wonder I'm happy with Python's significant indentation.)

As I recall, we had about three different applications for the computer: a word processor, a spreadsheet, and a simple database. I didn't use those very much; programming was so much more fun. To begin with, I would type in programs from listings in books and magazines. This included porting them to the Luxor, since it was incompatible with all other computers in the world.

At some point I learned Pascal from a book and used the Luxor's Pascal compiler, which was really, really slow. It would take five minutes or more to compile minimal programs, and then they couldn't do very much. I never liked Pascal until after I started using Turbo Pascal on my father's second computer, a Kaypro PC, in 1986 or so. A year later I bought a copy of Turbo C with my own money, and never looked back.

On the PC, I played around with a large number of freeware and shareware programs, but mostly they bored me and I preferred to write my own. Nothing very fancy, however: I was no whiz kid.

The next big step in my career as a programmer came in 1988, when I started my computer science studies and met Linus T. After a year of studies, I skipped a year to do my military service, and a year after that I was finally able to afford the first computer of my own, a 386 with four megabytes of RAM and a 109 megabyte hard disk. I ran MS-DOS on it, but really wanted a proper operating system to help me debug pointer errors. Luckily, I managed to buy a very cheap copy of SCO Xenix, which served me well for a year until Linus and others got Linux working well enough to be a stable platform. After I got Linux working, MS-DOS and Xenix quickly disappeared from my hard disk and ever since I've only had Linux on my computer.

Twenty years. They went so fast. I wish I had taken the time to do something important.


Personal life: Going retro

A while ago I wrote that I desired a typewriter. Today I went one step further backwards and bought a fountain pen. I didn't even need a new pen, I just succumbed to my urge for a simpler, less complicated life away from high tech devices such as computers.

If I had ever had a fountain pen before, I would have known that they are not the simple things I thought they were. There are all sorts of technical features involved, and there's even a web site, Penoply, dedicated to them, with a long FAQ on what can go wrong with them. I'm doomed to be high tech.

At least it is a fun toy. Writing with it feels completely different from writing with a ballpoint pen.

Monday, April 19, 2004

Personal life: Stuff is piling up

I notice that my personal and work e-mail inboxes keep growing, and other things I should do are also piling up. I can feel my stress level going up as well, which means that stress-related symptoms such as irritation are becoming more common. I need to start taking corrective steps by reducing the number of things that require my attention. Luckily, the flu I had during Easter is gone and I can go to the gym again. It is strange how quickly going to the gym has become addictive.

Saturday, April 17, 2004

Review: Aikavalotuksia / Tapio Laine, Asko Vivolin

This is a review of a Finnish-language book.

The full title of the book is Aikavalotuksia. Valokuvausalan tarinoita aikalaisten kertomina (roughly, "Time Exposures: Stories of the Photography Trade, Told by Contemporaries"), and it was edited by Tapio Laine and Asko Vivolin. I originally wrote this review in 2001, after buying and reading the book, but I am only publishing it now.

Laine and Vivolin have collected the reminiscences and photographs of some fifty people from the latter half of the last century. The reminiscences are largely anecdotes and are presented in alphabetical order. There is no attempt at a systematic history, but then neither the title nor the back cover text claims there is. Since I am almost a complete layman myself, and know neither how the field developed in Finland nor the connections and history of the various parties, I missed having at least some kind of timeline marking the most important milestones.

The stories are certainly entertaining reading, but many of them repeat much the same themes: there was drinking, digital is taking over, we're old now, and it was fun. I liked Hannu Hautala's piece best. The accounts of photo dealers' buying trips and trade fair visits, on the other hand, were dull to read. I would have liked the stories to focus more on photography and its development than they did.

There are commendably many photographs, and they are printed large: often a full page, not just a postage stamp in the margin. When the photography trade is the subject, the photographs are an essential, central part of the content. Even here, though, you can tell that the book is a collection of anecdotes: if the aim had been to tell history rather than just anecdotes, the choice and presentation of the pictures could have focused on the development of photographic storytelling and shed light on, for example, how the role and use of press photography has changed over the decades.

The layout artist has left several blank pages in the middle of the book. This was odd, though not actually disturbing.

Summary: Definitely worth reading, but it leaves me wishing for a real history.

Thursday, April 15, 2004

Hedgehog Lisp: Vera likes Lisp? Weirdness

It seems that Vera, my co-worker, is capable of liking Lisp after all, if she's under enough medication.

Tuesday, April 13, 2004

Personal life: Oops, I erased all my music

I needed to free up several gigabytes of disk space on my laptop so that I could carry some data home. I did this by removing all my music files, since I thought I had backups of them. It turns out I didn't.

Well, actually I do. All the music I had was ripped from CDs I own, and I still have the CDs. I can re-rip them. This time, I think I'll research the issue first and see what the best way to do this is. I think I will at least save everything as FLAC so that I don't have to do the tedious ripping ever again. All my CDs as FLAC files should fit into 8 or 9 DVD+R disks. For storage on my laptop, I'll convert them to Ogg Vorbis files.

I'm going to have to see what the best compression ratio for my needs is. Since the audio output quality of my laptop isn't very good, I might save some space by compressing more tightly. If I later need higher quality files, I can convert them from the FLAC files. I'll need to see whether it is worthwhile to normalize volume levels between disks, and probably other things as well.

Sunday, April 11, 2004

System administration: GNOME 2.6 and Galeon anti-aliased fonts

I was feeling adventurous and decided to upgrade to GNOME 2.6 from Debian experimental. The dependencies are somewhat broken, so not all packages are installable, but other than that the only problem is that gpdf doesn't seem to work for me at the moment. I'm not sure why.

I also got anti-aliased fonts to work with Galeon. Rob Weir told me on IRC that mozilla-xft needed to be installed for that. Now Galeon looks as pretty as Mozilla Firefox, and I have lost all interest in switching.

Saturday, April 10, 2004

Random hacks: Language benchmarking, 0.1

I've received some responses to my log entry about a speed benchmark for language implementations. Most importantly, kind people have sent implementations in Ruby (Dafydd Harries), Tcl (Esko Arajärvi), Haskell (Isaac Jones), and O'Caml (Mike Furr), and I've also written an implementation in AWK myself. Unfortunately, the Tcl and Haskell versions require too much memory (stack or heap) and I can't run them with the huge input file I use with the others. Until they can be fixed, here are the current results for the other implementations.

version            simple      longline    huge           binary
daf.rb.ruby1.8     0.2 / 0.0   0.5 / 0.0   323.2 / 34.3   0.0 / 0.0
liw.awk.gawk       0.1 / 0.0   0.1 / 0.0   97.3 / 0.9     0.0 / 0.0
liw.awk.mawk       0.0 / 0.0   0.4 / 0.0   42.5 / 0.7     0.0 / 0.0
liw.gcc-2.95       0.0 / 0.0   0.5 / 0.0   54.4 / 0.6     0.0 / 0.0
liw.gcc-3.2        0.0 / 0.0   0.5 / 0.0   54.8 / 0.7     0.0 / 0.0
liw.gcc-3.3        0.0 / 0.0   0.5 / 0.0   54.0 / 0.6     0.0 / 0.0
liw.py.python2.3   0.1 / 0.0   0.2 / 0.0   98.9 / 0.6     0.1 / 0.0
mfurr.ocaml        0.0 / 0.0   0.2 / 0.0   36.6 / 0.7     0.0 / 0.0

The first column gives the benchmark implementation and the four remaining columns are the four different inputs I give to the programs. "simple" is a single copy of the GPL, version 2. "longline" is a single long word of one million characters. "huge" is 5000 copies of the GPL v2. "binary" contains some binary characters as well as 100 instances of the same word. For speed comparisons, the "huge" input is the interesting one; the others are really there only to make sure the programs handle different kinds of inputs correctly.

The times in the four input columns are user and system time. Each combination of program and input is run and timed five times, and the median user time and median system time are reported.
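The timing method can be sketched in Python as follows. This is an assumption about how such a harness might work, not the script in the tarball: it runs a command with an input file on its standard input, reads the children's accumulated CPU times from `resource.getrusage(resource.RUSAGE_CHILDREN)` before and after each run, and takes the median of the user times and of the system times separately. The `resource` module is Unix-only.

```python
import resource
import statistics
import subprocess


def time_once(command, input_path):
    """Run command with input_path on stdin; return (user, system) CPU seconds."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    with open(input_path, "rb") as stdin:
        subprocess.run(command, stdin=stdin, stdout=subprocess.DEVNULL, check=True)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # RUSAGE_CHILDREN accumulates over all waited-for children, so take deltas.
    return (after.ru_utime - before.ru_utime,
            after.ru_stime - before.ru_stime)


def median_times(command, input_path, runs=5):
    """Time `runs` runs and take the median user and system time separately."""
    samples = [time_once(command, input_path) for _ in range(runs)]
    user = statistics.median(u for u, _ in samples)
    system = statistics.median(s for _, s in samples)
    return user, system
```

Taking the two medians independently means the reported user and system times may come from different runs, which matches "the median of each time".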

With the small inputs, there are a few interesting things to note. First, the "binary" input, which is only 800 bytes in total, measures little more than startup speed. It would seem that Python has a slight disadvantage here. More interesting are the large variations in speed with the "longline" input. GNU awk does remarkably well, while Mawk, another AWK implementation, does much worse: indeed, almost as badly as my own C code. Python and O'Caml do almost as well as GNU awk.

With the interesting input, "huge", the real speed differences become visible. First, note that the system times are very small for all programs except the Ruby one: most of the processing happens inside the user code. This is good, because it means that the benchmark measures the language and benchmark task implementations and not Linux kernel code. For some reason, the Ruby program uses quite a lot of system time as well. I haven't investigated why, at least not yet.

The fastest program was the O'Caml one. My C programs were quite a bit slower; even Mawk was faster than my C code. Either my implementation is a crock or these two language implementations really are quite good. AWK has a reputation for being slow, but this task is pretty much exactly suited to it. Python and GNU awk did fairly well. Ruby was quite slow.

It is too early to draw any conclusions about the actual speed differences of the language implementations in question. The programs can and hopefully will be improved. The Ruby program, in particular, was written to be simple, not to be fast. I don't know Ruby myself, but perhaps someone who does would like to contribute a faster version?

If you use these timings to claim that a particular language implementation is faster or slower than another, I will come and be cross at you.

Note that I reserve the right to change the inputs. Don't count on the input being the GPL v2 in the future; that would make things too easy to optimize.

See lang-bench-0.1.tar.gz for details, if you are interested.

Monday, April 05, 2004

Random hacks: md5sum.py updated

I updated my md5sum.py script. It now has --help and --quiet options, and ignores a leading asterisk in a filename, which some md5sum programs seem to add to indicate a binary file.

Probably very few people will find this useful, but I do, and so does a friend of mine, so it isn't a complete waste.
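The asterisk handling can be sketched like this. It is a hypothetical reconstruction, not the actual md5sum.py: the line format follows GNU md5sum, which writes `DIGEST  NAME` (two spaces) in text mode and `DIGEST *NAME` in binary mode, and the `check` function's output and `quiet` flag mimic GNU `md5sum -c` and `--quiet`, which may differ from what md5sum.py does.

```python
import hashlib


def parse_check_line(line):
    """Split an md5sum-style line into (hexdigest, filename).

    After the digest and one space comes either a second space (text mode)
    or '*' (binary mode); that one marker character is dropped.
    """
    line = line.rstrip("\n")
    digest, sep, rest = line.partition(" ")
    if not sep or len(digest) != 32:
        raise ValueError("malformed line: %r" % line)
    if rest[:1] in (" ", "*"):
        rest = rest[1:]
    return digest.lower(), rest


def file_md5(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check(lines, quiet=False):
    """Verify each listed file; return True if all digests match."""
    ok = True
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        expected, name = parse_check_line(line)
        matches = file_md5(name) == expected
        ok = ok and matches
        if not matches:
            print("%s: FAILED" % name)
        elif not quiet:
            print("%s: OK" % name)
    return ok
```

Stripping only the single character after the first space keeps a file genuinely named `*odd` intact in text mode (`DIGEST  *odd`).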


Photography: Take photographs and break your heart

Eamon Hickey writes on robgalbraith.com about Sports Illustrated's digital workflow, using their Super Bowl job as an example. Very impressive; I wish I could work like that as a photographer. Just look at Bob Rosato's kit on page 2 of the story: all the toys you could ever want.

The numbers were quite impressive: over sixteen thousand photographs for one event. More interesting to me, however, was the process of selecting and modifying photos. It is refreshing to remember that even the professionals have to go through the photographs one by one and choose the few that are any good. Steve Fine, the Director of Photography, sums up the way I tend to feel about my own photographs: "Eleven guys. Eleven versions out of focus." Of course, for me it is "One guy. Eleven versions out of focus."

Finding the good shots in a huge pile of photographs is both tedious and occasionally heartbreaking. You shot so many frames, and yet you get so few good ones.

Sunday, April 04, 2004

Random hacks: Language implementation speed benchmark

In my search for a new favorite programming language, one of the factors I consider is the execution speed of programs. It is hardly the only factor, but it happens to be measurable. I'm slowly setting up a benchmark suite to measure things that matter to me. The first benchmark counts frequencies of words:

Read text from the standard input and count the number of times each word occurs. Convert letters to lower case. Order the words according to frequency; words with the same frequency should be ordered in ascending lexicographic order according to character code. Print out the top N words, where N is a decimal number given on the command line. Each output line must contain the count, a space, and the word (in lower case), and end in an ASCII LINE FEED character. Output must contain exactly N such output lines and no other output lines.

A word contains only ASCII letters A through Z and a through z (convert upper case to lower case) and ASCII digits 0 through 9 and is not empty. All other characters separate words and are ignored except to notice word boundaries. Word boundaries only occur at the beginning and end of the file and at non-word characters. You may not assume a maximum length for the word, line, or input file.
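To make the specification concrete, here is a minimal Python sketch of it; it is not one of the implementations in the tarball. Two details are my own assumptions: output is most frequent first, and if the input has fewer than N distinct words, fewer lines are printed. It works on bytes so that the "binary" input cannot cause decoding errors, and it reads the input in chunks, carrying an unfinished word over to the next chunk, rather than assuming the whole file fits in memory.

```python
import re
import string
import sys
from collections import Counter

# Word characters per the spec: ASCII letters and digits only.
WORD_CHARS = (string.ascii_letters + string.digits).encode("ascii")
WORD = re.compile(rb"[A-Za-z0-9]+")


def count_words(stream, chunk_size=1 << 16):
    """Count lower-cased words from a binary stream, reading it in chunks."""
    counts = Counter()
    carry = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        chunk = carry + chunk
        body = chunk.rstrip(WORD_CHARS)  # a trailing word may continue later
        carry = chunk[len(body):]
        # bytes.lower() only changes ASCII letters, as the spec requires.
        counts.update(m.group().lower() for m in WORD.finditer(body))
    if carry:
        counts[carry.lower()] += 1  # the input ended inside a word
    return counts


def top_words(counts, n):
    """Most frequent first; ties in ascending character-code order."""
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(count, word.decode("ascii")) for word, count in ranked[:n]]


def main(argv):
    n = int(argv[1])
    for count, word in top_words(count_words(sys.stdin.buffer), n):
        sys.stdout.write("%d %s\n" % (count, word))
    return 0


# Only act as a command line tool when N is given.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Run it as, for example, `python wordfreq.py 10 < input.txt` (the file names are hypothetical). Sorting byte strings compares them by byte value, which gives the required character-code order for ties.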

For more information, see the README or download the tarball.

I have implementations in three languages so far: C, Python, and "shell" (using tr, sort, uniq, and tail). I've run them with three versions of gcc and two versions of Python.

implementation    user time   system time
gcc 2.95          54.8        0.8
gcc 3.2           55.3        0.8
gcc 3.3           54.4        0.9
Python 2.2        106.8       0.4
Python 2.3        103.6       1.6
shell             639.6       6.6

These numbers are just a teaser. They only show run times for a particular large input file. The programs have not gone through extensive optimization and can probably be improved a lot. I haven't even started experimenting with gcc options. I'm also missing all the languages I'm really interested in, since I haven't yet had time to learn them well enough to write implementations of this benchmark.


Personal life: Smaller head

I made my head smaller today.

Friday, April 02, 2004

Quote: Kenny Tilton on Lisp

Kenny Tilton in comp.lang.lisp on March 31, 2004:

Lisp is no longer the crazy aunt in the attic, she is now out in the front parlor where her admirers come to pay respect and learn.

This is why I read comp.lang.lisp: the attitude and the humor.

Thursday, April 01, 2004

Random thought: Search keywords for my pages

One of the more fun features of webalizer is that it shows some of the search keywords with which people find my pages. The topmost entries are not particularly exciting: my most popular page ever is Linux Anecdotes, with Advocating Linux not far behind. Past the top ten, however, we get to the fun stuff.

In March, people found my pages with searches such as "free sex" and "free sex new", "big sword big fight", "sex hack", and "better to travel hopefully". I trust that they weren't all by the same person.