Lars Wirzenius: Random hacks, 2005


Sunday, December 18, 2005

Random hacks: deb-from-bzr

I wrote a little script to help me maintain packages using bzr (bazaar-ng). It builds source and binary packages and tests them with lintian, linda, and piuparts.

Wednesday, August 10, 2005

Random hacks: Xchat whiteout

I have long been reluctant to use /ignore on IRC. I do not have a moral problem with ignoring unpleasant people, but I do have a practical problem: it can be confusing to see only responses to the ignored person and it is not always clear from context that they are responses. To reduce this confusion, it would be better if what the ignored person writes is shown with the same foreground and background color. I'm told irssi can do this already, but I like xchat and I don't want to switch, so I wrote a whiteout plugin. I've only just started to use it, so it may well be buggy.

Friday, July 01, 2005

Random hacks: mini-fortune

I'm slowly collecting a fortune cookie file of my own. It is quite small, and I don't feel like setting up an index file with strfile, so I wrote a trivial mini-fortune program instead. It turned out to be quite simple:


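import fileinput, random, sys

def process_cookie(cookie, chosen, cookies):
    # The Nth cookie seen replaces the current choice with probability 1/N.
    cookies += 1
    if random.randint(1, cookies) == 1:
        chosen = cookie
    return chosen, cookies

chosen = ""
cookies = 0
cookie = ""
for line in fileinput.input():
    separator = (line == "%%\n")
    # A separator line, or the first line of the next file, ends a cookie.
    if cookie and (separator or fileinput.isfirstline()):
        chosen, cookies = process_cookie(cookie, chosen, cookies)
        cookie = ""
    if not separator:
        cookie += line
if cookie:
    chosen, cookies = process_cookie(cookie, chosen, cookies)
# The cookie already ends with a newline, so write it as is.
sys.stdout.write(chosen)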

The algorithm is explained here: Perl Cookbook: Picking a Random Line from a File. It reads through the entire file, but since all the machines I expect to run this on are relatively fast, even a few hundred kilobytes of fortune cookies (and that's a lot of them) isn't significantly slow.
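
The selection rule on its own, separated from the file parsing, can be sketched as a small function. This is only an illustration: the name pick_one and the optional rng parameter are made up for this example and are not part of the script above.

```python
import random

def pick_one(items, rng=random):
    """Return one item chosen uniformly at random, reading items only once."""
    chosen = None
    for count, item in enumerate(items, start=1):
        # Keep the newest item with probability 1/count. After N items,
        # each of them has had the same 1/N chance of surviving.
        if rng.randint(1, count) == 1:
            chosen = item
    return chosen
```

Calling pick_one(["first cookie", "second cookie", "third cookie"]) returns one of the three strings, and the function never needs to know the length of the input in advance.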

Wednesday, May 11, 2005

Random hacks: Encrypted laptop disk

I re-installed esme, my laptop, again. This was my third installation of Debian on it since I bought it just before Christmas. The reason for the re-installation this time was that I wanted an encrypted hard disk. Luckily, the cryptsetup and kernel packages in Debian made this really, really easy. Yay for their respective maintainers.

Here's a brief summary of what I did: first, I wiped out everything on the hard disk, and installed Debian from scratch. I made three partitions: a half-gigabyte one for /boot, a gigabyte one for swap space, and the rest for root. I then installed a small Debian into the swap partition, using the sarge RC3 netinst CD. Then I switched sources.list to use unstable, and installed cryptsetup and a 2.6.10 kernel (since certain parts of my laptop hardware require that), and rebooted. After this, I just followed the instructions in the CryptoRoot.HowTo file in the cryptsetup documentation directory to make the third partition encrypted and move over the installation to that. Finally, I converted the swap partition into an encrypted swap space (following instructions in CryptoSwap.HowTo).

So far everything works fine. The system does not even feel any slower than it was: as long as there is no heavy disk I/O, everything is snappy. Any heavy disk I/O kills interactivity, but that happened beforehand as well. I don't do much heavy disk stuff on my laptop anyway.

End result: hopefully my laptop is a tad more secure in case it gets stolen (when turned off or with the screensaver running at least). I do have yet another password to remember, though, which is annoying, but worth it.

Random hacks: Web log software update

A few days ago I made some changes to my web log software. I don't think anyone noticed, which is as it should be. I admit that it is easy to make mistakes when writing software that generates RSS files, and that this is why so many people accidentally flood web log aggregators such as Planet Debian when they update. It is, however, annoying and avoidable.

One of the things I built into my software a couple of years ago is a limit on how many entries it puts into the RSS feed. Since feeds are supposed to be polled fairly often, having at most, say, four days' worth of entries seems reasonable. Thus, if the software makes a mistake, there won't be all that many entries flooded anyway. Damage control is good.
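
A minimal sketch of such a limit, assuming each entry is a (published, title) tuple with a datetime as its first element. The function name recent_entries and the tuple layout are invented for this illustration and say nothing about how the actual web log software stores its entries.

```python
from datetime import datetime, timedelta

def recent_entries(entries, now, max_age_days=4):
    """Return the entries published within max_age_days of now, newest first."""
    cutoff = now - timedelta(days=max_age_days)
    # Anything older than the cutoff never reaches the feed, so a bug
    # elsewhere cannot flood an aggregator with the whole archive.
    fresh = [entry for entry in entries if entry[0] >= cutoff]
    return sorted(fresh, key=lambda entry: entry[0], reverse=True)
```

The feed generator would then write out only what recent_entries returns, whatever else goes wrong upstream.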

I also used my web log software as the first guinea pig for testing Bazaar, one of the implementations of the arch version control system. I may put it up for others to access one day, after I figure out the details for that. It's still pretty specific for my needs, though. I don't have any intention of starting to compete in this area.

Monday, March 28, 2005

Random hacks: Gave away SoundConverter

Some time ago I decided that I didn't have the time and will to hack on my SoundConverter application. It did what I wanted it to do, but people wanted MP3 encoding support, and maybe some other things as well. Gautier Portet offered to take over the program, and he has now released two new versions. See for the new versions.

I haven't managed to run version 0.7 on Debian yet, but version 0.6 works, and has MP3 encoding. Once I do get it running on Debian again, I should probably offer to package it.

Sunday, March 27, 2005

Random hacks: RSS generation and ampersands

Jose Carlos Garcia Sogo and Mike Beattie write about ampersands in Jose's log breaking Planet Debian and its aggregated RSS feed. I couldn't find the problematic RSS snippet anymore, but here's what I think the situation is: RSS (at least version 2, which I use, and I think the others as well) requires the HTML content to be entity escaped. In other words, if you want an ampersand in the final output, the HTML to create it must be &amp;amp;, and this must then be encoded in the RSS file as &amp;amp;amp;.

You have to do the same escaping for the less-than and greater-than characters as well.
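
The two rounds of escaping can be sketched in Python with escape from the standard xml.sax.saxutils module, which replaces exactly the ampersand, less-than, and greater-than characters. The helper names html_for_text and rss_description are made up for this illustration.

```python
from xml.sax.saxutils import escape

def html_for_text(text):
    """First round: turn plain text into HTML that displays that text."""
    return escape(text)

def rss_description(html):
    """Second round: escape the HTML itself so it can sit inside the RSS XML."""
    return escape(html)

text = "Q&A <live>"
html = html_for_text(text)     # Q&amp;A &lt;live&gt;
rss = rss_description(html)    # Q&amp;amp;A &amp;lt;live&amp;gt;
```

Skipping the second round leaves a bare entity in the feed that the aggregator decodes one level too early, which is the kind of breakage described above.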

I struggled with this a couple of years ago when I wrote my own web log scripts. I wrote them because I wanted to have the web log pages integrated with the rest of my pages, and because I am a NIHolic, but in case they are of use for anyone, I put a tarball up. Note that they are likely to not work for you directly, but they might be helpful in looking at how RSS is generated.

For debugging RSS feeds, I found extremely useful. It doesn't respond to me now, but I hope that is temporary. It is a validator for RSS and Atom feeds, and validation is most helpful when you are unsure if your stuff is correct or not.

Thursday, February 17, 2005

Random hacks: SoundConverter - new developer?

My SoundConverter program has not seen any development for some time now. Particularly MP3 encoding support would be nice, plus various tweaks to the user interface to make it follow the GNOME Human Interface Guidelines better. I'm sure there are other things as well that could be improved. I don't seem to find interest in it, however, since the program works quite well enough for me and I am quite busy enough with other projects.

Would someone like to take it over? It is a simple thing, written in Python, using PyGTK, libglade, and GStreamer. Mail me if you're interested.

Friday, January 21, 2005

Random hacks: Mail processing

I've now been using crm114 instead of bogofilter for a few weeks. On the whole, it seems to me that crm114 filters about as well as bogofilter, possibly a bit better, except that it has not yet learned that certain kinds of mails are not spam. Specifically, Debian bug reports and mails about DebConf5 are often classified as spam, though not always.

On the whole, then, I'm happy with crm114. It promises to not break frequently due to database on-disk format changes, the way bogofilter does. This was my main reason for switching.

While on the subject of mail processing, I thought I'd describe what I do. All my private (non-work) mail to various addresses is redirected to one place, since it is easier for me to deal with just one inbox rather than several. At this one place, my mail server, I run procmail. Over the years, my .procmailrc has variously been really complicated and really simple. I prefer simple, since things break less that way. The first procmail rule I have is one that makes a copy of all incoming mails in an archive folder. This is important: as long as this rule works, all incoming mail can be retrieved from the archive folder even if later processing breaks.

:0 c
backups/mail/backup-`date +%Y-%m-%d`.maildir/

The way the rule is written, the archive folder is per-day. The next rule filters the mail through crm114:

:0fw: .msgid.lock
| /usr/share/crm114/mailfilter.crm -u $HOME/.crm114/

This rule makes procmail use the output of crm114 for the remaining rules. Spam is then put in a separate folder, which I occasionally check to correct any mistakes crm114 makes.

:0
* ^X-CRM114-Status: SPAM.*
spam.maildir/
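
The decision that rule makes can be sketched in Python for anyone who wants to test it outside procmail. This is an illustration only: the function folder_for and its default folder names are assumptions based on the folders mentioned in this entry, not code from the actual setup.

```python
import email

def folder_for(raw_message, inbox="Maildir", spam="spam.maildir"):
    """Pick a folder from the X-CRM114-Status header that crm114 adds."""
    message = email.message_from_string(raw_message)
    status = message.get("X-CRM114-Status", "")
    # procmail conditions ignore case by default, so compare in upper case.
    if status.strip().upper().startswith("SPAM"):
        return spam
    return inbox
```

A message without the header falls through to the inbox, which is also what happens with the procmail rule when the filter step fails to add it.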

crm114, as packaged for Debian, was not quite as nice to set up for use as bogofilter is. I ended up creating a directory ~/.crm114 and making symlinks there to /usr/share/doc/crm114/examples/crmfilter. needed to be copied so I could change it, of course.

Sometimes crm114 makes mistakes. When it thinks a valid mail is spam, the mail is put into spam.maildir, which I read with mutt. To teach crm114 I move the mail to not-spam.maildir and run a script that does the teaching. I have a macro in mutt to do the moving easily:

macro index S "<save-message>~/not-spam.maildir/\ny"

The crm114 re-education script:

find $HOME/not-spam.maildir/{cur,new} -type f |
while read mail
do
  # Teach crm114 that the mail is not spam, then deliver it to the inbox.
  /usr/share/crm114/mailfilter.crm -u $HOME/.crm114 --learnnonspam \
    < "$mail" > /dev/null
  mv "$mail" $HOME/Maildir/new
done

When crm114 misclassifies a spam as valid mail, it gets downloaded to my laptop by Evolution. In Evolution, I move the spam to a folder (called "is too spam" in Finnish), which a script then copies to the server and runs through crm114. Similarly for

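#!/bin/sh -e

# base must be set to the directory holding the Evolution folders;
# its value is not shown here.

doit() {
    if [ -s "$base/$1" ]
    then
        echo "$1..."
        # formail -s splits the folder and runs the command once per
        # message. The server's host name belongs right after "ssh";
        # it is not shown here.
        formail < "$base/$1" \
                -I X-CRM114-Status -I X-CRM114-Action -I X-CRM114-Version \
                -s ssh /usr/share/crm114/mailfilter.crm \
                -u /home/liw/.crm114/ "$2" > /dev/null
    fi
}

doit "Onpas roskaa" --learnspam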

On the whole, this setup works pretty well. It is not quite as smooth as using a filter well integrated with Evolution would be (and there is one), but I need the filtering done on my mail server, since the filter also protects the mailing lists I run.