Lock-free state pattern in C#

Summary: I describe a pattern for fast concurrent C# code that is easy to reason about and can help avoid deadlocks.

I like to avoid using the lock statement in C# for short bits of code that don’t block. This is partly based on an only loosely supported idea that Interlocked operations are faster for simple code. I often use the concept of a protected immutable state object as an inner class of my object. Then, my objects have only one mutable member variable if mutability seems necessary. To update the state, I use code like this:

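State state = _state;
State old_state;
State new_state;
do {
  old_state = state;
  // Compute the replacement from a snapshot; SomeFunction must not mutate old_state.
  new_state = SomeFunction(old_state, arg1, arg2, arg3);
  // Publish new_state only if _state still refers to old_state.
  // CompareExchange returns whatever _state held before the call.
  state = Interlocked.CompareExchange<State>(ref _state, new_state, old_state);
}
while (!object.ReferenceEquals(state, old_state));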

The above code works logically as follows: read the state, compute the new state from the old state, and then, if the state is still the same as it was when we did our computation, store the new state. CompareExchange returns the value _state held at the moment of the call (whether or not it was replaced), so by comparing the result with old_state we can see whether we succeeded or need to try again.
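Since Interlocked is specific to .NET, here is a minimal sketch of the same retry loop in Java, where AtomicReference.compareAndSet plays the role of Interlocked.CompareExchange (it returns a boolean success flag rather than the previous value). The class CasStateDemo and its two-field State are hypothetical illustrations, not code from this post; bundling count and sum into one immutable object is what lets both fields change in a single atomic step.

```java
import java.util.concurrent.atomic.AtomicReference;

public class CasStateDemo {
    // Immutable state: count and sum always change together.
    static final class State {
        final long count;
        final long sum;

        State(long count, long sum) {
            this.count = count;
            this.sum = sum;
        }

        // Pure function of the old state, like SomeFunction in the C# snippet.
        State add(long x) {
            return new State(count + 1, sum + x);
        }
    }

    // The object's only mutable member, playing the role of _state.
    private final AtomicReference<State> state = new AtomicReference<>(new State(0, 0));

    public State record(long x) {
        State oldState;
        State newState;
        do {
            oldState = state.get();       // snapshot the current state
            newState = oldState.add(x);   // compute the replacement from the snapshot
            // Succeeds only if no other thread replaced the state in the meantime;
            // otherwise loop and recompute from the newer state.
        } while (!state.compareAndSet(oldState, newState));
        return newState;
    }

    public State current() {
        return state.get();
    }

    public static void main(String[] args) throws InterruptedException {
        CasStateDemo demo = new CasStateDemo();
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 1; i <= 1000; i++) {
                    demo.record(i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        State s = demo.current();
        System.out.println(s.count + " updates, sum " + s.sum);
    }
}
```

Four threads each record 1 through 1000, so the program prints "4000 updates, sum 2002000" no matter how the threads interleave; no update is lost even though no lock is taken.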

I could encapsulate this pattern as:


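using System;
using System.Threading;

// Interlocked.CompareExchange<T> only accepts reference types, hence "where StateT : class".
class Mutable<StateT> where StateT : class {
  protected StateT _state;
  public StateT State { get { return _state; } }
  //Use the delegate how_to_update to map the
  //old state onto the new state and return the new state.
  //how_to_update may run more than once if another thread updates
  //_state first, so it should not have side effects.
  public StateT Update(Converter<StateT, StateT> how_to_update) {
    StateT state = _state;
    StateT old_state;
    StateT new_state;
    do {
      old_state = state;
      new_state = how_to_update(old_state);
      state = Interlocked.CompareExchange<StateT>(ref _state, new_state, old_state);
    }
    while (!object.ReferenceEquals(state, old_state));
    return new_state;
  }
}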

Since delegates can (or at least once did) have performance issues, it doesn’t make a lot of sense to use the above for highly performance-critical code, but of course, one should only obsess about that if benchmarking shows that a particular usage is causing problems.

The above statement may call into question the whole approach: why not just update _state inside a lock? One reason is that the above code can never deadlock, since no lock is held while how_to_update runs. Of course, you could implement the same compare-and-swap logic with a lock, but at that point, why not just use CompareExchange?
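The price of never holding a lock is that the update function can run more than once: whenever another thread wins the race, the compare-and-swap fails and the function is applied again to the fresh state. The Java sketch below (a hypothetical Mutable class mirroring the C# one above, with AtomicReference standing in for Interlocked) counts function calls to make that visible. Java's own AtomicReference.updateAndGet implements essentially the same loop, and its documentation likewise asks for a side-effect-free function.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

public class Mutable<T> {
    private final AtomicReference<T> state;

    public Mutable(T initial) {
        state = new AtomicReference<>(initial);
    }

    public T get() {
        return state.get();
    }

    // Same shape as the C# Update: no lock is held while howToUpdate runs,
    // so it cannot take part in a lock cycle, but it may be applied repeatedly.
    public T update(UnaryOperator<T> howToUpdate) {
        while (true) {
            T oldState = state.get();
            T newState = howToUpdate.apply(oldState);
            // compareAndSet compares references, so we pass back exactly
            // the reference that get() returned.
            if (state.compareAndSet(oldState, newState)) {
                return newState;
            }
            // Another thread won the race: retry against the newer state.
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Mutable<Integer> counter = new Mutable<>(0);
        AtomicInteger calls = new AtomicInteger();
        Runnable work = () -> {
            for (int i = 0; i < 10_000; i++) {
                counter.update(x -> {
                    calls.incrementAndGet(); // deliberate side effect, only to count retries
                    return x + 1;
                });
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        a.join();
        b.join();
        // counter is exactly 20000; calls is at least 20000, more if any retries happened.
        System.out.println(counter.get() + " increments, " + calls.get() + " function calls");
    }
}
```

The final counter value is always exact, while the call count varies from run to run with contention, which is precisely why the update function should only compute a new state and do nothing else.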

“No Reply” is lame


I often get automatic email messages with the reply-to address set to some kind of “noreply” address. This is lame. For instance, Facebook sends notifications using such a noreply reply-to address. To me, this seems like lazy programming. If someone sends me a message on Facebook and I reply to the notification, I clearly intend to reply to the message. Facebook should parse that response and forward it to the original sender.

The same is true of Twitter direct messages.

Systems developers should make their notification channels (IM, Email, SMS) two-way and not just one-way conduits. Not doing so represents a failure of design.

Dromanova Update

Emusic seems to be fooling with the format they deliver to their downloader clients. It seemed to switch to an XML format, then go back to the obfuscated format. In any case, I updated Dromanova to handle both cases:

You can see the mercurial repository here, or download the script itself here.

Free Software and Digital Music

Recently I suffered a logorrheic episode about Free Software. In the meantime, two things have changed:

  1. I have installed Rockbox on my iPod.
  2. Amazon has released a non-free GNU/Linux client for their MP3 store.

With Rockbox, I’m using free software when I use my iPod, but there are benefits beyond ideological purity. Rockbox supports many players, so I can easily buy a new mp3 player and get the same features I’m used to with my current device. The main feature I like about Rockbox is that the device maintains its own database. To install music on it, I only need to copy the files. My iPod appears as a normal USB disk. This means I can use rsync to automatically keep my iPod as a mirror copy of all my music. When I go to a new machine, I can use rsync to copy all my music off just as easily. In this way, my Rockbox-powered iPod becomes an exact mirror of my music collection. Since I’m using Emusic (and now Amazon) to buy mp3s, keeping backups is very important to me.

The second question is the acceptability of Amazon’s downloader program. It is non-free software, but I’ve allowed it. The reasoning is the following: it enables a single transaction. When that transaction is over, it gets out of the way. I don’t need to use the program ever again. If, in the future, Amazon does something to violate my rights, I just uninstall the program and keep using all the old mp3s. I don’t really see this as materially different from using their web page (which contains JavaScript and so executes code on my computer).

As a practical matter, I will only buy mp3 albums from Amazon when it is cheaper than buying the CD used (including shipping).

Vegans, Monks and The Devotion to Free Software

For many years I have strived to use only Free Software. There is a certain conceptual purity in adherence to that rule which appeals to me. However, in that time, I’ve identified a few problems I haven’t satisfactorily solved. I’m writing this in the hopes of stimulating some discussion or comments that might give me some new insight.

Definition of Free Software

Free Software is software that grants four specific freedoms: freedom to run for any purpose, freedom to study and modify, freedom to make verbatim copies, and freedom to redistribute your improvements. Explaining these ideas to people can be difficult. Other than price, most people don’t understand why they should care about Free Software. Very few see any connection between Free Software and liberty. The cynic in me would say that very few are actually willing to make any meaningful sacrifice for liberty.

An argument for using Free Software is that proprietary software leaves you as a kind of tenant, never really owning the system that is governed by that software. You might have bought your computer, but the current and future performance and feature set is governed by the copyright holder, who only licenses you to use the software but not make any non-trivial changes. As we have seen, software is appearing everywhere: in your car, in your home electronics and appliances, even in your credit cards (those that are smart cards). In the proprietary picture, you have no right to control or change these systems. Meaningfully exercising your constitutional freedoms increasingly involves the use of software. These arguments were laid out by Richard Stallman, the founder of the Free Software Foundation, who anticipated these problems more than 20 years ago (FSF on Wikipedia, FSF philosophy).

The Goal of using only Free Software

Today, it is relatively easy to use (almost) exclusively Free Software. Ubuntu is a distribution of GNU/Linux that is easy to install and use and comes with almost all the utilities many users need, including an excellent web browser (Firefox) and an office suite (OpenOffice). Those using Free Software today aren’t making a big sacrifice. Generally, the only cost is the inconvenience of minor incompatibilities with users of proprietary software. I would argue that the benefits of Free Software, such as not having to pay, not having to deal with license keys or copyright violations, and automatic management of all the software installed on the system, outweigh the difficulties for most people. For someone who can program computers, the benefits are even greater: you can add new features to your favorite programs and even share those improvements with others legally.

Is using only Free Software more like being a Vegan or a Monk? Wikipedia defines veganism as “a philosophy and lifestyle that seeks to exclude the use of animals for food, clothing, or any other purpose.” One could compare Free Software purists to Vegans. Those who only care about zero price (“free as in beer”) might be compared to vegetarians. Being a vegetarian might be a bit difficult, but being a Vegan can be a real challenge. You have to be really sure that no hidden ingredient in some food is an animal product. There is some similarity with verifying that your software grants all four freedoms. For instance, try explaining to your Mom why the program Pine is not okay, even though you can run it for zero cost and get the source code.

Being a Vegan is a challenge, but roughly one in 100 people choose to do it. On the other hand, being a Monk (or Nun) is something virtually no one chooses (outside of Tibet, where almost 1/3 of the population are monks).

The Siren Call of Conceptually Simple Rules

I should state right off the bat that the idea of a simple criterion for what is wrong or right is appealing to me. It may be naive, but I would like to identify some set of rules or conditions that would qualify software as “Kosher” to use. I suppose it is likely that such a quest will not succeed without spiraling into absurdity. Even Veganism, which seems like a simple rule, could be argued to be harmful to plants, and so Fruitarians go further and only eat ripe fruits. When it comes to diet, I myself follow a somewhat arbitrary simple rule: I don’t eat mammals. I acknowledge that there is some absurdity here: there must be many non-mammals that I eat but perhaps shouldn’t (on some basis), and some mammals I don’t eat but perhaps should (on some basis). But the rule is very simple, useful, and therefore, to me, appealing.

In order to find a good set of rules and to avoid complete absurdity, one should start with the motivations for the rules. I want to maximize the utility of technology and maximize my personal liberty. These two goals conflict: when using proprietary software, utility may increase at the cost of liberty (I can’t exercise some subset of the four freedoms). As a programmer, access to source code and the right to modify it also increases utility. So for me, it’s tempting to limit myself to software that grants all four freedoms. But let me discuss some problems with this that I have identified.

Where is it hard to follow the pure Free Software Ideal?

Here are a few challenges for Free Software purists:

  • Cell phones
  • Game systems
  • Portable media players
  • Drivers/firmware for PC hardware (such as video or wireless cards)
  • Flash, which has become very common on the web, does not currently have a fully capable Free Software replacement (see swfdec and gnash). However, Adobe does provide a flash plugin for GNU/Linux at zero cost.
  • Social software with large user bases, such as Skype.

All of the above have software or firmware that is updated after the unit is sold to the end user. In addition, we could be stricter and consider:

  • Home video: DVD players, cable boxes, television firmware
  • Car software (e.g. the console in my Prius has a touch screen which controls climate, audio, and shows fuel economy statistics)
  • Software running on web pages (javascript code or the flash code driving a web page which is executed by your browser).

Personally, I am willing to use a cellphone that runs proprietary software because I have found no free alternative to replace it. When Openmoko, a free mobile phone platform, is ready, I’ll almost certainly switch to that. In the meantime, my Treo becomes a sort of thin end of the wedge for proprietary software. Since the phone is already non-free, I am willing to install non-free programs on it, such as Google’s Map application for the Palm OS or their Java Gmail client.

I’m willing to use audio, video and gaming devices such as DVD players, Slim Devices’ Squeezebox, original (non-iPhone/iPod touch) iPods, and the Nintendo Wii. Unfortunately, due to the disaster that is DRM, many media devices are especially hell-bent on denying users their rights, but I only purchase devices that I feel do not curtail my rights. I own a ReplayTV PVR, which automatically skips commercials and whose manufacturer was sued to the brink of extinction. The argument I’ve constructed for the “media exemption” is that media content is primarily expressive and not primarily functional, so as long as the system does not do non-trivial DRM (note the wiggle room), I allow it. I am currently not willing to use the iPhone or iPod touch, since these machines are too close to general-purpose computers and I think they present too much of a risk to my freedom (since Apple maintains totalitarian control over those devices). If I were a purist, I could replace the firmware of my iPod with some free firmware such as Rockbox.

For my computers, my rule is that any non-BIOS code that runs on my CPU must be Free Software. I am willing to load the non-free firmware onto my Intel wireless card, but I am not willing to run non-free ATI drivers for my graphics card. The difference is subtle, but real. These days, even hardware can be described and emulated by software. The firmware for my wireless card, in this picture, is basically hardware that can be modified after it is sold. I could disallow this exemption, but then we’re left with an absurdity: if Intel had burned the original firmware into read-only memory on the card, it would be okay. I admit the firmware exemption is shaky. The same argument could be extended to a Dell computer: Microsoft could in principle burn Windows onto a ROM, so why not just run Vista? (In practice that would never work: an entire OS contains so much code that it could not stay secure for years without updates.) The CPU rule means I can’t run Adobe’s Flash player. This is a minor inconvenience, since the Free Software programs gnash and swfdec can both show YouTube videos, even if they can’t view all Flash programs and pages. The policy here is very close, if not identical, to the Debian GNU/Linux policy for what goes into the main section of their software distribution (another convenience: software is okay to run as long as it is in Debian main). The BIOS exemption is due to the fact that BIOS software is firmware-like: it is stored in flash or ROM and rarely, if ever, updated. If possible, I will use LinuxBIOS to replace my BIOS, but currently the supported motherboards are rather limited.

I don’t use Skype on my computer, there is no need. I can run Ekiga and call anyone using standard SIP compatible software or hardware. I may be willing to use Skype on some device other than my computer, such as a dedicated skype handset. In this case, it’s like using a telephone. Obviously there is some bizarre cognitive dissonance going on with this issue.

An interesting inversion comes on the web. On the computer, I allow non-free hardware (the designs for my CPU and motherboard are not available to me), but insist that the code running on the CPU be Free. On the web, I insist that the virtual hardware (the Java/.Net/Flash virtual machine) be Free, but the software (the page or service) is allowed to be non-free: I don’t expect the four freedoms to apply to all the JavaScript that might be embedded in a web page. I have no real argument for this except practicality and simplicity. Web code is sandboxed, so it can’t really be used to deny me control of my own property. Instead, it is like a representative of a service that I don’t own. Secondly, disabling JavaScript would render a very large subset of the web unusable.

Conclusion

In the course of writing this, I’ve rediscovered that practicality motivates my usage of Free Software. Clearly I use a lot of non-free software: less than the vast majority of computer users, but nonetheless I do use non-free software. However, like most people I prefer to have more freedom if given the choice, and like most people, some trade-offs I’m willing to make and others I’m not. That doesn’t mean that every choice is okay and equal; it just means we have to accept a little grey, and be willing to continue to wrestle with the issues and make choices. A guiding principle for me is to always try to gain freedom, and try hard to avoid backsliding into a state of less freedom. By elevating the previous idea above some idyllic notion of purity, one avoids inviting total ideological collapse due to a minor violation of principle.

I should make it clear that almost all of the ideas here originated elsewhere. Most came from Richard Stallman, various participants in the Debian project (such as Bruce Perens), Code by Lawrence Lessig, and countless others who have contributed to the debate over Free Software and Free Culture.

Finding Deadlocks in .Net code, or CSLint with Cecil

While developing Brunet, we have had several headaches finding and removing deadlocks from the code. We wanted a tool to help us find deadlocks before they cause a problem, and we found CSLint by Konstantin Knizhnik. CSLint works on .Net binaries (not just C#, as the name may imply). It looks for cases where locks are acquired in ways that could result in deadlock (that is, where a cycle exists in the graph recording which locks are acquired while others are held). Unfortunately, CSLint used an old library for parsing .Net binaries which does not support .Net 2.0.

Over this past weekend I have ported CSLint to use Mono Cecil. You can find my changes in the cslint-cecil mercurial repository. You can find a zip file with a recent version here for the time being. In the future, I’ll probably set up a better page.

This code is licensed under an MIT/X11-style license. I welcome contributions; the code can be improved a lot. It would be nice to have more test cases to check whether we are really generating a good list of candidate loops. Also, the output is currently a bit confusing, as the loops are reported multiple times. It would be nice to improve that as well.
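The core check CSLint performs can be illustrated with a small lock-order graph. The Java sketch below is not CSLint’s code: it assumes the edges (which lock is acquired while another is held) have already been extracted from the binary, which is the hard part CSLint does by walking the bytecode, and the class and method names are hypothetical. It also shows one possible way to address the duplicate reports: each cycle is searched for only from its alphabetically smallest lock, so every cycle is reported exactly once.

```java
import java.util.*;

public class LockOrderGraph {
    // Edge held -> acquired: some code path takes 'acquired' while holding 'held'.
    private final Map<String, Set<String>> edges = new TreeMap<>();

    public void addEdge(String held, String acquired) {
        edges.computeIfAbsent(held, k -> new TreeSet<>()).add(acquired);
        edges.computeIfAbsent(acquired, k -> new TreeSet<>());
    }

    // Each simple cycle is reported once, starting at its smallest lock name,
    // because the search from 'start' only visits locks that sort after 'start'.
    public Set<List<String>> cycles() {
        Set<List<String>> found = new LinkedHashSet<>();
        for (String start : edges.keySet()) {
            List<String> path = new ArrayList<>();
            path.add(start);
            search(start, start, path, found);
        }
        return found;
    }

    private void search(String start, String node, List<String> path, Set<List<String>> found) {
        for (String next : edges.get(node)) {
            if (next.equals(start)) {
                found.add(List.copyOf(path));          // closed a cycle back to start
            } else if (next.compareTo(start) > 0 && !path.contains(next)) {
                path.add(next);
                search(start, next, path, found);
                path.remove(path.size() - 1);           // backtrack
            }
        }
    }

    public static void main(String[] args) {
        LockOrderGraph g = new LockOrderGraph();
        g.addEdge("A", "B");   // one method locks A, then B
        g.addEdge("B", "A");   // another method locks B, then A
        g.addEdge("B", "C");   // harmless: nothing ever locks C then B
        System.out.println(g.cycles());
    }
}
```

Running main prints [[A, B]]: the classic two-method deadlock shows up once, and the B to C edge is correctly ignored. Enumerating every simple cycle can be exponential in the worst case, which is usually acceptable for the modest number of lock objects in one program.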

Finding Bugs

The WOW project uses the Brunet P2P library developed in C# using Mono.

Recently, we noticed a memory leak on our nodes on PlanetLab. David Wolinsky identified exactly the bit of code causing the leak and reported a bug. The bug was fixed within a day, and the fix is already in SVN.

Kudos to David for tracking down the bug (notice many people on the list said there was probably not really a bug). Kudos also to the Mono team for getting a fix in after the issue was clearly identified. This is an example of the power of Free Software. After finding the bug, David was able to build a version of Mono that same day that had the bug fixed, which we could deploy on PlanetLab. Try that with proprietary software.

4834: (Python) Programming for Electrical Engineers, Fall 2007

Hello All,

Yes: I will be teaching 4834 in the fall.
No: It won’t be C++; it will be in Python
Yes: There is a book: Python Scripting for Computational Science by H.O. Langtangen.

A syllabus will be forthcoming, but the idea of the course is for students to learn common programming tasks for electrical engineers: numerical algorithms, reading and writing binary and text data formats (including XML), concurrent programming, network/sockets programming, GUI programming, web/cgi programming. Examples, homeworks and projects will cover topics relevant to electrical engineers.

If you’re asking “why Python”, read this argument for Python as a teaching language. My short answers would be something like: it’s not that different from Matlab, which is popular with engineers; it’s easy to focus on problem solving and not mechanics (compiling and linking, for instance); it’s easy to link Python to C/C++ code; and lastly, Python is just a lot more fun than C++.

I hope it’s a huge class, because it’s going to be incredibly great.

cron, laptops and battery power

I have a few cron jobs that run on my laptop (such as using Mairix to index my mail). Some of these jobs are not critical but do take a lot of CPU. When I am on battery power, I don’t want to run them. What do I do?

I made a small script which I call ac_is_plugged.sh which contains:

#!/bin/sh
# Exit status 0 (success) only when the AC adapter reports "on-line".
grep -q "on-line" /proc/acpi/ac_adapter/AC/state

Then in my crontab I put entries like:

*/15 * * * * $HOME/bin/ac_is_plugged.sh && nice mairix >/dev/null 2>&1

The above entry will run every 15 minutes, but abort if the AC adapter is not plugged in. Hopefully this will give me a little more battery power.

Saving Battery Power with Debian GNU/Linux

I run Debian GNU/Linux on all my computers, including my laptop. One problem I have had is that battery life on my laptop is not as long as I would like. One reason for this is that I had not been using the laptop-mode-tools package. Now I am, and here is what I needed to do.

  1. apt-get install laptop-mode-tools
  2. edit /etc/laptop-mode/laptop-mode.conf to make: CONTROL_CPU_FREQUENCY=1
  3. edit /etc/modules to include “speedstep-centrino” and “cpufreq-ondemand”, and use modprobe to go ahead and load both of these modules
  4. Restart acpid (which was already installed): /etc/init.d/acpid restart

After the above, I am all set. When I am on battery power the “cpufreq-ondemand” governor will turn down my CPU clockrate when I am not using the processor much (which is quite often for most of us). When demand goes up, the clockrate goes back to the maximum to get the job done as fast as possible.
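The behavior described above can be caricatured in a few lines. This Java sketch is a toy model, not the kernel’s ondemand governor: the 600 to 1800 MHz range and the 80% threshold are illustrative assumptions, and the real governor samples load periodically and has its own tunables.

```java
public class OndemandModel {
    // Illustrative numbers only, not read from any real CPU or kernel.
    static final int MIN_MHZ = 600;
    static final int MAX_MHZ = 1800;
    static final int UP_THRESHOLD_PERCENT = 80;

    // Picks the next clock rate from the recent CPU load (0..100 percent).
    static int nextFrequency(int loadPercent) {
        if (loadPercent > UP_THRESHOLD_PERCENT) {
            return MAX_MHZ;                        // busy: jump straight to full speed
        }
        int scaled = MAX_MHZ * loadPercent / 100;  // mostly idle: scale with load
        return Math.max(MIN_MHZ, scaled);          // but never below the hardware minimum
    }

    public static void main(String[] args) {
        for (int load : new int[] {5, 40, 75, 95}) {
            System.out.println(load + "% load -> " + nextFrequency(load) + " MHz");
        }
    }
}
```

The asymmetry is the point: the model jumps to full speed as soon as demand is high, so work finishes quickly, but otherwise lets the clock sink toward the minimum, which is where the battery savings come from.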

In addition to controlling the CPU frequency, the disk buffers are increased so hopefully the hard disk can be spun down more frequently and stay off for a while. There are more tricks you can use (such as changing the syslog so it won’t sync on each write). If you are not using your wireless, it is a good idea to have that off. So far, the only way I seem to be able to control that is to unload the module on my IBM T42. Also, dimming the screen as much as possible does seem to make some difference.

I have yet to do serious testing, but previously I was lucky to get an hour of battery life (using wifi); now it seems I can get around 1.5 hours with the above. Of course, one problem could be that my battery is two years old and no longer holds a good charge:

bash# grep capacity /proc/acpi/battery/BAT0/info
design capacity: 47520 mWh
last full capacity: 25950 mWh
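A quick check of how much of the battery’s original capacity remains, using only the two numbers printed above:

```latex
\frac{\text{last full capacity}}{\text{design capacity}}
  = \frac{25950\ \text{mWh}}{47520\ \text{mWh}} \approx 0.546
```

In other words, the battery holds roughly 55% of its design capacity, so about 45% has been lost. If runtime scaled linearly with capacity (a simplifying assumption that ignores how power draw varies), a battery at full design capacity would last about 1/0.546 ≈ 1.8 times as long under the same load.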

I am optimistic that something like Gnome Power Manager will make all this transparent to the user.

Note that Gentoo’s Power Management Guide is a useful resource.

Making Beautiful Presentations

A very large part of research is communicating with others. As the geeky math sort of kid, I wrongly assumed that communication and writing were not things I needed to focus on. Like many researchers, I find doing the work much more enjoyable than communicating it. That’s why I’m happy when I find a new tool that makes communicating more fun. Such a tool is Beamer. If you like markup languages and beautifully typeset presentations (and who doesn’t?), you’ll love using Beamer and LaTeX.
If you are doing technical publishing and you are not using LaTeX, obviously that is your first major mistake, which you should remedy. The reasons for using LaTeX are well known: excellent separation of content and layout, superior output quality, unparalleled equation typesetting, and syntax which reminds you of the hours of fun you have enjoyed programming.

Given that you are using LaTeX to write your documents, you probably also want that same power to produce your presentations, but how to do this? There are several packages for this, for instance Seminar or Prosper. Personally, I have used Prosper. It works okay, but there are some minor annoyances. I have been encouraged to try Beamer, but as I am hardheaded, I never looked closely. That all changed when I saw this presentation. What could produce such beautiful slides?

Naturally I feared that those slides were produced with some non-free software such as Apple’s Keynote (because clearly PowerPoint does not look so nice). Fortunately, I was in luck: they were produced with Beamer. The most striking aspect of the slides, in my opinion, is that the structure of the talk is not hidden from the audience. Of course many presenters give an outline of their talk, but with Beamer, the outline and navigation are all produced automatically from the structure of your LaTeX document (sections and subsections). Additionally, this rich navigation is not tied to some particular viewer program; it is embedded in standard PDF files, which may be viewed on almost any semi-modern computer.

And all of this is available to you at the low low price of zero dollars. Of course, one does need to learn how to produce Beamer slides, but anyone using a computer as a serious tool must become accustomed to learning new software periodically. I am planning to make all my future presentations using Beamer.

UF has a Jabber Server

Jabber is a standard for instant messaging with support from many free software programs. Recently I wrote about how Google’s new Gtalk is based on Jabber.
I am pleased to learn that UF has a Jabber server. If you have a GatorLink account, you are ready to go. I have set up Gaim, my favorite multi-protocol IM client, to work with our Jabber server. Here’s how:

If you are using another client, you might want to check UF’s page describing their Ask a Librarian program (which is using Jabber).

Message me at pob@ufl.edu.

Performance of Programming Languages

Some of the research I do, particularly Networks/P2P research, requires developing simulation, modeling, or experimentation software. Over the years I have become a fan of a growing set of programming languages. In my experience, educating new programmers on how to program safely and efficiently in C/C++ takes time. As a result, I have been looking more and more towards languages with garbage collection such as Java, C# (Mono), Python, and Ruby. Given the current popularity of Ruby-on-rails, I thought it might be interesting to compare Ruby and other languages in terms of program running time.

After spending time with so-called “scripting languages” (a term that I feel is not terribly well defined or useful), I have found that I enjoy programming in languages like Ruby, Python, and Boo more than languages like C, C++, Java, or C# (though C# is getting better and better as the language improves). My question is: how much performance do I lose by adopting one of these languages? To answer this question I took a look at the Language Shootout site.

One can always argue about benchmarks, but at least they give us some starting point. Comparing the above languages (except Boo) in terms of running times, we see:

The above is by no means scientific. I always excluded start-up time since I am more interested in long-running code (simulations are long running experiments). Clearly these languages are not all equal in performance. Comparing Ruby to C we see that Ruby is between 14 and 600 times slower (several tests are more than 100 times slower).

It is easy for such comments to spiral out of control and provoke flames from fans of various languages, however, it is good to at least have some (though noisy and inaccurate) means to compare these languages. There are many rebuttals to the above:

  1. Optimization: Clearly gcc has been optimized longer than all the other items in the above list, so we would expect it to perform well, but, dynamically JIT compiled languages could in principle be faster [due to having access to test data at compile-time (which is also run-time)]. That may be true, and I can’t wait for it to happen, but it looks as though it has not happened yet.
  2. Usability/Productivity/Library Availability: As stated above, programming in C or C++ is generally considered harder to do right than the other languages. So, while the program may run slower, more (usable) code will be written in the other languages due to productivity boosts. While this is true, in some cases there is a certain minimum performance that must be reached. Given that constraint, I would like to see if I can use a nice language of my choice. In many cases, it appears that the difference between the slowest and fastest will be profound.

So, given that Ruby is a nice language, how practical is it to use for performance-critical applications? It appears that the cost of choosing Ruby over C or C++ is very great in terms of performance. Depending on the size of your problem or the size of your computer, Ruby may be a very bad choice.

Others have given a different answer to this question. For instance, in an article titled It’s boring to scale with Ruby on Rails, the author argues that scaling for Ruby is no problem, since the major cost is not hardware, but programmers. This may be true for some, but not all. For instance, in the case of individuals hosting their own sites, labor is fairly cheap but hardware is relatively expensive. In the case of research, we are often interested in tackling problems that are bounded by our computational resources. In this case, using a better-performing system translates into solving a bigger problem: i.e. one that may actually be interesting.

Finally, with Netmodeler, we have taken the approach of using C++ for the core library (which allows for efficiently storing large networks in memory and quickly running graph algorithms), but using SWIG to make Python wrappers, which allow us to script the use of Netmodeler to build specific tools or simulators. This allows us to get the performance of C++, but the ease of building a new simulation program in Python.

As always, one size does not fit all.

Become an Amazonian Turk

Amazon has launched a fascinating program called Amazon Mechanical Turk. The name is a reference to The Turk, the famous 18th-century chess-playing “automaton” that was secretly operated by a human player hidden inside (it played, amongst others, Ben Franklin, Charles Babbage, and Napoleon Bonaparte). [See the recent book about the Turk].

Clive Thompson discussed this in a 2002 article for Wired titled Slaves to Our Machines: welcome to your future as a PC plugin. I thought the article was very interesting, and at the time I was interested in XML-RPC [see my abandonware QuteXR]. I imagined writing web services for tasks that humans can easily do, but not computers (yet). This is exactly Amazon’s Turk.

Using Amazon’s Mechanical Turk API, you can pay others to solve tasks and integrate the results into a software system. It seems that some other humans have to agree that you correctly solved the job before you get paid (which seems like a catch-22 to me). Hopefully Amazon will implement some fault-tolerance techniques to make it easy to automatically dispatch a job N times and only accept and pay for the answers that are in the majority.
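The majority-vote idea in the paragraph above is easy to sketch. The Java code below is a hypothetical illustration, not part of the Mechanical Turk API: it assumes the same task was dispatched to several workers and their answers have been collected as strings, accepts an answer only if a strict majority agree, and lists the workers who would be paid.

```java
import java.util.*;

public class MajorityVote {
    // Returns the answer given by a strict majority of workers, if there is one.
    static Optional<String> majority(List<String> answers) {
        Map<String, Integer> counts = new HashMap<>();
        for (String a : answers) {
            counts.merge(a, 1, Integer::sum);
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() * 2 > answers.size()) {
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }

    // Indices of the workers whose answer matched the majority (the ones to pay).
    static List<Integer> workersToPay(List<String> answers) {
        List<Integer> paid = new ArrayList<>();
        majority(answers).ifPresent(winner -> {
            for (int i = 0; i < answers.size(); i++) {
                if (answers.get(i).equals(winner)) {
                    paid.add(i);
                }
            }
        });
        return paid;
    }

    public static void main(String[] args) {
        List<String> answers = List.of("cat", "cat", "dog", "cat", "bird");
        System.out.println(majority(answers).orElse("no majority") + " " + workersToPay(answers));
    }
}
```

For the five answers in main this prints "cat [0, 1, 3]". Requiring a strict majority means that with N workers, up to (N - 1) / 2 wrong or careless answers (rounded down) can be outvoted; if no answer reaches a majority, nobody is paid and the job would presumably be dispatched again.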

I am really excited about this and look forward to seeing what springs up around it. For instance, in the near term, the jobs may be completed by humans, but some subset of the jobs could potentially be completed by algorithms. Thus, this could present a market where algorithms and humans compete. Hopefully, it will encourage innovation in AI.

Google Talk == Jabber

Google has a new IM system called Google Talk which uses the Jabber (XMPP Standard) Protocol.

I have been a long time fan of the XML-over-TCP based Jabber Instant Message Protocol. Unlike Email, IM is totally non-standard and every IM provider seems to use their own protocol (AIM, MSN, Yahoo, etc…). Jabber, on the other hand, is standardized by the IETF as XMPP.

I had some difficulty logging on with Gaim, the IM client I use on my GNU/Linux laptop, but someone on Slashdot figured it out. The trick is to use “gmail.com” as the server name, but in the connection options use “talk.google.com” as the connect server. I guess this is a libjabber bug.

Kudos to Google for using a standard protocol. My gmail ID is oscar.boykin (or if you are using another jabber system, you can message me at johnynek@jabber.org). You can use it to chat with me, but I don’t check my gmail, so please don’t send email to that gmail account (if you want me to read it that is).

I am a little disappointed that the Ajax wizards at Google have not released a browser-based Google Talk client. I guess the difficult part is sending events from the server to the client. Also, Google Talk is not part of the Jabber network; users with Google Talk accounts cannot directly message users with other Jabber accounts. Google says it will change that somewhat by federating with other systems. It would be better if they were connected to the standard Jabber network.

Netmodeler is Free Software

I am one of the lead developers of the Netmodeler C++ library. This is the library that I and many of my former group-mates at UCLA’s Complex Network Group have developed to make calculations and simulations for various works, including: spam fighting, more spam fighting, load balancing, search in unstructured p2p networks, disaster management in complex networks, and finding modules in protein networks.

The Netmodeler library is GPL licensed Free Software. Detailed instructions for obtaining Netmodeler are given on the Netmodeler wiki page.

We welcome others to use the code (as long as they obey the GPL), and we welcome source code contributions (particularly any autoconf/automake gurus that could help us get the build system improved).

Wiki

There is now a local wiki on boykin.acis.ufl.edu. I installed MediaWiki, the software that runs Wikipedia.

I believe in being open with ideas, because they are not scarce. By collaborating we can improve the quality of the many ideas that we have to pursue. Technologies like Wikis make it easier to collaborate, and therefore more is produced.

I hope my new Wiki will be a huge hit with students and collaborators.

Mail migration

Over the weekend I was able to complete my mail migration from starsky.ee.ucla.edu to acis.ufl.edu. The main challenge was that starsky uses mbox folders (UW IMAP) while ACIS uses Maildir folders (Cyrus). Maildir folders are superior, since each message lives in its own file and delivery needs no locking, and the Cyrus IMAP server is faster than the UW IMAP server. I am glad to move to Maildir; however, there was some pain in doing so.
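To show why Maildir gets away without locks, here is a minimal Python sketch of Maildir delivery (my own illustration, not what Cyrus actually runs): each message is written to a fresh file under tmp/ and then atomically renamed into new/. The function name `deliver_to_maildir` is hypothetical.

```python
import itertools
import os
import socket
import time

_counter = itertools.count()  # per-process sequence number for unique file names

def deliver_to_maildir(maildir, message_bytes):
    """Hypothetical helper: store one message in a Maildir without any locking."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    # Unique name in the usual "time.pid.host" spirit; "/" and ":" in the
    # host name are escaped as \057 and \072, as the Maildir conventions ask.
    host = socket.gethostname().replace("/", "\\057").replace(":", "\\072")
    name = "%d.%d_%d.%s" % (int(time.time()), os.getpid(), next(_counter), host)
    tmp_path = os.path.join(maildir, "tmp", name)
    new_path = os.path.join(maildir, "new", name)
    # 1) Write the whole message into tmp/ and force it to disk.
    with open(tmp_path, "wb") as f:
        f.write(message_bytes)
        f.flush()
        os.fsync(f.fileno())
    # 2) Rename it into new/. On one file system rename is atomic, so a
    #    reader (or an IMAP server) never sees a half-written message.
    os.rename(tmp_path, new_path)
    return new_path
```

In practice Python's standard `mailbox.Maildir.add()` does this for you. The point is only that every delivery creates a new file instead of appending to one shared mbox file, so concurrent writers never have to coordinate.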

My mail consists of over 1 GB of data going back to as early as 1995. The complete record only starts around 1999, however. In addition to physically moving the mail, I had to convert all my scripts to deal with Maildir folders rather than mbox. This includes my Procmail setup, along with several home-brew spam-fighting scripts.
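For anyone facing the same mbox-to-Maildir move, here is a minimal Python sketch of converting one folder with the standard `mailbox` module. It is not the exact procedure I used, and `mbox_to_maildir` is a hypothetical name; a real migration also has to map folder names and repeat this for every folder.

```python
import mailbox

def mbox_to_maildir(mbox_path, maildir_path):
    """Hypothetical helper: copy every message of an mbox file into a Maildir.

    Returns the number of messages copied. The mbox file itself is not modified.
    """
    src = mailbox.mbox(mbox_path, create=False)
    dst = mailbox.Maildir(maildir_path, create=True)
    src.lock()  # keep a delivery agent from appending while we read
    try:
        copied = 0
        for msg in src:
            # MaildirMessage(mboxMessage) translates the mbox Status/X-Status
            # flags into Maildir info flags (e.g. "read" becomes "S").
            dst.add(mailbox.MaildirMessage(msg))
            copied += 1
    finally:
        src.unlock()
        src.close()
    return copied
```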

Along the way I found a few new tools:

  • Offlineimap: this is a great tool that allowed me to synchronize my mail on starsky with the mail at ACIS. I also use it to keep a local copy of my email on my laptop (where I read my mail with muttng).
  • Mairix: once I have a local copy of my mail, I want to be able to quickly search that 1 GB of data. The only way to do that quickly is to index it, and query the database. Fortunately, the mairix tool makes this trivially easy (depending on your definition of trivial). Mairix along with mutt is making me love email again.
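mairix indexes far more than this and keeps its database on disk, but the "index once, query fast" idea can be sketched in a few lines of Python: a toy inverted index over the Subject and From headers of a Maildir. `build_index` and `search` are hypothetical names and have nothing to do with mairix's internals.

```python
import mailbox
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")

def build_index(maildir_path):
    """Hypothetical helper: map each lower-cased word in Subject/From
    to the set of Maildir keys of the messages containing it."""
    index = defaultdict(set)
    box = mailbox.Maildir(maildir_path, create=False)
    for key, msg in box.iteritems():
        text = " ".join(str(msg.get(h, "")) for h in ("Subject", "From"))
        for word in WORD.findall(text.lower()):
            index[word].add(key)
    return index

def search(index, query):
    """Return the keys of messages that contain every word of the query (AND)."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    hits = set(index.get(words[0], ()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits
```

The expensive pass over every message happens once in `build_index`; each query afterwards is only a few dictionary lookups and set intersections, which is why indexed search stays fast on a large archive.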

Hopefully, ACIS will be the home of my email for many years, so I won’t have to deal with this again any time soon. By then, hopefully there will be more powerful tools.

PDB to Ical conversion

For some reason, I cannot find any command-line (scriptable) program that can convert Palm datebooks into iCal format. Given an iCal file, I can display it using PHPICalendar.

As they say, if you want something done right, you have to do it yourself.

So, using Perl’s Palm database support package Palm::PDB, I hacked up a simple script. It converts my datebook into an iCal file, which can then be shown via PHPICalendar. It does not support all the options yet, but it covers about 95% of what I need, so I am not going to worry about the rest for now.
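The Palm parsing is the Perl part; the output side is simple enough to sketch. Here is a minimal Python illustration (not the script itself) that renders events, assumed to be already extracted from the datebook as `(uid, start, end, summary)` tuples, as an iCalendar file. `to_ics` and the PRODID string are hypothetical.

```python
from datetime import datetime, timezone

def to_ics(events, stamp=None, prodid="-//example//pdb2ics sketch//EN"):
    """Hypothetical helper: render (uid, start, end, summary) tuples as iCalendar text.

    start/end are naive datetimes written as floating local time;
    stamp (DTSTAMP) is assumed to be UTC and defaults to "now".
    """
    fmt = "%Y%m%dT%H%M%S"
    stamp = (stamp or datetime.now(timezone.utc)).strftime(fmt) + "Z"

    def escape(text):
        # TEXT values must escape backslash, semicolon, comma and newline.
        return (text.replace("\\", "\\\\").replace(";", "\\;")
                    .replace(",", "\\,").replace("\n", "\\n"))

    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:" + prodid]
    for uid, start, end, summary in events:
        lines += [
            "BEGIN:VEVENT",
            "UID:" + uid,
            "DTSTAMP:" + stamp,
            "DTSTART:" + start.strftime(fmt),
            "DTEND:" + end.strftime(fmt),
            "SUMMARY:" + escape(summary),
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    # iCalendar uses CRLF line endings; folding of long lines is omitted here.
    return "\r\n".join(lines) + "\r\n"
```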

Hopefully, this will be of some use to someone else. If anyone knows of a more complete command line tool, please let me know.

(Update 6/1/2006):
I have updated the script to add UIDs, and thanks to Jeroen van Nieuwenhuizen, exceptions to repeating events are now respected. Please note, this script is very poor coding; however, it works for me. (Shamelessly stolen disclaimer): It comes with an almost Kafka-esque lack of warranty. It may steal your motorcycle, and drink all your milk. If you have any improvements or bug reports, please mail them to me. I will be happy to accept them.
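In iCalendar terms, an exception to a repeating event is an EXDATE on the recurring VEVENT, and the UID gives each event a stable identity. Here is a minimal Python sketch of that output for a weekly event. `weekly_event` and `occurrences` are hypothetical helpers, not code from the script, and the VEVENT still has to be wrapped in a VCALENDAR.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y%m%dT%H%M%S"

def weekly_event(uid, start, end, summary, count, exceptions=(), stamp=None):
    """Hypothetical helper: one VEVENT that repeats weekly `count` times.

    `exceptions` holds the start times of occurrences deleted on the Palm;
    they are written as EXDATE so calendar viewers skip them. The summary is
    assumed to contain no characters that need iCalendar escaping.
    """
    stamp = (stamp or datetime.now(timezone.utc)).strftime(FMT) + "Z"
    lines = [
        "BEGIN:VEVENT",
        "UID:" + uid,  # stable UID lets a client recognise the same event on re-import
        "DTSTAMP:" + stamp,
        "DTSTART:" + start.strftime(FMT),
        "DTEND:" + end.strftime(FMT),
        "SUMMARY:" + summary,
        "RRULE:FREQ=WEEKLY;COUNT=%d" % count,
    ]
    if exceptions:
        lines.append("EXDATE:" + ",".join(d.strftime(FMT) for d in sorted(exceptions)))
    lines.append("END:VEVENT")
    return "\r\n".join(lines) + "\r\n"

def occurrences(start, count, exceptions=()):
    """What a viewer should show: COUNT generates the instances, then EXDATE removes some."""
    skipped = set(exceptions)
    return [t for t in (start + timedelta(weeks=i) for i in range(count)) if t not in skipped]
```

Note that COUNT applies before the exceptions are removed, so a four-week series with one EXDATE shows three meetings.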

Download the latest.

Update: 5/31/2007, I have created a repository with the latest version of the code (which includes some improvements and bugfixes from contributors). Find the repository for date2ics here

“Old” is the new “New”

“All mail clients suck.” Unfortunately, it seems we need to use email nonetheless. Over the weekend I spent some time thinking about my email setup. I used to use Mutt. Unfortunately, it seemed that using it with multiple IMAP accounts was a hassle, so I switched to Thunderbird when I moved to UF. But of course, I would not be satisfied for long.

First, why don’t I like Thunderbird (and indeed most graphical email readers)? Two main problems: speed and Vim.

Lots of people are GUI bigots. I used to be one years ago. The GUI bigot assumes a GUI will be faster and better at everything. In fact, GUIs are usually easier to learn, but often slower to use. For instance, saving an email message requires carefully clicking on the message, and carefully dragging it to one of many folders. In Mutt, I say “s folder ” and I have moved the message. There is a very low chance for errors, and I don’t take my hands off the keyboard.

The second problem is no Vim support. Many people never bother to learn how to use a powerful text editor. This is a real shame. Sure, it takes a while to learn an editor like Emacs or Vi, but the fact that these programs have been used for the past 20 years should tell you something about how useful they are. Consider industrial tools: it takes an investment to learn how to use them properly and safely, but once that is done, the operator is *much* more productive. Considering that I am at a computer as much as 75-80% of my work week, I can’t afford to lose productivity for fear of learning a new tool. Learning Vi, and using Vim, makes me more productive when editing text, which is about 90-100% of what I do with a computer (writing papers, emails, and programs).

Thus, I am back with Mutt since it offers me speed and the ability to use Vim. This time around, I am using a few new tools to make things easier.

  • Muttng: the main mutt developer has been a little slow accepting patches, thus in the greatest tradition of free software, Muttng has been created. A fork of the original Mutt, Muttng includes one very nice feature: a sidebar. This allows me to see which folders have new messages in them.
  • Offlineimap: this is a tool written in Python, which synchronizes IMAP mailboxes with each other, or with local mail directories. Using this tool, I periodically (every 3 minutes) synchronize my various accounts with my local disk. This has two benefits: 1) once I download a message I can quickly access it, whether I am connected or not, and 2) I have a backup copy of all my mail in a standard format (Maildir). This is really a great tool; it is probably an excellent solution regardless of which reader you use.
  • ESMTP: due to various anti-spam technologies, it is getting harder to deliver mail directly from one’s own computer. Increasingly, one needs to send mail through well-known SMTP servers provided by ISPs. esmtp is a program that relays mail through a remote SMTP server but works exactly like sendmail on a Unix host. This allows existing software (like Mutt) to interoperate with it easily.

Finally, how does one install all these goodies? With debian:

apt-get install esmtp offlineimap

Unfortunately, muttng is not in Debian, but you can get a deb, which may be installed using dpkg.

That’s about it! I am glad to be back in the text-only email world!