I’ve been having quite a lot of trouble lately configuring irssi to work well with UTF-8. Irssi’s documentation on the matter was incomplete or discouraging, and there wasn’t much on the Internet, so now that I’ve figured out how to do it, I’ll share it here.

First of all, you’ve got to make sure that your system is configured for UTF-8 locales:

bash-3.1$ locale
LANG=en_GB.utf8
LANGUAGE=en_GB.utf8
LC_CTYPE="en_GB.utf8"
LC_NUMERIC="en_GB.utf8"
LC_TIME="en_GB.utf8"
LC_COLLATE="en_GB.utf8"
LC_MONETARY="en_GB.utf8"
LC_MESSAGES="en_GB.utf8"
LC_PAPER="en_GB.utf8"
LC_NAME="en_GB.utf8"
LC_ADDRESS="en_GB.utf8"
LC_TELEPHONE="en_GB.utf8"
LC_MEASUREMENT="en_GB.utf8"
LC_IDENTIFICATION="en_GB.utf8"
LC_ALL=en_GB.utf8

If the output of locale doesn’t look like that, you’ll want to reconfigure your locales. On Debian, what you have to do is:

sudo dpkg-reconfigure locales

Here’s roughly what to expect once the locales are being generated:

Generating locales (this might take a while)...
  en_GB.ISO-8859-1... done
  en_GB.ISO-8859-15... done
  en_GB.UTF-8... done
  en_US.ISO-8859-1... done
  en_US.ISO-8859-15... done
  en_US.UTF-8... done
Generation complete.
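To double-check the result without reading through the whole locale listing, you can ask for just the character map. This is only a small sketch: locale charmap is a standard command, but the helper name is_utf8_charmap is made up for this example, and the list of accepted spellings is an assumption about what different systems print.

```shell
# Succeeds when the given charmap name denotes UTF-8.
# The accepted spellings are an assumption; systems differ in how they write it.
is_utf8_charmap() {
    case "$1" in
        UTF-8|utf-8|UTF8|utf8) return 0 ;;
        *) return 1 ;;
    esac
}

# "locale charmap" prints the character set of the current locale.
if is_utf8_charmap "$(locale charmap 2>/dev/null)"; then
    echo "locale charset is UTF-8"
else
    echo "locale charset is not UTF-8; check LANG and LC_ALL"
fi
```

Run it in the same shell from which you will later start screen and irssi, since that is the environment they inherit.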

Perfect. Now that our system is configured for UTF-8, we want to configure our terminal emulator. If you’re using xterm, you can invoke it with the -u8 switch, or just run uxterm, and that’s all that’s needed. If you’re using gnome-terminal, go to the Terminal menu, then choose Set Character Encoding and then UTF-8. If UTF-8 doesn’t appear in the list, you may want to log out and log in again. While you’re at it, in the GDM login manager, go to the Language option and choose UTF-8 there too, so that it becomes the default.
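A quick way to see whether the terminal emulator really decodes UTF-8 is to print a couple of multi-byte characters and look at them. The sketch below uses octal escapes so that it doesn’t depend on how this page itself is encoded; the function name utf8_sample is just an illustration.

```shell
# Print two UTF-8 encoded characters:
#   \303\244      is "ä" (2 bytes)
#   \342\202\254  is "€" (3 bytes)
utf8_sample() {
    printf '\303\244 \342\202\254\n'
}

utf8_sample
```

If the terminal is set up for UTF-8 you should see an a-umlaut and a euro sign. Two or three garbage characters in place of each usually mean the terminal is still decoding the bytes as a single-byte charset.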

Now let’s take care of GNU/Screen. In order to enable UTF-8, all you have to do is launch it with the -U switch:

screen -U -S irc

irc is just the name I want to assign to that screen session. Note that if you want to switch a running screen session to UTF-8, you can do it window by window, with the command CTRL-a : utf8 on.
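If you’d rather have new windows default to UTF-8 without typing the command each time, screen also has a defutf8 command that can go in ~/.screenrc. The snippet below is a sketch of adding it only once; ensure_line is a hypothetical helper written for this example, not part of screen. Note that defutf8 only changes the default for new windows, so I’d still start screen with -U, which also tells screen that the terminal itself speaks UTF-8.

```shell
# Append a line to a file unless the exact line is already there.
# ensure_line is a made-up helper for this example.
ensure_line() {
    file=$1
    line=$2
    # -q quiet, -x match the whole line, -F treat the pattern as a fixed string
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Usage (uncomment to apply it to your own configuration):
# ensure_line "$HOME/.screenrc" "defutf8 on"
```

Running it twice leaves a single defutf8 line in the file, so it is safe to keep in a setup script.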

Once your GNU/Screen is configured for UTF-8, you finally have to set up your irssi client. This was, for me, the tricky part, since the documentation is a bit unclear and I didn’t realize that my irssi wasn’t built with recode support. To make sure that yours is, fire it up and give the command

/recode

If you get something like

Target                         Character set

then everything is all right. Otherwise, if you get a No such command error, you will have to reinstall irssi with recode support.

Irssi’s UTF-8 support is designed so that you can recode to different charsets, depending on the server or channel you’re chatting in. First let’s set up some general options:

/set term_charset UTF-8
/set recode_autodetect_utf8 ON
/set recode_fallback UTF-8
/set recode ON
/set recode_out_default_charset UTF-8
/set recode_transliterate ON

These options will be the default, unless overridden for specific servers or channels. What do they mean?

  • term_charset: this is the character set of your terminal emulator

  • recode_autodetect_utf8: irssi will recognize incoming UTF-8 text automatically and treat it accordingly

  • recode_fallback: when we get some text from a chat peer that can’t be decoded as UTF-8 or with the charset set for that server/channel, irssi assumes the text is in this character set and converts it from there

  • recode: this enables the whole recode thing

  • recode_out_default_charset: this is very important: it is the default charset that you send out, unless a server/channel rule says otherwise (we will see that shortly)

  • recode_transliterate: this enables transliteration to the closest match, i.e. if someone sends you a character that’s not in your charset, it will be transliterated to the closest possible one, or replaced with a question mark if none is found
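To get a feel for what autodetection plus a fallback charset means for incoming text, here is a rough imitation with iconv outside of irssi. It is only a sketch of the idea, not irssi’s actual code: decode_incoming is a made-up function, and it assumes an iconv that exits with an error on invalid input (glibc’s does).

```shell
# Read bytes on stdin and write UTF-8 on stdout.
# $1 is the fallback charset assumed when the input is not valid UTF-8.
decode_incoming() {
    fallback=$1
    tmp=$(mktemp) || return 1
    cat > "$tmp"
    # Converting UTF-8 to UTF-8 fails if the input is not valid UTF-8,
    # which serves as a cheap validity check here.
    if iconv -f UTF-8 -t UTF-8 < "$tmp" >/dev/null 2>&1; then
        cat "$tmp"                               # already UTF-8: pass through
    else
        iconv -f "$fallback" -t UTF-8 < "$tmp"   # assume the fallback charset
    fi
    rm -f "$tmp"
}

# "caf\351" is "café" encoded in ISO-8859-1; it gets converted.
printf 'caf\351\n' | decode_incoming ISO-8859-1
# "caf\303\251" is "café" already in UTF-8; it is passed through untouched.
printf 'caf\303\251\n' | decode_incoming ISO-8859-1
```

On a UTF-8 terminal both lines should come out looking the same, which is the whole point: the peer’s encoding stops mattering to what you see.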

Now, you probably need different recodes on different channels, because you may speak different languages on different channels. For example, I send out UTF-8 when typing on English-speaking channels, and ISO-8859-1 or ISO-8859-15 when typing on Finnish- or Italian-speaking channels, so people on the other end will always get my characters right.

You need to add rules with the /recode command:

/recode add ircnet/#foo ISO-8859-15
/recode add ircnet/#bar ISO-8859-1
/recode add freenode/#gee ISO-8859-1

Those commands will make you “speak” ISO-8859-15 on #foo and ISO-8859-1 on #bar on IRCNet, and ISO-8859-1 on #gee on freenode. Everywhere else you will “speak” UTF-8.
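Why bother with per-channel rules at all? Because the same character is a different sequence of bytes in each charset, and a peer whose client expects one charset will misread the other. A small sketch with iconv and od makes the difference visible; hex_in_charset is a hypothetical helper for this example only.

```shell
# Read UTF-8 text on stdin, convert it to the charset given as $1,
# and print the resulting bytes as hex.
hex_in_charset() {
    iconv -f UTF-8 -t "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

# "ä" is two bytes in UTF-8 but a single byte in ISO-8859-15.
printf '\303\244' | hex_in_charset UTF-8        # prints c3a4
printf '\303\244' | hex_in_charset ISO-8859-15  # prints e4

# The euro sign exists in ISO-8859-15 (byte a4) but not in ISO-8859-1,
# which is one reason to prefer -15 for some channels.
printf '\342\202\254' | hex_in_charset ISO-8859-15  # prints a4
```

Someone reading the UTF-8 bytes c3 a4 as ISO-8859-15 would see two odd characters instead of one letter, which is exactly the kind of garbage these recode rules avoid.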

And this is what we get: here I’m typing (er… I’m copy-pasting from Wikipedia) some text:

If you connect via SSH to a remote machine where you run irssi inside screen, all you have to do is set both systems to use UTF-8, as explained at the beginning of this article, and then set the terminal emulator on the machine you SSH from to use UTF-8, as explained earlier.