The ultimate guide for UTF-8 in irssi and GNU/Screen

Mar 06 2007

Lately I’ve been having quite a lot of trouble configuring irssi to work well with UTF-8. Irssi’s documentation on the matter was incomplete or discouraging, and there wasn’t much on the Internet, so now that I’ve figured out how to do it, I’ll share it here.

First of all, you’ve got to make sure that your system is configured for UTF-8 locales:

bash-3.1$ locale
LANG=en_GB.utf8
LANGUAGE=en_GB.utf8
LC_CTYPE="en_GB.utf8"
LC_NUMERIC="en_GB.utf8"
LC_TIME="en_GB.utf8"
LC_COLLATE="en_GB.utf8"
LC_MONETARY="en_GB.utf8"
LC_MESSAGES="en_GB.utf8"
LC_PAPER="en_GB.utf8"
LC_NAME="en_GB.utf8"
LC_ADDRESS="en_GB.utf8"
LC_TELEPHONE="en_GB.utf8"
LC_MEASUREMENT="en_GB.utf8"
LC_IDENTIFICATION="en_GB.utf8"
LC_ALL=en_GB.utf8
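A quicker check than reading the whole listing is locale charmap, which prints only the character set of the current locale. The sketch below wraps it in a small helper; the name is_utf8_locale is my own invention, not part of any tool.

```sh
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) only if the current locale's
# character map is UTF-8. "locale charmap" prints just the codeset name,
# e.g. "UTF-8", or "ANSI_X3.4-1968" for the C locale on glibc.
is_utf8_locale() {
    [ "$(locale charmap 2>/dev/null)" = "UTF-8" ]
}

if is_utf8_locale; then
    echo "Locale is UTF-8: $(locale charmap)"
else
    echo "Locale is NOT UTF-8; reconfigure your locales first." >&2
fi
```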

If the output of locale doesn’t look like that, you need to reconfigure your locales. On Debian, what you have to do is:

sudo dpkg-reconfigure locales

Here are some screenshots of what to expect (dpkg-1.png, dpkg-2.png, dpkg-3.png). At the end you should see something like:

Generating locales (this might take a while)...
  en_GB.ISO-8859-1... done
  en_GB.ISO-8859-15... done
  en_GB.UTF-8... done
  en_US.ISO-8859-1... done
  en_US.ISO-8859-15... done
  en_US.UTF-8... done
Generation complete.
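dpkg-reconfigure locales is interactive. If you’d rather script it, my understanding is that on Debian it boils down to uncommenting the wanted locale in /etc/locale.gen, running locale-gen, and setting the default with update-locale. Here is a minimal sketch of the first step, assuming a Debian-style /etc/locale.gen and GNU sed (for sed -i); enable_locale is a hypothetical helper name.

```sh
#!/bin/sh
# Hypothetical helper: uncomment one locale in a Debian-style locale.gen file,
# where entries look like "# en_GB.UTF-8 UTF-8".
# $1 = path to the file, $2 = locale name (e.g. en_GB.UTF-8).
enable_locale() {
    file=$1
    # Escape dots so "en_GB.UTF-8" is matched literally by sed.
    name=$(printf '%s' "$2" | sed 's/\./\\./g')
    # The trailing space stops e.g. "en_GB" from also matching "en_GB.UTF-8".
    sed -i "s/^# *\(${name} \)/\1/" "$file"   # GNU sed: edit in place
}

# On a real Debian system, as root:
#   enable_locale /etc/locale.gen en_GB.UTF-8
#   locale-gen
#   update-locale LANG=en_GB.UTF-8
```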

Perfect: now that our system is configured for UTF-8, we want to configure our terminal emulator. If you’re using xterm, you can invoke it with the -u8 switch, or just run uxterm, and that’s all that’s needed. If you’re using gnome-terminal, go to the Terminal menu, choose Set Character Encoding and then UTF-8. If UTF-8 doesn’t appear in the list, try logging out and logging in again. While you’re at it, go to the Language option in the GDM login manager and choose UTF-8 there too, so that it becomes the default.
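To check that the terminal really displays UTF-8, print a few non-ASCII characters. This sketch, built around a hypothetical helper called utf8_sample, writes the raw UTF-8 bytes for “café € λ” using printf octal escapes, so the result does not depend on how the script file itself is saved. A correctly configured terminal shows café € λ; if you see something like cafÃ© instead, the terminal is decoding UTF-8 as Latin-1.

```sh
#!/bin/sh
# Hypothetical helper: print "café € λ" as raw UTF-8 bytes. The octal escapes
# make the output independent of the encoding of this script file.
utf8_sample() {
    printf 'caf\303\251 \342\202\254 \316\273\n'
}

utf8_sample
```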

Now let’s take care of GNU/Screen. In order to enable UTF-8, all you have to do is launch it with the -U switch:

screen -U -S irc

irc is just the name I want to assign to that screen session. If you want to switch an already running screen session to UTF-8, you can do it window by window with the command CTRL-a :utf8 on.
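To avoid typing the utf8 command in every new window, screen’s defutf8 command, placed in ~/.screenrc, sets the default UTF-8 mode for new windows. The sketch below appends it only if it isn’t already there; enable_screen_utf8 is a hypothetical helper, and you would still start screen with -U as shown above.

```sh
#!/bin/sh
# Hypothetical helper: add "defutf8 on" to a screenrc file unless it is
# already present, so new windows start in UTF-8 mode.
# $1 = screenrc path (defaults to ~/.screenrc).
enable_screen_utf8() {
    rc=${1:-$HOME/.screenrc}
    grep -qx 'defutf8 on' "$rc" 2>/dev/null || echo 'defutf8 on' >> "$rc"
}

# Usage: enable_screen_utf8              # edits ~/.screenrc
#        enable_screen_utf8 /path/to/rc  # edits another file
```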

Once GNU/Screen is configured for UTF-8, you finally have to set up your irssi client. This was the tricky part for me, since the documentation is a bit unclear, and I didn’t realize that my irssi hadn’t been built with recode support. To check whether yours has, start it and type the command:

/recode

If you get something like

Target                         Character set

then everything is all right. If instead you get a “No such command” error, you will have to reinstall irssi with recode support.

Irssi’s UTF-8 support lets you recode to different charsets depending on the server or channel you’re chatting in. First let’s set up some general options:

/set term_charset UTF-8
/set recode_autodetect_utf8 ON
/set recode_fallback UTF-8
/set recode ON
/set recode_out_default_charset UTF-8
/set recode_transliterate ON

These options will be the default, unless overridden for specific servers or channels. What do they mean?

  • term_charset: the character set of your terminal emulator
  • recode_autodetect_utf8: irssi recognizes UTF-8 input automatically and treats it accordingly
  • recode_fallback: the character set irssi falls back on when a chat peer sends text that isn’t UTF-8
  • recode: this enables the whole recode machinery
  • recode_out_default_charset: this one is very important: it is the default charset for the text you send out, unless a server or channel rule says otherwise (we will see that shortly)
  • recode_transliterate: this enables transliteration to the closest match: if someone sends you a character that’s not in your charset, it will be transliterated to the closest possible one, or replaced with a question mark if none is found
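The idea behind recode_transliterate can be tried outside irssi with the iconv command-line tool, whose //TRANSLIT suffix asks for the closest substitute instead of failing. This is iconv’s mechanism rather than irssi’s own code, and to_ascii_translit is a hypothetical helper name; the exact substitutes depend on your iconv implementation and locale.

```sh
#!/bin/sh
# Hypothetical helper: convert UTF-8 text on stdin to ASCII, asking iconv to
# transliterate characters ASCII lacks instead of stopping with an error.
# The substitutes chosen (e.g. "e" for "é", or just "?") depend on the iconv
# implementation and on the current locale.
to_ascii_translit() {
    iconv -f UTF-8 -t ASCII//TRANSLIT
}

# Plain ASCII passes through unchanged.
printf 'plain text\n' | to_ascii_translit
```

For example, printf 'caf\303\251\n' | to_ascii_translit typically prints cafe under a UTF-8 locale such as en_GB.UTF-8, while in the plain C locale glibc may print caf? instead.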

Now, you probably need different recodes on different channels, because you may speak different languages on different channels. For example, I send out UTF-8 when typing on English-speaking channels, and ISO-8859-1 or ISO-8859-15 when typing on Finnish- or Italian-speaking channels, so people on the other end always see my characters correctly.

You need to add rules with the /recode command:

/recode add ircnet/#foo ISO-8859-15
/recode add ircnet/#bar ISO-8859-1
/recode add freenode/#gee ISO-8859-1

These commands make you “speak” ISO-8859-15 on #foo on IRCnet, and ISO-8859-1 on #bar on IRCnet and #gee on freenode. Everywhere else you will “speak” UTF-8.
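Why the per-channel rules matter: the same character goes out as different bytes depending on the charset. This sketch shows it with iconv and od; bytes_in is a hypothetical helper. It also shows one practical difference between the two Latin charsets: ISO-8859-15 has the euro sign (at 0xA4), while ISO-8859-1 has no euro sign at all.

```sh
#!/bin/sh
# Hypothetical helper: print, as hex, the bytes a UTF-8 string becomes
# in another character set. $1 = target charset, $2 = text (UTF-8).
bytes_in() {
    printf '%s' "$2" | iconv -f UTF-8 -t "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

e_grave=$(printf '\303\250')    # "è" (U+00E8) as UTF-8 bytes
euro=$(printf '\342\202\254')   # "€" (U+20AC) as UTF-8 bytes

bytes_in UTF-8 "$e_grave"          # c3a8
bytes_in ISO-8859-1 "$e_grave"     # e8
bytes_in ISO-8859-15 "$e_grave"    # e8
bytes_in ISO-8859-15 "$euro"       # a4
```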

And this is what we get: here I’m typing (er… I’m copy-pasting from Wikipedia) some text:

[Screenshot: irssi.png]

If you connect via SSH to a remote machine where you run irssi inside screen, all you have to do is configure both systems for UTF-8, as explained at the beginning of this article, and then set the terminal emulator on the machine you SSH from to UTF-8, as explained earlier.
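If you do this often, a small wrapper saves typing. This is a sketch under my own assumptions: irc_attach is a hypothetical helper, the session name irc matches the screen -S irc example above, and the remote locale check is only a rough one, since a non-interactive ssh command may not read the same startup files as your login shell.

```sh
#!/bin/sh
# Hypothetical helper: log in to a remote host and reattach the "irc" screen
# session in UTF-8 mode. ssh -t forces a pseudo-terminal (screen needs one);
# screen -d -r detaches the session elsewhere if needed and reattaches it here.
irc_attach() {
    host=$1
    # Rough sanity check: what charmap does a non-interactive remote shell get?
    remote_charmap=$(ssh "$host" locale charmap)
    if [ "$remote_charmap" != "UTF-8" ]; then
        echo "warning: $host reports '$remote_charmap', not UTF-8" >&2
    fi
    ssh -t "$host" screen -U -d -r irc
}

# Usage (user@example.org is a placeholder):
#   irc_attach user@example.org
```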

9 responses so far

  1. Thanks for the guide.

    Just a quick note:
    It should read “/recode add ircnet/foo ISO-8859-15” instead of “/recode ircnet/foo ISO-8859-15”

  2. Thanks!

  3. Great guide, it works great for me!

  4. UTF-8 in zsh->screen->irssi->ssh->Terminal.app…

    Quite a show when something somewhere in the chain isn’t right. But at least as far as I can tell, it works with the following settings, with my 10.5.6 client and a shell on FreeBSD 7.1.

    In the shell config (e.g. .zshrc):
    export LC_CTYPE=d…

  5. […] The ultimate guide for UTF-8 in irssi and GNU/Screen (tags: irc linux sysadmin) […]

  6. […] what helped me with this project was the following page, which has a few other tips as well: http://www.iovene.com/the-ultimate-guide-for-utf-8-in-irssi-and-gnuscreen/ Posted by Christian, filed under Linux, No comments […]

  7. ??? ??…

    irssi?? ??? charset ????…

  8. Why don’t you call irssi GNU/irssi? You could also, you know, put more emphasis on the GNU word, like making it blink.

  9. Because irssi is not a GNU project.
