The ultimate guide for UTF-8 in irssi and GNU/Screen

by Salvatore Iovene on 6 March 2007 — Posted in Howtos, Articles

I’ve been having quite a lot of trouble lately configuring irssi to work well with UTF-8. Irssi’s documentation on the matter was incomplete, even discouraging, and there wasn’t much on the Internet either, so now that I’ve figured out how to do it, I’ll share it here.

First of all, you’ve got to make sure that your system is configured for UTF-8 locales:

bash-3.1$ locale
LANG=en_GB.utf8
LANGUAGE=en_GB.utf8
LC_CTYPE="en_GB.utf8"
LC_NUMERIC="en_GB.utf8"
LC_TIME="en_GB.utf8"
LC_COLLATE="en_GB.utf8"
LC_MONETARY="en_GB.utf8"
LC_MESSAGES="en_GB.utf8"
LC_PAPER="en_GB.utf8"
LC_NAME="en_GB.utf8"
LC_ADDRESS="en_GB.utf8"
LC_TELEPHONE="en_GB.utf8"
LC_MEASUREMENT="en_GB.utf8"
LC_IDENTIFICATION="en_GB.utf8"
LC_ALL=en_GB.utf8

If the output of locale doesn’t look like that, you need to reconfigure your locales. On Debian, all you have to do is:

sudo dpkg-reconfigure locales

Here are some screenies of what to expect:

dpkg-1.png
dpkg-2.png
dpkg-3.png

Generating locales (this might take a while)...
  en_GB.ISO-8859-1... done
  en_GB.ISO-8859-15... done
  en_GB.UTF-8... done
  en_US.ISO-8859-1... done
  en_US.ISO-8859-15... done
  en_US.UTF-8... done
Generation complete.

Perfect: now that our system is configured for UTF-8, we want to configure our terminal emulator. If you’re using xterm, start it with the -u8 switch, or just run uxterm; that’s all that’s needed. If you’re using gnome-terminal, open the Terminal menu, choose Set Character Encoding and then UTF-8. If UTF-8 doesn’t appear in the list, try logging out and back in. While you’re at it, choose UTF-8 in the Language option of the GDM login manager too, so that it becomes the default.

Now let’s take care of GNU/Screen. In order to enable UTF-8, all you have to do is launch it with the -U switch:

screen -U -S irc

irc is just the name I want to assign to that screen session. To switch an already running screen session to UTF-8, you have to do it window by window: press CTRL-a :, then type utf8 on.

Once GNU/Screen is configured for UTF-8, you finally have to set up your irssi client. For me this was the tricky part, because the documentation is a bit unclear and I didn’t realize that my irssi wasn’t built with recode support. To check whether yours is, start it and give the command

/recode

If you get something like

Target                         Character set

then everything is all right. If you get a No such command error instead, you will have to reinstall irssi with recode support.

Irssi’s UTF-8 support lets you recode to different charsets, depending on the server or channel you’re chatting in. First, let’s set up some general options:

/set term_charset UTF-8
/set recode_autodetect_utf8 ON
/set recode_fallback UTF-8
/set recode ON
/set recode_out_default_charset UTF-8
/set recode_transliterate ON

These options will be the default, unless overridden for specific servers or channels. What do they mean?

  • term_charset: this is the character set of your terminal emulator
  • recode_autodetect_utf8: irssi will recognize UTF-8 input automatically and treat it accordingly
  • recode_fallback: when we get some non-UTF-8 text from a chat peer, irssi assumes it is in this character set and converts it from there
  • recode: this enables the whole recode thing
  • recode_out_default_charset: this is very important. It is the default charset you send out, unless a server/channel rule says otherwise (we will see those shortly)
  • recode_transliterate: this enables transliteration to the closest match. If someone sends you a character that’s not in your charset, it will be transliterated to the closest possible one, or replaced with a question mark if none is found

Now, you probably need different recodes on different channels, because you may speak different languages on different channels. For example, I send out UTF-8 when typing on English speaking channels, and ISO-8859-1 or ISO-8859-15 when typing on Finnish or Italian speaking channels, so people on the other end will always get my characters right.

You need to add rules with the /recode command:

/recode add ircnet/#foo ISO-8859-15
/recode add ircnet/#bar ISO-8859-1
/recode add freenode/#gee ISO-8859-1

These commands will make you “speak” ISO-8859-15 on #foo on IRCNet, and ISO-8859-1 on #bar on IRCNet and on #gee on freenode. Everywhere else you will “speak” UTF-8.

And this is what we get: here I’m typing (er… I’m copy-pasting from Wikipedia) some text:

irssi.png

If you connect via SSH to a remote machine where you run irssi inside screen, all you have to do is set both systems to use UTF-8, as explained at the beginning of this article. Then set the terminal on the machine you SSH from to use UTF-8 as well, as explained earlier.



Fixing NVIDIA driver after a xserver-xorg-core upgrade in Debian and Ubuntu

by Salvatore Iovene on 16 January 2007 — Posted in Howtos, Articles

If you use Debian Testing or Unstable, or a frequently upgraded version of Ubuntu, running apt-get update && apt-get upgrade will often install a slightly newer version of xserver-xorg-core. This breaks the NVIDIA proprietary driver if you, like me, prefer to install it with the official NVIDIA installer. When this happens, X will crash at your next reboot, or the next time you start it.

Follow these instructions and you won’t need to reinstall the NVIDIA driver from scratch every time. First of all, stop your login manager (gdm assumed here):

sudo /etc/init.d/gdm stop

Then move to:

cd /usr/lib/xorg/modules/extensions

Normally it should look like this:

total 956K
1 root root  19K 2007-01-09 21:13 libdbe.so
1 root root  34K 2007-01-09 21:13 libdri.so
1 root root 145K 2007-01-09 21:13 libextmod.so
1 root root   18 2007-01-15 20:42 libglx.so->libglx.so.1.0.9742
1 root root 676K 2007-01-15 20:42 libglx.so.1.0.9742
1 root root  28K 2007-01-09 21:13 librecord.so
1 root root  38K 2007-01-09 21:13 libxtrap.so

Notice the symbolic link from libglx.so to libglx.so.1.0.9742. In your case, however, installing the newer xserver-xorg-core has overwritten libglx.so with the stock one provided by the X server. All you have to do is restore the previous situation. Remove the libglx.so file:

sudo rm libglx.so

And make the symbolic link again:

sudo ln -s libglx.so.1.0.9742 libglx.so

Of course the version number (1.0.9742 in the listing above) will probably be different in your case: use the version of the libglx.so.* file actually present in that directory. Now you can simply start the gdm login manager again:

sudo /etc/init.d/gdm start

Everything should be working again.

Thanks to http://osrevolution.wordpress.com/ for this.



Is your stacktrace really corrupted?

by Salvatore Iovene on 17 October 2006 — Posted in Howtos, Coding, Articles

You may encounter the `stack corruption’ problem during your debugging sessions. Usually you find out about it when your program crashes with a segmentation fault. If it doesn’t crash, things may be even worse: the corruption may have been used to inject very malicious and subtle code into your program, typically through a buffer overrun. What is a buffer overrun? Let’s examine the following short C code:


#include <stdio.h>
#include <string.h>
void bar(char* str) {
    char buf[4];
    strcpy( buf, str );
}

void foo() {
    printf("Hello from foo!");
}

int main(void) {
    bar("This string definitely is too long, sorry!");
    foo();
    return 0;
}

There’s clearly something wrong with it: we are copying `str’ into `buf’ without first checking the length of `str’. First of all, this is a security issue. Suppose `str’ didn’t come from a fixed string as in this case, but from user input (maybe on a website). An attacker could then supply a string long enough to overwrite bar()’s saved return address on the stack, redirecting execution to code of their choosing when bar() returns. What we get here, anyhow, is just a segmentation fault. Let’s debug the program.


(gdb) file stack
Reading symbols from /home/siovene/stack...done.
(gdb) run
Starting program: /home/siovene/stack

Program received signal SIGSEGV, Segmentation fault.
0x6f6c206f in ?? ()
(gdb) backtrace
#0  0x6f6c206f in ?? ()
#1  0x202c676e in ?? ()
#2  0x72726f73 in ?? ()
#3  0xbf002179 in ?? ()
#4  0xb7df9970 in __libc_start_main ()
      from /lib/tls/i686/cmov/libc.so.6
Previous frame inner to this frame (corrupt stack?)

Obviously something must have gone wrong. To better understand what is going on, let’s take a step back and examine a working example instead:


#include <stdio.h>
#include <string.h>
void bar(char* str) {
    char buf[4];
    strcpy( buf, str );
}

void foo() {
    printf("Hello from foo!");
}

int main(void) {
    bar("abc");
    foo();
    return 0;
}

This is the same code, but the long string that caused the segmentation fault has been replaced by a harmless 3-character string, `abc’, which together with its terminating NUL fits exactly in the 4-byte `buf’. Let’s name the program stack.c and compile it with debug information:


$> gcc -g -o stack stack.c

Now let’s debug it:


(gdb) file stack
Reading symbols from /home/siovene/stack...done.
(gdb) break bar
Breakpoint 1 at 0x80483ca: file stack.c, line 5.
(gdb) run
Starting program: /home/siovene/stack

Breakpoint 1, bar (str=0x8048545 "abc") at stack.c:5
5         strcpy( buf, str );

We have entered the bar() function, let’s examine the backtrace:


(gdb) backtrace
#0  bar (str=0x8048545 "abc") at stack.c:5
#1  0x0804840e in main () at stack.c:13

What is the address of the bar() function?


(gdb) print bar
$1 = {void (char *)} 0x80483c4

Let’s now be paranoid and double-check this by producing a dump of our executable:


$> objdump -tD stack > stack.dis

Open the file with your favorite editor and look for `80483c4’, the address of bar():


080483c4 <bar>:
 80483c4: 55                    push   %ebp
 80483c5: 89 e5                 mov    %esp,%ebp
 80483c7: 83 ec 28              sub    $0x28,%esp
 80483ca: 8b 45 08              mov    0x8(%ebp),%eax
 80483cd: 89 44 24 04           mov    %eax,0x4(%esp)
 80483d1: 8d 45 e8              lea    0xffffffe8(%ebp),%eax
 80483d4: 89 04 24              mov    %eax,(%esp)
 80483d7: e8 0c ff ff ff        call   80482e8
 80483dc: c9                    leave
 80483dd: c3                    ret

Perfect, that’s our function. But now let’s get curious. Where’s the stack pointer in the CPU registers?


(gdb) info registers
eax            0x0      0
ecx            0xb7ed11b4       -1209200204
edx            0xbff04f60       -1074770080
ebx            0xb7ecfe9c       -1209205092
esp            0xbff04f10       0xbff04f10
ebp            0xbff04f38       0xbff04f38
esi            0xbff04fd4       -1074769964
edi            0xbff04fdc       -1074769956
eip            0x80483ca        0x80483ca
eflags         0x282    642
cs             0x73     115
ss             0x7b     123
ds             0x7b     123
es             0x7b     123
fs             0x0      0
gs             0x33     51

On the architecture this article was written on (32-bit x86), the `esp’ register is the stack pointer. It currently holds 0xbff04f10. Let’s examine the memory at that address:


(gdb) x/20xw 0xbff04f10
0xbff04f10:  0x00000000   0x08049638   0xbff04f28   0x080482b5
0xbff04f20:  0xb7ecfe90   0xbff04f34   0xbff04f48   0x0804843b
0xbff04f30:  0xbff04fdc   0xb7ecfe9c   0xbff04f48   0x0804840e
0xbff04f40:  0x08048545   0x08048480   0xbff04fa8   0xb7db3970
0xbff04f50:  0x00000001   0xbff04fd4   0xbff04fdc   0x00000000

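With this command we have told GDB to examine 20 words in hexadecimal format, starting at 0xbff04f10, the top of the stack. On 32-bit x86 code compiled with frame pointers, though, the link to the previous stack frame (the back-chain) is not found at esp but at ebp, which here is 0xbff04f38. The word stored there, 0xbff04f48, is main()’s saved frame pointer. The word right after it, 0x0804840e, is the return address into main(), exactly the address GDB printed for frame #1 in the backtrace above. One word further, at 0xbff04f40, sits 0x08048545, the `str’ argument pointing to "abc". This agrees with the fact that we know bar() was called by main()! Following the saved frame pointers from frame to frame eventually leads to 0x00000000, which marks the outermost frame.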

Everything looks ok and in place, since the program works perfectly we weren’t expecting anything different. Let’s now do the same with the faulty program. At the moment of the segmentation fault, the backtrace looked like this:


(gdb) backtrace
#0  0x6f6c206f in ?? ()
#1  0x202c676e in ?? ()
#2  0x72726f73 in ?? ()
#3  0xbf002179 in ?? ()
#4  0xb7df9970 in __libc_start_main ()
      from /lib/tls/i686/cmov/libc.so.6
Previous frame inner to this frame (corrupt stack?)

To see exactly what goes on, it would be better to debug it more carefully:


(gdb) file stack
Reading symbols from /home/siovene/stack...done.
(gdb) break bar
Breakpoint 1 at 0x80483ca: file stack.c, line 5.
(gdb) run
Starting program: /home/siovene/stack

Breakpoint 1, bar (str=0x8048580
                    "This string definitely is too long, sorry!")
                  at stack.c:5
5         strcpy( buf, str );
(gdb) next
6       }
(gdb) next
0x6f6c206f in ?? ()
(gdb) next
Cannot find bounds of current function

Let’s then try to follow the stack trace back, as we did previously:


(gdb) backtrace
#0  0x6f6c206f in ?? ()
#1  0x202c676e in ?? ()
#2  0x72726f73 in ?? ()
#3  0xbf002179 in ?? ()
#4  0xb7e9b970 in __libc_start_main ()
      from /lib/tls/i686/cmov/libc.so.6
Previous frame inner to this frame (corrupt stack?)

(gdb) info registers
eax            0xbfeed1e0       -1074867744
ecx            0xb7ea4c5f       -1209381793
edx            0x80485ab        134514091
ebx            0xb7fb7e9c       -1208254820
esp            0xbfeed200       0xbfeed200
ebp            0x6f742073       0x6f742073
esi            0xbfeed294       -1074867564
edi            0xbfeed29c       -1074867556
eip            0x6f6c206f       0x6f6c206f
eflags         0x246    582
cs             0x73     115
ss             0x7b     123
ds             0x7b     123
es             0x7b     123
fs             0x0      0
gs             0x33     51

(gdb) x/20xw 0xbfeed200
0xbfeed200:  0x202c676e   0x72726f73   0xbf002179   0xb7e9b970
0xbfeed210:  0x00000001   0xbfeed294   0xbfeed29c   0x00000000
0xbfeed220:  0xb7fb7e9c   0xb7fee540   0x08048480   0xbfeed268
0xbfeed230:  0xbfeed210   0xb7e9b932   0x00000000   0x00000000
0xbfeed240:  0x00000000   0xb7feeca0   0x00000001   0x08048300

(gdb) x/20xw 0x202c676e
0x202c676e:     Cannot access memory at address 0x202c676e

There’s only one explanation for that: the stack memory has been overwritten and now contains gibberish. We have been unlucky with our example, since here the stack really is corrupted, but it has given us the tools to handle another case. Suppose the stack isn’t actually damaged, and GDB simply failed to reconstruct (unwind) it. In that case you can still walk it backwards by hand. Start from the frame pointer in the `ebp’ register and keep following the chain of saved frame pointers, noting the return address stored next to each one, until you reach 0x00000000. Write all the addresses down, and then use `objdump’ to obtain the disassembly and symbol information from the binary. All that is left is to find which symbols the addresses you noted fall into.

If you can do that, then you have successfully reconstructed your stack trace. It wasn’t really corrupted by a bug in your program; GDB simply failed to unwind it.

