Is your stacktrace really corrupted?

Oct 17 2006

During your debugging sessions you may run into the ‘stack corruption’ problem. Usually you find out about it after your program dies with a segmentation fault and the debugger prints a backtrace that makes no sense. A common cause is a buffer overrun, which in the worst case lets an attacker inject malicious code into your program. What is a buffer overrun? Let’s examine the following short C program:

#include <stdio.h>
#include <string.h>
void bar(char* str) {
    char buf[4];
    strcpy( buf, str );
}

void foo() {
    printf("Hello from foo!");
}

int main(void) {
    bar("This string definitely is too long, sorry!");
    foo();
    return 0;
}

There’s clearly something wrong with it: we are copying ‘str’ to ‘buf’ without first checking the length of ‘str’. First of all there is a security issue: if ‘str’ did not come from a fixed string as in this case, but from some external input (maybe a form on a website), an attacker could supply a string crafted to overwrite the return address that bar() keeps on the stack, and make the program jump to malicious code. What we have here, anyhow, is just a segmentation fault. Let’s debug the program.


(gdb) file stack
Reading symbols from /home/siovene/stack...done.
(gdb) run
Starting program: /home/siovene/stack

Program received signal SIGSEGV, Segmentation fault.
0x6f6c206f in ?? ()
(gdb) backtrace
#0  0x6f6c206f in ?? ()
#1  0x202c676e in ?? ()
#2  0x72726f73 in ?? ()
#3  0xbf002179 in ?? ()
#4  0xb7df9970 in __libc_start_main ()
      from /lib/tls/i686/cmov/libc.so.6
Previous frame inner to this frame (corrupt stack?)

Obviously something has gone wrong. To better understand what is going on, let’s take a step back and examine a working example instead:


#include <stdio.h>
#include <string.h>
void bar(char* str) {
    char buf[4];
    strcpy( buf, str );
}

void foo() {
    printf("Hello from foo!");
}

int main(void) {
    bar("abc");
    foo();
    return 0;
}

This is the same code, but the long string that caused the segmentation fault has been replaced with a harmless 3-character string: ‘abc’. Let’s name the program stack.c and compile it with debug information:


$> gcc -g -o stack stack.c

Now let’s debug it:


(gdb) file stack
Reading symbols from /home/siovene/stack...done.
(gdb) break bar
Breakpoint 1 at 0x80483ca: file stack.c, line 5.
(gdb) run
Starting program: /home/siovene/stack

Breakpoint 1, bar (str=0x8048545 "abc") at stack.c:5
5         strcpy( buf, str );

We have entered the bar() function. Let’s examine the backtrace:


(gdb) backtrace
#0  bar (str=0x8048545 "abc") at stack.c:5
#1  0x0804840e in main () at stack.c:13

What is the address of the bar() function?


(gdb) print bar
$1 = {void (char *)} 0x80483c4

Let’s now be paranoid and double-check this by producing a dump of our executable:


$> objdump -tD stack > stack.dis

Open the file with your favorite editor and look for ‘80483c4’, the address of bar():


080483c4 <bar>:
 80483c4: 55                    push   %ebp
 80483c5: 89 e5                 mov    %esp,%ebp
 80483c7: 83 ec 28              sub    $0x28,%esp
 80483ca: 8b 45 08              mov    0x8(%ebp),%eax
 80483cd: 89 44 24 04           mov    %eax,0x4(%esp)
 80483d1: 8d 45 e8              lea    0xffffffe8(%ebp),%eax
 80483d4: 89 04 24              mov    %eax,(%esp)
 80483d7: e8 0c ff ff ff        call   80482e8
 80483dc: c9                    leave
 80483dd: c3                    ret

Perfect, that’s our function. But now let’s get curious. Where’s the stack pointer in the CPU registers?


(gdb) info registers
eax            0x0      0
ecx            0xb7ed11b4       -1209200204
edx            0xbff04f60       -1074770080
ebx            0xb7ecfe9c       -1209205092
esp            0xbff04f10       0xbff04f10
ebp            0xbff04f38       0xbff04f38
esi            0xbff04fd4       -1074769964
edi            0xbff04fdc       -1074769956
eip            0x80483ca        0x80483ca
eflags         0x282    642
cs             0x73     115
ss             0x7b     123
ds             0x7b     123
es             0x7b     123
fs             0x0      0
gs             0x33     51

The ‘esp’ register, on the architecture this article was written on (32-bit x86), is the stack pointer. Its value is 0xbff04f10. Let’s examine the memory at that address:


(gdb) x/20xw 0xbff04f10
0xbff04f10:  0x00000000   0x08049638   0xbff04f28   0x080482b5
0xbff04f20:  0xb7ecfe90   0xbff04f34   0xbff04f48   0x0804843b
0xbff04f30:  0xbff04fdc   0xb7ecfe9c   0xbff04f48   0x0804840e
0xbff04f40:  0x08048545   0x08048480   0xbff04fa8   0xb7db3970
0xbff04f50:  0x00000001   0xbff04fd4   0xbff04fdc   0x00000000

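With this command we have told GDB to examine 20 words in hexadecimal format starting at address 0xbff04f10, the top of the stack. The back-chain pointer to the previous stack frame is the word that the frame pointer ‘ebp’ points to: bar() saved it there with ‘push %ebp’. At 0xbff04f38 we find 0xbff04f48, the frame pointer of the caller, immediately followed by the return address 0x0804840e and by 0x08048545, the ‘str’ argument. GDB reported exactly that return address for frame #1, inside main(). This agrees with the fact that we know bar() was called by main()! Following the chain like this, frame after frame, normally ends at a saved frame pointer of 0x00000000, which marks the outermost frame at the program entry point.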

Everything looks OK and in place. Since the program works perfectly, we weren’t expecting anything different. Let’s now do the same with the faulty program. At the moment of the segmentation fault, the backtrace looked like this:


(gdb) backtrace
#0  0x6f6c206f in ?? ()
#1  0x202c676e in ?? ()
#2  0x72726f73 in ?? ()
#3  0xbf002179 in ?? ()
#4  0xb7df9970 in __libc_start_main ()
      from /lib/tls/i686/cmov/libc.so.6
Previous frame inner to this frame (corrupt stack?)

To see exactly what goes on, it is better to step through it more carefully:


(gdb) file stack
Reading symbols from /home/siovene/stack...done.
(gdb) break bar
Breakpoint 1 at 0x80483ca: file stack.c, line 5.
(gdb) run
Starting program: /home/siovene/stack

Breakpoint 1, bar (str=0x8048580
                    "This string definitely is too long, sorry!")
                  at stack.c:5
5         strcpy( buf, str );
(gdb) next
6       }
(gdb) next
0x6f6c206f in ?? ()
(gdb) next
Cannot find bounds of current function

Let’s then try to follow the stacktrace back, as we did previously:


(gdb) backtrace
#0  0x6f6c206f in ?? ()
#1  0x202c676e in ?? ()
#2  0x72726f73 in ?? ()
#3  0xbf002179 in ?? ()
#4  0xb7e9b970 in __libc_start_main ()
      from /lib/tls/i686/cmov/libc.so.6
Previous frame inner to this frame (corrupt stack?)

(gdb) info registers
eax            0xbfeed1e0       -1074867744
ecx            0xb7ea4c5f       -1209381793
edx            0x80485ab        134514091
ebx            0xb7fb7e9c       -1208254820
esp            0xbfeed200       0xbfeed200
ebp            0x6f742073       0x6f742073
esi            0xbfeed294       -1074867564
edi            0xbfeed29c       -1074867556
eip            0x6f6c206f       0x6f6c206f
eflags         0x246    582
cs             0x73     115
ss             0x7b     123
ds             0x7b     123
es             0x7b     123
fs             0x0      0
gs             0x33     51

(gdb) x/20xw 0xbfeed200
0xbfeed200:  0x202c676e   0x72726f73   0xbf002179   0xb7e9b970
0xbfeed210:  0x00000001   0xbfeed294   0xbfeed29c   0x00000000
0xbfeed220:  0xb7fb7e9c   0xb7fee540   0x08048480   0xbfeed268
0xbfeed230:  0xbfeed210   0xb7e9b932   0x00000000   0x00000000
0xbfeed240:  0x00000000   0xb7feeca0   0x00000001   0x08048300

(gdb) x/20xw 0x202c676e
0x202c676e:     Cannot access memory at address 0x202c676e

There’s only one explanation for that: the stack memory has been overwritten and now contains gibberish. We have been very unlucky with our example, but it gave us the tools to handle another case. Let’s assume the stacktrace looks corrupted not because the stack was actually overwritten, but because GDB failed to reconstruct it. In this case you are still able to navigate it backwards. All you need to do is keep following the chain of saved frame pointers, starting from the registers of the current frame, until you reach 0x00000000. Write down all the return addresses you meet on the way, and then use ‘objdump’ to obtain the disassembly and symbol information from the binary. All that is left, now, is to look up the names of the symbols matching the addresses you noted.

If you can actually do that, then you have successfully reconstructed your stacktrace. It wasn’t really corrupted by a bug in your program; GDB simply failed to keep up with it.


5 responses so far

  1. I am not a GDB specialist, but I do not understand why GDB cannot trace back %esp if we can do it …

    However, great description of how one can shoot himself in the foot by overwriting the stack.

    One can also find the symbols directly from within gdb

    info symbol

    Of course, it is only reliable as long as we can trust gdb. On the other hand, it is the only way to get the symbol information if the bug is in some shared library, because objdump cannot tell at which address the library was or will be loaded. This address is a random number generated by the dynamic linker (ld-linux.so) when the program is loaded into memory.

    In my case nothing worked, as the stack was heavily trashed by somebody :-)

    (gdb) info registers esp
    esp 0xaf970d94 0xaf970d94
    (gdb) x/20xw 0xaf970d94
    0xaf970d94: 0xaf970dac 0x00000006 0x00000e16 0xa7ae8811
    0xaf970da4: 0xa7becff4 0xa7aada40 0xaf970ed8 0xa7ae9fb9
    0xaf970db4: 0x00000006 0xaf970e4c 0x00000000 0x0000000d
    0xaf970dc4: 0x00000023 0x0000002f 0x00000027 0x0000002d
    0xaf970dd4: 0x00000022 0x00000016 0x00000036 0x0000002b
    (gdb) info symbol 0xaf970dac
    No symbol matches 0xaf970dac.
    (gdb) disassemble 0xaf970dac
    No function contains specified address.
    (gdb)

  2. Very interesting contribution, thank you!

  3. […] to debug a corrupted stack Is your stacktrace really corrupted? by Salvatore Iovene on 17 October 2006 — Posted […]

  4. Real cool article… thank you.

    I used your tips and found an improvement:

    You can actually achieve this very fast by:

    print the current stack frame:
    x/20wx $esp

    print the previous stack frame:
    x/20wx *(int *)$ebp

    print the preprevious stack frame:
    x/20wx **(int **)$ebp

    cool, eh?

    Addressing with registers turns out to be REALLY helpful. I’m debugging a program without debug-symbols :-( .. and found e.g.:

    display *(char *)($esp+4)

    to work real great!

  5. Cool article. You real help me :)
    Thank you.
