

Program Execution Details

Program Startup

Upon startup, the executable file is loaded into main memory. Several executable file formats exist: Linux uses the Executable and Linkable Format (ELF), current versions of Microsoft Windows rely on the Portable Executable (PE) file format, and Mac OS X uses a format called Mach-O. As Linux is the primary target of this tutorial, the remaining startup description focuses on the execution of ELF files on Linux-based operating systems.
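To make the role of the file format more concrete, the following minimal sketch reads the entry point address out of an ELF header that is already in memory. It assumes a Linux system that provides the elf.h header and a file whose byte order matches the host. The helper name elf_entry_point is hypothetical and only serves this illustration; it is not what the kernel's loader actually looks like.

```c
#include <assert.h>
#include <elf.h>      /* Linux header with the ELF structure definitions */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: extracts the entry point address from the first
 * bytes of an ELF file. Returns 0 on success and -1 if the buffer does
 * not look like a complete ELF header.
 * Assumption: the file uses the same byte order as the host. */
static int elf_entry_point(const unsigned char *buf, size_t len, uint64_t *entry)
{
    assert(buf != NULL && entry != NULL);

    if (len < EI_NIDENT || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return -1;                       /* missing "\x7fELF" magic bytes */

    if (buf[EI_CLASS] == ELFCLASS32 && len >= sizeof(Elf32_Ehdr)) {
        Elf32_Ehdr hdr;
        memcpy(&hdr, buf, sizeof hdr);   /* copy to avoid unaligned access */
        *entry = hdr.e_entry;
        return 0;
    }
    if (buf[EI_CLASS] == ELFCLASS64 && len >= sizeof(Elf64_Ehdr)) {
        Elf64_Ehdr hdr;
        memcpy(&hdr, buf, sizeof hdr);
        *entry = hdr.e_entry;
        return 0;
    }
    return -1;                           /* unknown class or truncated header */
}
```

Called with the first bytes of an executable, the function stores the address of _start in entry. Tools such as readelf -h report the same value as "Entry point address".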

The ELF header records the address of the executable's entry point _start, so the operating system knows where to begin execution. C developers might wonder why the execution entry point is called _start and not main(), which is what they are used to. Although the runtime environment of a C program is minimal in comparison to other languages, it still has to be set up, and this setup is done by code inside each application. Classic examples of runtime environment features in C are access to command line arguments via argc and argv, as well as access to environment variables via envp. To save C programmers the effort of writing setup routines for these variables by hand in every program, the compiler toolchain links in ready-made code that takes care of these tasks. This predefined code is called crt0 and fills the functional gap between the raw execution entry point _start and the C entry point main()1).
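The bookkeeping that crt0 performs for argc, argv and envp can be sketched in C. The sketch assumes the initial stack layout used on Linux: the argument count, then the argv pointers, a NULL terminator, the envp pointers, and another NULL terminator. The struct main_args and the function locate_main_args are hypothetical names for this illustration; real startup code is written in assembly and does more work than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: the three values handed to main(). */
struct main_args {
    int    argc;
    char **argv;
    char **envp;
};

/* Sketch of the crt0 bookkeeping. `sp` points at the initial process
 * stack, assumed to be laid out as:
 *   argc, argv[0] .. argv[argc-1], NULL, envp[0] .., NULL
 * Every slot is one pointer wide; argc is stored as an integer in slot 0. */
static struct main_args locate_main_args(char **sp)
{
    struct main_args a;

    assert(sp != NULL);
    a.argc = (int)(uintptr_t)sp[0];   /* slot 0 holds the argument count   */
    a.argv = sp + 1;                  /* argument pointers follow directly */
    a.envp = a.argv + a.argc + 1;     /* skip argv and its NULL terminator */
    return a;
}
```

Because envp is found purely by pointer arithmetic from argc, the sketch also shows why these values sit at predictable positions relative to each other.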

Important: As a consequence of the above, command line arguments and environment variables are among the first values placed on the stack of an application2). Compared to other variables that are created during the control flow of the application, their offsets are therefore relatively easy to calculate or at least estimate. Keep this fact in mind; it will be important when calculating stack addresses and using these locations for exploitation.

Function Calls

Another important aspect of program execution is the way functions are called. The details of parameter passing depend on the calling convention in use. On 32-bit x86 systems the cdecl calling convention3), which is used by default by GCC, requires the parameters to be pushed onto the stack in reverse order4), that is, from right to left. When a call instruction is executed, the address of the instruction that follows the call is pushed onto the stack as the return address. Execution then continues with the code of the function.
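The caller's side of this convention can be modelled with a toy stack in C. This is only a sketch under simplifying assumptions: the stack is an array of 32-bit words, esp is an array index instead of a real address, and the names toy_stack, toy_init, toy_push and toy_call are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 16

/* Hypothetical toy model of a 32-bit stack. esp is an index into mem,
 * and pushing moves it towards lower indices, like ESP on x86. */
struct toy_stack {
    uint32_t mem[STACK_WORDS];
    size_t   esp;                 /* index of the current top of the stack */
};

static void toy_init(struct toy_stack *s)
{
    s->esp = STACK_WORDS;         /* empty stack: esp is one past the end */
}

static void toy_push(struct toy_stack *s, uint32_t value)
{
    assert(s->esp > 0);           /* the toy stack must not overflow */
    s->esp -= 1;
    s->mem[s->esp] = value;
}

/* Models "call f(args[0], ..., args[n-1])" under cdecl: the arguments
 * are pushed from right to left, then the return address is pushed. */
static void toy_call(struct toy_stack *s, const uint32_t *args, size_t n,
                     uint32_t return_address)
{
    for (size_t i = n; i > 0; i--)
        toy_push(s, args[i - 1]);
    toy_push(s, return_address);
}
```

After toy_call, the return address is on top of the stack and the first argument sits directly above it, which is exactly the order the called function relies on.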

The following steps show what a function conforming to the cdecl calling convention does on a Linux-based x86 operating system. Usually, the function first prepares the stack. After saving the caller's base pointer on the stack, it overwrites EBP with the current stack pointer.

push ebp
mov  ebp, esp

Remember that the stack grows from high memory addresses to low memory addresses. Allocating memory thus decreases the address of the top of the stack. By decreasing the value of the stack pointer ESP as shown below, n bytes of memory for local variables are allocated.

sub esp, n

At this point the stack setup is done and the body of the function is executed next. The base pointer can now be used to reference function parameters and local variables. Parameters are referenced by a positive offset from EBP, local variables by a negative offset. After the function body has finished, the stack pointer and base pointer are restored to their original values.

mov esp, ebp
pop ebp

After these instructions, the local data of the function lies below the stack pointer and is no longer part of the stack, although the bytes remain in memory until they are overwritten.
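The EBP-relative addressing used inside the function body follows a simple pattern, sketched here in C. The sketch assumes 32-bit x86 with the standard prologue shown above and 4-byte stack slots. The placement of local variables is an additional assumption, since compilers are free to order and pad them differently. The function names are hypothetical.

```c
#include <assert.h>

enum { SLOT_SIZE = 4 };   /* bytes per stack slot on 32-bit x86 */

/* Offset from EBP of the parameter with the given 0-based index.
 * [ebp] holds the saved EBP and [ebp+4] the return address, so the
 * first parameter is found at [ebp+8]. */
static int param_offset(int index)
{
    assert(index >= 0);
    return 2 * SLOT_SIZE + index * SLOT_SIZE;
}

/* Offset from EBP of the 4-byte local variable with the given 0-based
 * index. Assumption: locals are placed directly below the saved EBP,
 * one slot each; real compilers may choose another layout. */
static int local_offset(int index)
{
    assert(index >= 0);
    return -(index + 1) * SLOT_SIZE;
}
```

For example, param_offset(1) corresponds to the operand [ebp+12] and local_offset(0) to [ebp-4] in a disassembly.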

Finally, the ret instruction pops the return address from the stack and writes it to the instruction pointer register EIP. EIP cannot be modified directly, for example with a mov instruction; it only changes through control-transfer instructions such as jmp, call and ret.

ret

Execution continues with the code at the saved address. Under the cdecl calling convention, the return value is placed in the EAX register. The caller is then responsible for cleaning up the stack by removing the passed parameters5).
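The complete sequence, from pushing the parameters to the caller's cleanup, can be replayed on a toy machine in C. This is a sketch under simplifying assumptions: memory is word-addressed, so one array index stands in for four bytes, and the registers are plain struct fields. The names toy_cpu, toy_add and toy_call_add are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define WORDS 32

/* Hypothetical toy machine: word-addressed memory plus four registers.
 * Array indices stand in for addresses, one index per 4-byte slot. */
struct toy_cpu {
    uint32_t mem[WORDS];
    uint32_t esp, ebp, eip, eax;
};

static void push(struct toy_cpu *c, uint32_t v)
{
    assert(c->esp > 0);
    c->mem[--c->esp] = v;
}

static uint32_t pop(struct toy_cpu *c)
{
    assert(c->esp < WORDS);
    return c->mem[c->esp++];
}

/* Callee: adds its two parameters using one local variable. */
static void toy_add(struct toy_cpu *c)
{
    push(c, c->ebp);                /* push ebp     */
    c->ebp = c->esp;                /* mov ebp, esp */
    c->esp -= 1;                    /* sub esp, 4   */

    /* Body: parameters at ebp+2 and ebp+3 (words), local at ebp-1. */
    c->mem[c->ebp - 1] = c->mem[c->ebp + 2] + c->mem[c->ebp + 3];
    c->eax = c->mem[c->ebp - 1];    /* return value goes to EAX */

    c->esp = c->ebp;                /* mov esp, ebp */
    c->ebp = pop(c);                /* pop ebp      */
    c->eip = pop(c);                /* ret          */
}

/* Caller: pushes the parameters right to left, "calls" the function
 * and removes the parameters again afterwards. */
static uint32_t toy_call_add(struct toy_cpu *c, uint32_t a, uint32_t b,
                             uint32_t return_address)
{
    push(c, b);
    push(c, a);
    push(c, return_address);        /* pushed by the call instruction */
    toy_add(c);
    c->esp += 2;                    /* add esp, 8: caller cleans up   */
    return c->eax;
}
```

Once toy_call_add returns, esp and ebp hold the values they had before the call, which mirrors the requirement that a cdecl call leaves the caller's stack state intact.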

The information on the stack belonging to a particular function invocation is called the stack frame of the function6). A visualization of the stack frame of a single function is shown below.

Keep in mind that function calls are nested in every non-trivial program, so several stack frames are usually located on the stack at the same time.




2)
Jeff Duntemann (2009). Assembly Language Step by Step: Programming with Linux (3rd edition)
5)
Bruce Dang; Alexandre Gazet; Elias Bachaalany; Sébastien Josse (2014). Practical Reverse Engineering: x86, x64, ARM, Windows Kernel, Reversing Tools, and Obfuscation
6)
Jon Erickson (2008). Hacking: The Art of Exploitation (2nd edition)