Though shellcodes can do almost anything, they're usually aimed at spawning a (possibly privileged) shell on the target machine (that's where the name shellcode comes from...).
The easiest and fastest way to execute complex tasks in assembler is using system calls (or syscalls, as their friends call them). System calls constitute the interface between user mode and kernel mode; in other words, system calls are the means by which userland applications obtain system services from the kernel, such as managing the filesystem, starting new processes, accessing devices, etc.
Syscall numbers are defined in the /usr/src/linux/include/asm-i386/unistd.h file, where each syscall is paired with a number:
#ifndef _ASM_I386_UNISTD_H_
#define _ASM_I386_UNISTD_H_

/*
 * This file contains the system call numbers
 */

#define __NR_exit      1
#define __NR_fork      2
#define __NR_read      3
#define __NR_write     4
#define __NR_open      5
#define __NR_close     6
#define __NR_waitpid   7
#define __NR_creat     8
[...]
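User programs normally see these numbers through the SYS_* constants in <sys/syscall.h> rather than by reading the kernel header directly. The sketch below only illustrates that: the helper name syscall_number_by_name is hypothetical, it covers just four calls, and the values it returns depend on the architecture the file is compiled for (they match the i386 table above only on a 32-bit x86 build).

```c
#include <string.h>
#include <sys/syscall.h>   /* SYS_* constants, derived from the kernel's unistd.h */

/*
 * Hypothetical helper: map a few syscall names to their numbers on the
 * architecture this file is compiled for. Returns -1 for unknown names.
 */
long syscall_number_by_name(const char *name)
{
    static const struct {
        const char *name;
        long number;
    } table[] = {
        { "exit",  SYS_exit  },
        { "read",  SYS_read  },
        { "write", SYS_write },
        { "close", SYS_close },
    };
    size_t i;

    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (strcmp(table[i].name, name) == 0)
            return table[i].number;
    }
    return -1;
}
```

On an i386 build, syscall_number_by_name("exit") would be expected to return 1, as in the header excerpt; on other architectures the number is different, which is one reason shellcode is architecture-specific.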
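There are normally two ways to execute a syscall:
1. by triggering the 0x80 software interrupt directly;
2. by calling the corresponding libc wrapper function.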
The first method is much more portable, since it relies on the system call numbers defined in the kernel code, which are common to all Linux distributions on the same architecture. The second method relies instead on the addresses of the C functions, so it is hardly portable across distributions, or even across releases of the same distribution.
Let's take a look at the first method. When the CPU receives a 0x80 interrupt, it enters kernel mode and executes the requested function, getting the appropriate handler through the Interrupt Descriptor Table.
The syscall number must be specified in EAX, which also holds the return value once the call completes. The function arguments (up to six) are passed in the EBX, ECX, EDX, ESI, EDI and EBP registers, exactly in this order and using only as many registers as needed. If the function requires more than six arguments, you need to put them in a structure and store the pointer to the first argument in EBX. Note: Linux kernels prior to 2.4 didn't use the EBP register for passing arguments and could therefore pass at most five arguments in registers.
After the syscall number and the parameters have been stored in the appropriate registers, the 0x80 interrupt is executed: the CPU enters kernel mode, executes the system call and returns control to the user process.
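The number-plus-arguments convention can be tried from C without writing any assembly, using the libc syscall() function, which takes the syscall number followed by its arguments and enters the kernel on the caller's behalf. This is only a sketch: raw_write is a hypothetical helper name, and the register mapping in the comment applies to i386 as described above, not to other architectures.

```c
#define _GNU_SOURCE        /* makes unistd.h declare syscall() */
#include <unistd.h>
#include <sys/syscall.h>

/*
 * Hypothetical helper: invoke write(2) by number instead of by name.
 * On i386 the values would end up as:
 *   EAX = SYS_write (4), EBX = fd, ECX = buf, EDX = len
 * Returns the number of bytes written, or -1 with errno set on error.
 */
long raw_write(int fd, const void *buf, size_t len)
{
    return syscall(SYS_write, fd, buf, len);
}
```

A call such as raw_write(1, "hi\n", 3) writes three bytes to standard output. Note one difference from the raw interrupt: syscall() reports failure as -1 with errno set, whereas after int 0x80 the kernel leaves a negative error code directly in EAX.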
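To recap, to execute a system call, you need to:
1. store the syscall number in EAX;
2. store the arguments in the appropriate registers or, if there are more than six, build the structure in memory and store its pointer in EBX;
3. execute the 0x80 interrupt.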
Now let's take a look at the most classic example: the _exit(2) syscall. We know from the /usr/src/linux/include/asm-i386/unistd.h file (see above) that it is number 1. The man page tells us that it requires only one parameter (status):
_EXIT(2)              Linux Programmer's Manual              _EXIT(2)

NAME
       _exit, _Exit - terminate the current process

SYNOPSIS
       #include <unistd.h>

       void _exit(int status)
[...]
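which we will store in the EBX register. Therefore, the instructions for executing this syscall are:
mov eax, 1          ; syscall number of _exit
mov ebx, 0          ; status
int 0x80            ; enter kernel mode and execute the syscall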
As we've stated before, a system call can also be executed by means of a C function. So let's take a look at how to achieve the same results as above using a simple C program:
#include <stdlib.h>

int main(void)
{
        exit(0);
}
We only have to compile it:
$ gcc -o exit exit.c
and disassemble it with gdb to make sure it executes the system call and see how it works under the hood:
$ gdb ./exit
GNU gdb 6.1-debian
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-linux"...Using host libthread_db library "/lib/libthread_db.so.1".

(gdb) break main
Breakpoint 1 at 0x804836a
(gdb) run
Starting program: /ramdisk/var/tmp/exit

Breakpoint 1, 0x0804836a in main ()
(gdb) disas main
Dump of assembler code for function main:
0x08048364 <main+0>:    push   %ebp
0x08048365 <main+1>:    mov    %esp,%ebp
0x08048367 <main+3>:    sub    $0x8,%esp
0x0804836a <main+6>:    and    $0xfffffff0,%esp
0x0804836d <main+9>:    mov    $0x0,%eax
0x08048372 <main+14>:   sub    %eax,%esp
0x08048374 <main+16>:   movl   $0x0,(%esp)
0x0804837b <main+23>:   call   0x8048284 <exit>
End of assembler dump.
(gdb)
The last instruction in main() is the call to the exit(3) function. We will now see that exit(3), in turn, calls the _exit(2) function which will finally execute the system call, including the 0x80 interrupt:
(gdb) disas exit
Dump of assembler code for function exit:
[...]
0x40052aed <exit+141>:  mov    0x8(%ebp),%eax
0x40052af0 <exit+144>:  mov    %eax,(%esp)
0x40052af3 <exit+147>:  call   0x400ced9c <_exit>
[...]
End of assembler dump.
(gdb) disas _exit
Dump of assembler code for function _exit:
0x400ced9c <_exit+0>:   mov    0x4(%esp),%ebx
0x400ceda0 <_exit+4>:   mov    $0xfc,%eax
0x400ceda5 <_exit+9>:   int    $0x80
0x400ceda7 <_exit+11>:  mov    $0x1,%eax
0x400cedac <_exit+16>:  int    $0x80
0x400cedae <_exit+18>:  hlt
0x400cedaf <_exit+19>:  nop
End of assembler dump.
(gdb)
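Note that _exit issues two interrupts: the first with EAX set to 0xfc (252, which is exit_group on i386 and terminates every thread of the process), the second with EAX set to 1 (the plain exit syscall), which is reached only if the first one returns. The same sequence can be sketched in C with the libc syscall() function. The helper names raw_exit and exit_status_of_child are hypothetical, and the code runs the sequence in a forked child only so that the resulting exit status can be observed from the parent.

```c
#define _GNU_SOURCE        /* makes unistd.h declare syscall() */
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Mirrors the two-step sequence seen in the _exit disassembly. */
static void raw_exit(int status)
{
    syscall(SYS_exit_group, status);  /* 0xfc (252) on i386 */
    syscall(SYS_exit, status);        /* 1 on i386; reached only if the call above returns */
}

/*
 * Hypothetical helper: run raw_exit(status) in a child process and
 * return the exit status the parent observes, or -1 on failure.
 */
int exit_status_of_child(int status)
{
    int wstatus;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        raw_exit(status);
        for (;;)                      /* not reached; plays the role of the hlt */
            ;
    }
    if (waitpid(pid, &wstatus, 0) != pid || !WIFEXITED(wstatus))
        return -1;
    return WEXITSTATUS(wstatus);
}
```

Calling exit_status_of_child(0) should report a status of 0, the same value that the disassembled code loads into EBX from its stack argument.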
Therefore, a shellcode using the libc to indirectly execute the _exit(2) system call looks like:
push dword 0        ; status
call 0x8048284      ; Call the libc exit() function (address obtained
                    ; from the above disassembly)
add esp, 4          ; Clean up the stack