Writing shellcode for Linux and *BSD

Though shellcodes can do almost anything, they're usually aimed at spawning a (possibly privileged) shell on the target machine (hence the name shellcode).

The easiest and fastest way to execute complex tasks in assembler is using system calls (or syscalls, as their friends call them). System calls constitute the interface between user mode and kernel mode; in other words, system calls are the means by which userland applications obtain system services from the kernel, such as managing the filesystem, starting new processes, accessing devices, etc.

Syscalls are defined in the /usr/src/linux/include/asm-i386/unistd.h file, and each is paired with a number:

/usr/src/linux/include/asm-i386/unistd.h

#ifndef _ASM_I386_UNISTD_H_
#define _ASM_I386_UNISTD_H_

/*
 * This file contains the system call numbers
 */

#define __NR_exit                       1
#define __NR_fork                       2
#define __NR_read                       3
#define __NR_write                      4
#define __NR_open                       5
#define __NR_close                      6
#define __NR_waitpid                    7
#define __NR_creat                      8
[...]
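
As a rough illustration of how these numbers are used, the sketch below invokes a syscall from C purely by its number. It assumes a Linux system with glibc, where <sys/syscall.h> exposes the same numbers under SYS_* names and syscall(2) acts as a generic entry point; the helper name getpid_by_number is made up for this example. The numeric values differ between architectures (the listing above is for i386), so the code sticks to the symbolic names.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_* constants, mirroring the __NR_* numbers */
#include <unistd.h>        /* syscall(), getpid() */

/* Hypothetical helper: ask the kernel for our PID using only the syscall
 * number, bypassing the dedicated getpid() wrapper. */
long getpid_by_number(void)
{
    return syscall(SYS_getpid);
}
```

Calling getpid_by_number() should give the same value as the ordinary getpid() wrapper, since both end up in the same kernel service.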

There are two standard ways to invoke a syscall: triggering the 0x80 software interrupt directly, or calling the corresponding libc wrapper function. The first method is much more portable, since it is based on system calls defined in the kernel code and, therefore, common to all Linux distributions. The second method, which relies on the addresses of the C functions, is hardly portable across different distributions, or even across different releases of the same distribution.

2.1 int 0x80

Let's take a look at the first method. When the CPU receives a 0x80 interrupt, it enters kernel mode and executes the requested function, getting the appropriate handler through the Interrupt Descriptor Table.

The syscall number must be placed in EAX, which will hold the return value once the call completes. The function arguments (up to six) are passed in the EBX, ECX, EDX, ESI, EDI and EBP registers, in exactly this order and using only as many registers as needed. If the function requires more than six arguments, you need to put them in a structure and store the pointer to the first argument in EBX. Note: Linux kernels prior to 2.4 didn't use the EBP register for passing arguments and, therefore, could pass at most five arguments in registers.
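
The register convention can be sketched from C as well. The example below is only an approximation: it goes through glibc's generic syscall(2) function rather than a raw int 0x80, so that it builds on any Linux architecture, and the comments note where each value would travel on i386. The helper name write_by_number is hypothetical.

```c
#define _GNU_SOURCE
#include <stddef.h>        /* size_t */
#include <sys/syscall.h>   /* SYS_write */
#include <unistd.h>        /* syscall() */

/* Hypothetical helper: write(2) invoked by number.
 * With int 0x80 on i386 the values would be placed as follows:
 *   EAX = SYS_write (syscall number)
 *   EBX = fd        (1st argument)
 *   ECX = buf       (2nd argument)
 *   EDX = count     (3rd argument)
 * and the result would come back in EAX. */
long write_by_number(int fd, const void *buf, size_t count)
{
    return syscall(SYS_write, fd, buf, count);
}
```

For instance, write_by_number(1, "hi\n", 3) prints to standard output and returns 3. One difference from the raw interrupt: on failure the kernel leaves a negative error code in EAX, whereas syscall(2) translates it into a return value of -1 with errno set.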

After the syscall number and the parameters have been stored in the appropriate registers, the 0x80 interrupt is triggered: the CPU enters kernel mode, executes the system call and returns control to the user process.

Now let's take a look at the most classic example: the _exit(2) syscall. We know from the /usr/src/linux/include/asm-i386/unistd.h file (see above) that it is number 1. The man page tells us that it requires only one parameter (status):

man 2 _exit

_EXIT(2)        Linux Programmer's Manual               _EXIT(2)

NAME

        _exit, _Exit - terminate the current process

SYNOPSIS
        #include <unistd.h>

        void _exit(int status);
[...]

which we will store in the EBX register. Therefore, the instructions for executing this syscall are:

exit.asm

mov eax, 1      ; Number of the _exit(2) syscall
mov ebx, 0      ; status
int 0x80        ; Interrupt 0x80
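
To watch the effect of these three instructions without assembling anything, here is a minimal C sketch under a few assumptions: a Linux system with glibc, the generic syscall(2) function standing in for the raw interrupt, and a hypothetical helper named exit_status_by_number. It forks a child that terminates through the exit syscall by number, and reports the status collected by the parent.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_exit */
#include <sys/types.h>     /* pid_t */
#include <sys/wait.h>      /* waitpid(), WIFEXITED(), WEXITSTATUS() */
#include <unistd.h>        /* fork(), syscall() */

/* Hypothetical helper: fork a child that terminates via the exit syscall
 * invoked by number, and return the status the parent observes
 * (-1 on failure). */
int exit_status_by_number(int status)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: same effect as "mov eax, 1 / mov ebx, status / int 0x80"
         * on i386. The call does not return. */
        syscall(SYS_exit, status);
    }

    int wstatus;
    if (waitpid(pid, &wstatus, 0) != pid || !WIFEXITED(wstatus))
        return -1;
    return WEXITSTATUS(wstatus);
}
```

exit_status_by_number(0) should return 0, matching the status loaded into EBX in exit.asm; any other value in the range 0 to 255 comes back unchanged.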

2.2 libc

As we've stated before, a system call can also be executed by means of a C function. So let's take a look at how to achieve the same result as above using a simple C program:

exit.c

#include <stdlib.h>     /* exit() */

int main(void) {
        exit(0);
}

Compile it:

$ gcc -o exit exit.c

and disassemble it with gdb to check that it really executes the system call and to see how it works under the hood:

$ gdb ./exit
GNU gdb 6.1-debian
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-linux"...Using host libthread_db library "/lib/libthread_db.so.1".

(gdb) break main
Breakpoint 1 at 0x804836a
(gdb) run
Starting program: /ramdisk/var/tmp/exit 

Breakpoint 1, 0x0804836a in main ()
(gdb) disas main
Dump of assembler code for function main:
0x08048364 <main+0>:    push   %ebp
0x08048365 <main+1>:    mov    %esp,%ebp
0x08048367 <main+3>:    sub    $0x8,%esp
0x0804836a <main+6>:    and    $0xfffffff0,%esp
0x0804836d <main+9>:    mov    $0x0,%eax
0x08048372 <main+14>:   sub    %eax,%esp
0x08048374 <main+16>:   movl   $0x0,(%esp)
0x0804837b <main+23>:   call   0x8048284 <exit>
End of assembler dump.
(gdb)

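The last instruction in main() is the call to the exit(3) function. We will now see that exit(3), in turn, calls the _exit(2) function, which finally executes the system call through the 0x80 interrupt. Note that _exit first tries syscall 0xfc (252, exit_group) and falls back to syscall 1 (exit) only if the first call returns: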

(gdb) disas exit
Dump of assembler code for function exit:
[...]
0x40052aed <exit+141>:  mov    0x8(%ebp),%eax
0x40052af0 <exit+144>:  mov    %eax,(%esp)
0x40052af3 <exit+147>:  call   0x400ced9c <_exit>
[...]
End of assembler dump.
(gdb) disas _exit
Dump of assembler code for function _exit:
0x400ced9c <_exit+0>:   mov    0x4(%esp),%ebx
0x400ceda0 <_exit+4>:   mov    $0xfc,%eax
0x400ceda5 <_exit+9>:   int    $0x80
0x400ceda7 <_exit+11>:  mov    $0x1,%eax
0x400cedac <_exit+16>:  int    $0x80
0x400cedae <_exit+18>:  hlt    
0x400cedaf <_exit+19>:  nop    
End of assembler dump.
(gdb)
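
The chain shown in the disassembly (exit(3) calling _exit(2), which issues the system call) can also be observed from the outside. The following sketch assumes a POSIX system and uses a hypothetical helper, child_exit_status, that runs a child process through either entry point and returns the status seen by the parent.

```c
#include <stdlib.h>      /* exit() */
#include <sys/types.h>   /* pid_t */
#include <sys/wait.h>    /* waitpid(), WIFEXITED(), WEXITSTATUS() */
#include <unistd.h>      /* fork(), _exit() */

/* Hypothetical helper: run a child that leaves through exit(3) when
 * use_libc_exit is nonzero, or directly through _exit(2) otherwise.
 * Returns the exit status seen by the parent, or -1 on failure. */
int child_exit_status(int use_libc_exit, int status)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (use_libc_exit)
            exit(status);   /* libc cleanup first, then _exit() */
        _exit(status);      /* straight to the system call wrapper */
    }

    int wstatus;
    if (waitpid(pid, &wstatus, 0) != pid || !WIFEXITED(wstatus))
        return -1;
    return WEXITSTATUS(wstatus);
}
```

Both child_exit_status(1, 3) and child_exit_status(0, 3) should return 3: whichever function the child calls, the process ends in the same kernel service.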

Therefore, a shellcode using the libc to indirectly execute the _exit(2) system call looks like:

push    dword 0        ; status
call    0x8048284      ; Call the libc exit() function (address obtained
                       ;   from the above disassembly)
add     esp, 4         ; Clean up the stack
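
The "push the argument, call the address" pattern has a close C analogue: calling exit(3) through a function pointer. The sketch below is an illustration only, with a hypothetical helper name (exit_status_via_pointer). Here the toolchain resolves the address of exit for us, whereas shellcode has to hard-code it, which is why this method ports so poorly.

```c
#include <stdlib.h>      /* exit() */
#include <sys/types.h>   /* pid_t */
#include <sys/wait.h>    /* waitpid(), WIFEXITED(), WEXITSTATUS() */
#include <unistd.h>      /* fork() */

/* Hypothetical helper: call exit(3) through its address in a child
 * process, roughly what "push status / call <address>" does.
 * Returns the status seen by the parent, or -1 on failure. */
int exit_status_via_pointer(int status)
{
    /* Resolved by the compiler and dynamic linker here; a shellcode would
     * carry this address as a constant taken from a disassembly. */
    void (*libc_exit)(int) = exit;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        libc_exit(status);   /* does not return */

    int wstatus;
    if (waitpid(pid, &wstatus, 0) != pid || !WIFEXITED(wstatus))
        return -1;
    return WEXITSTATUS(wstatus);
}
```

exit_status_via_pointer(5) should return 5. Since exit(3) never returns, an instruction placed after the call, such as the stack cleanup in the listing above, is not actually reached.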