Shellcode
Shellcode is executable code intended to be used as a payload for exploiting a software vulnerability. The term includes shell because such payloads originally opened a command shell that the attacker could use to control the target machine, but any injected code that gains access that would otherwise not be allowed can be called shellcode. For this reason, some consider the name shellcode to be inaccurate.<ref>Template:Cite book</ref>
An attack commonly injects data that consists of executable code into a process before or as it exploits a vulnerability to gain control. The program counter is then set to the shellcode entry point so that the shellcode runs. Deploying shellcode is often accomplished by including the code in a file that a vulnerable process downloads and then loads into its memory.
Common wisdom dictates that, to maximize effectiveness, a shellcode payload should be small.<ref name="anley_koziol_2007">Template:Cite book</ref> Machine code provides the flexibility needed to accomplish this goal. Shellcode authors leverage small opcodes to create compact shellcode.<ref>Template:Cite book</ref><ref>Template:Cite web</ref>
Types
- Local
A local shellcode attack allows an attacker to gain elevated access privileges on their computer. In some cases, exploiting a vulnerability can be achieved by causing an error such as a buffer overflow. If successful, the shellcode enables access to the machine via the elevated privileges granted to the targeted process.
- Remote
A remote shellcode attack targets a process running on a remote machine, whether on the same local area network, an intranet, or the internet. If successful, the shellcode provides access to the target machine across the network. The shellcode normally opens a TCP/IP socket connection to allow access to a shell on the target machine.
A remote shellcode attack can be categorized by its behavior. If the shellcode establishes the connection, it is called a reverse shell or connect-back shellcode. If instead the attacker establishes the connection, the shellcode is called a bindshell because it binds to a certain port on the victim's machine. A bindshell random port skips the binding part and listens on a random port. A socket-reuse shellcode takes advantage of the connection that the exploit established to the vulnerable process: if that connection is not closed before the shellcode runs, the shellcode can re-use it to allow remote access. Socket re-using shellcode is more elaborate, since the shellcode needs to find out which connection to re-use and the machine may have many open connections.<ref>Template:Cite web</ref>
A firewall can detect outgoing connections made by connect-back shellcode as well as incoming connections made to bindshells and, therefore, offers some protection against an attack. Even if the system is vulnerable, a firewall can prevent the attacker from connecting to the shell created by the shellcode. One reason why socket re-using shellcode is used is that it does not create new connections and, therefore, is harder to detect and block.
- Download and execute
A download and execute shellcode attack downloads and executes malware on the target system. This type of shellcode does not spawn a shell, but rather instructs the machine to download a certain executable file from the network and execute it. Nowadays, it is commonly used in drive-by download attacks, where a victim visits a malicious webpage that in turn attempts to run such a download and execute shellcode in order to install software on the victim's machine.
A variation of this attack downloads and loads a library.<ref>Template:Cite web</ref><ref>Template:Cite web</ref> Advantages of this technique are that the code can be smaller, that it does not require the shellcode to spawn a new process on the target system, and that the shellcode does not need code to clean up the targeted process as this can be done by the library loaded into the process.
- Staged
When the amount of data that an attacker can inject into the target process is too limited to achieve the desired effect, it may be possible to deploy shellcode in stages that progressively provide more access. The first stage might do nothing more than download the second stage, which then provides the desired access.
- Egg-hunt
An egg-hunt shellcode attack is a staged attack in which the attacker can inject shellcode into a process but does not know where in the process's memory it ends up. A second-stage shellcode, generally smaller than the first, is injected into the process to search the process's address space for the first shellcode (the egg) and execute it.<ref>Template:Cite web</ref>
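The search step can be modelled harmlessly outside any real process. The following is a minimal sketch that scans a byte string standing in for an address space; the function name `find_egg`, the tag value and the doubled-tag convention are assumptions made for illustration, and the "payload" is placeholder text rather than machine code.

```python
from typing import Optional


def find_egg(memory: bytes, tag: bytes) -> Optional[int]:
    """Return the offset of the data that follows a doubled tag, or None.

    Assumption of this sketch: the egg is prefixed with the tag repeated
    twice, so that a single stray copy of the tag does not match.
    """
    marker = tag + tag
    index = memory.find(marker)
    if index == -1:
        return None
    return index + len(marker)


# Hypothetical memory image: filler, one stray tag, then the marked egg.
TAG = b"w00t"
memory = b"\x00" * 37 + TAG + b"junk" + TAG + TAG + b"PAYLOAD" + b"\x00" * 8

offset = find_egg(memory, TAG)
print(offset, memory[offset:offset + 7])
```

A real egg hunter does the same scan in a few machine instructions and must also cope with unmapped memory pages, which this model ignores.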
- Omelet
An omelet shellcode attack, similar to egg-hunt, looks for multiple small blocks of data (eggs) and combines them into a larger block (omelet) that is then executed. This is used when an attacker is limited in the size of each block of injected code but can inject multiple blocks.<ref>Template:Cite web</ref>
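The reassembly step can be illustrated on plain data. In this sketch, each egg is assumed to carry a marker, a one-byte sequence number and a fixed-size body; that layout, the marker value and the name `build_omelet` are hypothetical choices for the example, not a description of any particular tool.

```python
def build_omelet(memory: bytes, marker: bytes, egg_size: int) -> bytes:
    """Collect every marked egg in memory and join the bodies in index order."""
    eggs = {}
    pos = memory.find(marker)
    while pos != -1:
        index = memory[pos + len(marker)]        # one-byte sequence number
        start = pos + len(marker) + 1            # first byte of the egg body
        eggs[index] = memory[start:start + egg_size]
        pos = memory.find(marker, start + egg_size)
    return b"".join(eggs[i] for i in sorted(eggs))


# Hypothetical memory image with three eggs stored out of order.
MARKER = b"EGG!"
memory = (
    b"." * 5 + MARKER + b"\x01" + b"BBBB"
    + b"." * 9 + MARKER + b"\x00" + b"AAAA"
    + b"." * 3 + MARKER + b"\x02" + b"CCCC"
)

omelet = build_omelet(memory, MARKER, egg_size=4)
print(omelet)
```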
Encoding
Shellcode is often written to work around restrictions on the data that a process will accept. General techniques include:
- Optimize for size
Optimize the code to decrease its size.
- Self-modifying code
Modify its own code before executing it to use byte values that are otherwise restricted.
- Encryption
To avoid intrusion detection, encode as self-decrypting or polymorphic.
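The encode-then-decode idea behind the self-modifying and encryption techniques can be modelled on placeholder data. The sketch below uses a single-byte XOR, which is only one simple possibility; the helper names `pick_key` and `xor_bytes` and the sample bytes are assumptions for illustration, and the code transforms data in Python rather than producing a machine-code decoder.

```python
from typing import Optional


def pick_key(data: bytes, forbidden: bytes) -> Optional[int]:
    """Return a one-byte XOR key whose output contains no forbidden byte."""
    for key in range(1, 256):
        if key in forbidden:
            continue  # the key would itself have to be stored, so it must be allowed
        if all((b ^ key) not in forbidden for b in data):
            return key
    return None


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with the key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


# Placeholder bytes that contain zeroes, with zero as the restricted value.
original = bytes([0x41, 0x00, 0x42, 0x00, 0x01])
key = pick_key(original, forbidden=b"\x00")
encoded = xor_bytes(original, key)
decoded = xor_bytes(encoded, key)
print(key, encoded.hex(), decoded == original)
```

Because XOR is its own inverse, the same routine serves as both encoder and decoder, which is why such schemes need only a very small decoding stub.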
- Character encoding
An attack that targets a browser might obfuscate shellcode in a JavaScript string using an expanded character encoding.<ref>Template:Cite web</ref> For example, on the IA-32 architecture, here are two unencoded no-operation instructions (used in a NOP slide):
90 NOP
90 NOP
As encoded:
- Percent encoded: <syntaxhighlight lang="text" class="" style="" inline="1">unescape("%u9090")</syntaxhighlight>
- Unicode literal: <syntaxhighlight lang="text" class="" style="" inline="1">\u9090</syntaxhighlight>
- HTML/XML character reference: <syntaxhighlight lang="text" class="" style="" inline="1">&#x9090;</syntaxhighlight> or <syntaxhighlight lang="text" class="" style="" inline="1">&#37008;</syntaxhighlight>
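The relationship between the raw bytes and these notations can be checked mechanically. The following sketch assumes that each pair of bytes is read as one little-endian 16-bit code unit, as on IA-32; the function name `encodings` and the dictionary keys are made up for the example.

```python
import struct


def encodings(two_bytes: bytes) -> dict:
    """Show one 16-bit code unit, built from two bytes, in each notation."""
    (unit,) = struct.unpack("<H", two_bytes)  # little-endian 16-bit value
    return {
        "percent": f"%u{unit:04X}",
        "unicode_literal": f"\\u{unit:04X}",
        "html_hex": f"&#x{unit:04X};",
        "html_decimal": f"&#{unit};",
    }


forms = encodings(b"\x90\x90")
for name, text in forms.items():
    print(name, text)
```

For bytes that differ, the byte order matters: under the same little-endian assumption, the pair 0x41 0x42 becomes the code unit 0x4241.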
- Null-free
Shellcode must be written without zero-value bytes when it is intended to be injected into a null-terminated string that is copied in the target process via the usual algorithm (e.g. strcpy) of ending the copy at the first zero byte, called the null character in common character sets. If the shellcode contained a null, the copy would be truncated and the shellcode would not function properly. To produce null-free code from code that contains nulls, one can replace machine instructions that contain zeroes with instructions that don't. For example, on the IA-32 architecture the instruction to set register EAX to 1 contains zeroes as part of the literal (1 expands to 0x00000001):
B8 01000000 MOV EAX,1
The following instructions accomplish the same goal (EAX containing 1) without embedded zero bytes by first setting EAX to 0, then incrementing EAX to 1:
33C0 XOR EAX,EAX
40 INC EAX
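The difference between the two encodings can be verified on the byte values quoted above. This is a small sketch; the helper name `null_offsets` is hypothetical, and the truncation is modelled by cutting the byte string at its first zero rather than by calling strcpy.

```python
def null_offsets(code: bytes) -> list:
    """Return the offsets of all zero bytes in a byte string."""
    return [i for i, b in enumerate(code) if b == 0]


mov_eax_1 = bytes.fromhex("b801000000")  # MOV EAX,1
xor_inc = bytes.fromhex("33c040")        # XOR EAX,EAX ; INC EAX

print(null_offsets(mov_eax_1))  # offsets of the zero bytes in the literal
print(null_offsets(xor_inc))    # empty list: no zero bytes

# What a copy that stops at the first zero byte would keep of MOV EAX,1.
copied = mov_eax_1.split(b"\x00", 1)[0]
print(copied.hex())
```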
- Alphanumeric and printable
An alphanumeric shellcode consists of only alphanumeric characters (0–9, A–Z and a–z).<ref name="Rix_2001">Template:Cite journal</ref><ref name="Obscou_2003">Template:Cite journal</ref> This type of encoding was created by hackers to obfuscate machine code inside what appears to be plain text. This can be useful to avoid detection of the code and to allow the code to pass through filters that scrub non-alphanumeric characters from strings. A similar type of encoding is called printable code and uses all printable characters (alphanumeric plus symbols like !@#%^&*). A similarly restricted variant is ECHOable code, which contains only characters that are accepted by the ECHO command. It has been shown that it is possible to create shellcode that looks like normal text in English.<ref name="Mason-Small-Monrose-MacManus_2009">Template:Cite conference (10 pages)</ref> Writing such shellcode requires in-depth understanding of the instruction set architecture of the target machines. It has been demonstrated that it is possible to write alphanumeric code that is executable on more than one machine,<ref>Template:Cite web</ref> thereby constituting multi-architecture executable code.
A work-around was published by Rix in Phrack 57<ref name="Rix_2001"/> in which he shows that it is possible to turn any code into alphanumeric code. Often, self-modifying code is leveraged because it allows the code to have byte values that otherwise are not allowed by replacing coded values at runtime. A self-modifying decoder can be created that initially uses only allowed bytes. The main code of the shellcode is encoded, also only using bytes in the allowed range. When the output shellcode is run, the decoder modifies its code to use instructions it requires and then decodes the original shellcode. After decoding the shellcode, the decoder transfers control to it. It has been shown that it is possible to create arbitrarily complex shellcode that looks like normal English text.<ref name="Mason-Small-Monrose-MacManus_2009"/>
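Whether a given byte sequence satisfies these restrictions is easy to test. The sketch below treats "printable" as the visible ASCII range 0x21 to 0x7E, which is an assumption of the example, and the function names are hypothetical. The sample bytes are harmless single IA-32 instructions: 0x50 (PUSH EAX) and 0x58 (POP EAX) happen to be the letters P and X, while 0x40 (INC EAX) is the symbol @.

```python
import string

ALNUM = set((string.ascii_letters + string.digits).encode("ascii"))
PRINTABLE = set(range(0x21, 0x7F))  # visible ASCII, space excluded (assumption)


def is_alphanumeric(code: bytes) -> bool:
    """True if every byte is one of 0-9, A-Z or a-z."""
    return all(b in ALNUM for b in code)


def is_printable(code: bytes) -> bool:
    """True if every byte is a visible ASCII character."""
    return all(b in PRINTABLE for b in code)


push_pop = bytes([0x50, 0x58])       # PUSH EAX ; POP EAX  -> "PX"
inc_eax = bytes([0x40])              # INC EAX             -> "@"
xor_inc = bytes.fromhex("33c040")    # XOR EAX,EAX ; INC EAX

for sample in (push_pop, inc_eax, xor_inc):
    print(sample.hex(), is_alphanumeric(sample), is_printable(sample))
```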
Modern software uses Unicode to support internationalization and localization. Often, input ASCII text is converted to Unicode before processing. When an ASCII (Latin-1 in general) character is transformed to UTF-16 (16-bit Unicode), a zero byte is inserted after each byte (character) of the original text. Obscou proved in Phrack 61<ref name="Obscou_2003"/> that it is possible to write shellcode that can run successfully after this transformation. Programs exist that can automatically encode any shellcode into alphanumeric UTF-16-proof shellcode, based on the same principle of a small self-modifying decoder that decodes the original shellcode.
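The effect of this widening on a byte sequence can be reproduced directly. The sketch assumes a Latin-1 to little-endian UTF-16 conversion; the function name `widen` is hypothetical, and real systems may use a different code page for bytes above 0x7F.

```python
def widen(raw: bytes) -> bytes:
    """Convert Latin-1 bytes to UTF-16 (little-endian), as the text describes."""
    return raw.decode("latin-1").encode("utf-16-le")


widened = widen(b"PX")
print(widened)        # each original byte is now followed by a zero byte
print(widened[1::2])  # the inserted bytes only
```

Code that must survive this conversion therefore has to remain meaningful with a zero byte after each of its own bytes.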
Compatibility
Generally, shellcode is deployed as machine code since it affords relatively unprotected access to the target process. Since machine code is compatible only within a relatively narrow computing context (processor, operating system and so on), a shellcode fragment has limited compatibility. Also, since a shellcode attack tends to work best when the code is small and targeting multiple exploits increases the size, the code typically targets only one exploit. Nonetheless, a single shellcode fragment can work for multiple contexts and exploits.<ref name="Eugene_2001">Template:Cite web</ref><ref name="Nemo_2005">Template:Cite web</ref><ref name="Cha-Pak-Brumley-Lipton_2010">Template:Cite conference [1] (12 pages) (See also: [2])</ref> Versatility can be achieved by creating a single fragment that contains an implementation for multiple contexts. Common code branches to the implementation for the runtime context.
Analysis
As shellcode is generally not executable on its own, it is typically loaded into a special process in order to study what it does. A common technique is to write a small C program that contains the shellcode as data (e.g. in a byte buffer) and transfers control to the instructions encoded in the data (via a function pointer or inline assembly code). Another technique is to use an online tool, such as Shellcode 2 Exe, to embed the shellcode into a pre-made executable husk which can then be analyzed in a standard debugger. Specialized shellcode analysis tools also exist, such as the iDefense sclog project (originally released in 2005 in the Malcode Analyst Pack). Sclog is designed to load external shellcode files and execute them within an API logging framework. Emulation-based shellcode analysis tools also exist, such as the sctest application, which is part of the cross-platform libemu package. Another emulation-based shellcode analysis tool, built around the libemu library, is scdbg, which includes a basic debug shell and integrated reporting features.
External links
- Shell-Storm Database of shellcodes Multi-Platform.
- An introduction to buffer overflows and shellcode
- The Basics of Shellcoding (PDF) An overview of x86 shellcoding by Angelo Rosiello
- An introduction to shellcode development
- Contains x86 and non-x86 shellcode samples and an online interface for automatic shellcode generation and encoding, from the Metasploit Project
- a shellcode archive, sorted by Operating system.
- Microsoft Windows and Linux shellcode design tutorial going from basic to advanced.
- Windows and Linux shellcode tutorial containing step by step examples.
- ALPHA3 A shellcode encoder that can turn any shellcode into both Unicode and ASCII, uppercase and mixedcase, alphanumeric shellcode.
- Writing Small shellcode by Dafydd Stuttard A whitepaper explaining how to make shellcode as small as possible by optimizing both the design and implementation.
- Writing IA32 Restricted Instruction Set Shellcode Decoder Loops by SkyLined. A whitepaper explaining how to create shellcode when the bytes allowed in the shellcode are very restricted.
- BETA3 A tool that can encode and decode shellcode using a variety of encodings commonly used in exploits.
- Shellcode 2 Exe - Online converter to embed shellcode in exe husk
- Sclog - Updated build of the iDefense sclog shellcode analysis tool (Windows)
- Libemu - emulation based shellcode analysis library (*nix/Cygwin)
- Scdbg - shellcode debugger built around libemu emulation library (*nix/Windows)