kernelthread.com

A Taste of Computer Security

© Amit Singh. All Rights Reserved. Written in June 2004


Defeating Memory

It is common to have a situation where a user legitimately needs to perform an operation that requires more privileges than the user has. We discussed the setuid mechanism as a solution to this problem. Consider the ping command. It crafts and sends the ICMP protocol's ECHO_REQUEST datagram and receives responses. ping needs to send raw IP packets, something that only the super-user is allowed to do on a typical Unix system.

On such systems, ping is setuid root: when run by a normal user, it runs with super-user privileges. A good implementation would ensure that ping only has these "elevated" (or escalated) privileges exactly for the duration that they are absolutely required. On a system that supports fine-grained capabilities, ping would only have the ability to send raw packets, and would not be able to do anything else as a super-user.

In general, programs that can be launched by a normal user, but run with elevated privileges for part or all of the time, are perhaps the most attractive portals to defeating system security. In this context, the specific resource attacked is almost always memory.

Memory-based attacks on an application try to exploit flaws in its (or the runtime's) implementation — either programming errors or oversights. Typically, such attacks attempt to alter the (runtime) memory of an application so as to make the application do something it is not meant to — in the window of time when the application is running with elevated privileges.

Note that many setuid applications may not drop privileges irrevocably and may simply run with elevated privileges throughout. Some applications, even if they do drop privileges, might do so in a manner which allows for an explicit re-elevation.
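As an illustration of what an irrevocable drop might look like, here is a minimal sketch, assuming a POSIX system on which setuid() called with effective super-user privileges resets the real, effective, and saved user IDs together. The function name drop_privileges_permanently is hypothetical, and the sketch deliberately leaves out supplementary groups, which a real program would also need to reset.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: give up setuid/setgid privileges for good.
 * Returns 0 on success, -1 if the drop failed or could be undone.
 */
int drop_privileges_permanently(void)
{
    uid_t ruid = getuid();   /* the user who launched the program */
    gid_t rgid = getgid();

    /* Drop the group first: once the user ID is gone, the process
       may no longer be permitted to change its group ID. */
    if (setgid(rgid) != 0)
        return -1;
    if (setuid(ruid) != 0)
        return -1;

    /* Verify that re-elevation is impossible. If the caller is not
       root, an attempt to become root again must fail. */
    if (ruid != 0 && setuid(0) == 0)
        return -1;
    if (geteuid() != ruid || getegid() != rgid)
        return -1;

    return 0;
}
```

A setuid program along the lines of ping would call such a function right after acquiring the one privileged resource it needs (the raw socket), and would treat a nonzero return value as fatal.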

Your Buffers Runneth Over

Unless otherwise stated, the language of implementation (of an application or a runtime) is assumed to be C (and relatives).

A widely exploited vulnerability of this kind is the buffer overflow, particularly when the "buffer" (a chunk of contiguous memory) resides on the stack. Buffer overflows could be remotely exploited if they are in an application that is accessible over the network (such as a network daemon). Indeed, the Morris worm used a buffer overflow in the finger daemon as one of its infiltration mechanisms.

Exploiting an application using a buffer overflow attack requires crossing two primary technical hurdles: 1. to have the desired code be present in the address space of the application, and 2. to make the application execute this code somehow, possibly with appropriate parameters.

The "desired code" could be arbitrary payload code, but most often is code for executing a command shell (using an exec family function, say). Popularly known as shellcode, it is a stream of bytes that actually constitute executable code, such as machine instructions. This "bytestream" is introduced into the victim program's address space using a data buffer. In some variants of such attacks, no code is injected, but existing (legitimate) code in the program's address space is misused.

Memory attacks can target one or more of several data structures involved in a program's execution.

Stack

A vulnerable stack buffer is declared on the stack (an automatic variable in C parlance), and is used with an "unsafe" function (such as fscanf, gets, getwd, realpath, scanf, sprintf, strcat, or strcpy). The function copies data into this buffer without taking into account the size of the buffer. Since the program will blindly copy an arbitrary amount of data into this buffer, it is possible to overrun, or overflow the buffer with carefully constructed data.
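To make the contrast with the "unsafe" functions concrete, here is a small sketch of a copy routine that is told the size of its destination and refuses to write past it. The name copy_bounded and its return convention are assumptions made for this illustration, not part of any standard library.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical bounded replacement for strcpy().
 * dstsize is the full capacity of dst, terminator included.
 * Returns 0 if src fit completely, -1 if it had to be truncated.
 * dst is always left as a properly terminated string.
 */
int copy_bounded(char *dst, size_t dstsize, const char *src)
{
    size_t len;

    assert(dst != NULL && src != NULL && dstsize > 0);

    len = strlen(src);
    if (len >= dstsize) {
        /* Too long: copy only what fits and terminate explicitly. */
        memcpy(dst, src, dstsize - 1);
        dst[dstsize - 1] = '\0';
        return -1;
    }

    memcpy(dst, src, len + 1);   /* len + 1 includes the terminator */
    return 0;
}
```

With a char buf[8], copying "short" succeeds, whereas copying a longer string returns -1 and leaves only the first seven characters in the buffer. A strcpy in the same position would have kept writing into whatever follows buf on the stack.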

Now, in addition to this buffer, there are other critical entities resident on the stack, in particular, the return address that the program will "jump" to after the current function finishes. This return address can therefore be overwritten using such a buffer overflow. The new return address would point to a piece of code of the attacker's choosing: perhaps something he injected into the program courtesy of the same overflow.

Vulnerable Control Channel

The technical vulnerability here is the presence of control information (return address for a subroutine) in addressable, overwritable data.

Generically speaking, such a problem arises when multiple information channels using the same physical medium differ in their privileges or criticality (with respect to the working of the system they are a part of), but are accessible to a user. If the user can modify information in a control channel, he could make the system behave in a way it was not intended to. A real-life example is that of the older telephone system, with the "data" (voice or data) and control (tones) channels being on the same "line", allowing an attacker (a phone phreaker) to control the line, say, by whistling the right tones into it.

Each platform might have its nuances with respect to buffer overflows. On the PowerPC, the link register (LR) is used to store the return address of a subroutine call invoked by the bl (branch and link) instruction. It is not possible to address this register through memory, but the current value of LR is saved on the stack if the current function calls another function.

Good Shellcode

Desirable properties of shellcode include small size, being self-contained, and having no NULL bytes. A NULL byte terminates a C string, so a function accepting a string parameter would not look beyond the first NULL byte. Further, it is preferable to have shellcode that only consists of printable ASCII characters, a feature that could be useful in avoiding detection (by an intrusion detection system, say).

There are ways to accommodate NULL bytes, depending on the architecture. For example, it might be possible to get rid of NULL bytes by using a XOR operation (a value XOR'ed with itself yields a zero).

On the PowerPC, the opcode for the sc (system call) instruction is 0x44000002, and thus contains NULL bytes. The second and third bytes of this opcode are reserved (unused), and the opcode "works" even if they were replaced with a nonzero value. In any case, there is no other PowerPC instruction with prefix 0x44 and suffix 0x02. The same holds for the NOP (no operation) instruction, whose opcode is 0x60000000.
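The point about reserved bytes can be checked mechanically. The following sketch, with the hypothetical helper has_zero_byte, simply tests whether any of the four bytes of a 32-bit instruction word is zero. The value 0x44ffff02 used in the note below is only an illustrative way of filling the reserved bytes, following the description above.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if any byte of a 32-bit
 * instruction word is zero, and 0 otherwise.
 */
int has_zero_byte(uint32_t word)
{
    int shift;

    for (shift = 0; shift < 32; shift += 8) {
        if (((word >> shift) & 0xffu) == 0)
            return 1;
    }
    return 0;
}
```

For the canonical sc encoding 0x44000002 and for the NOP encoding 0x60000000 the function returns 1, while for 0x44ffff02 it returns 0.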

Formalizing Shellcode Writing

Developing shellcode from scratch could be a slow, trial-and-error process. However, once developed (by whoever), a piece of shellcode often sees a tremendous amount of re-use. Note that those on the side of security benefit by understanding and experimenting with the techniques used by attackers. Consequently, you may have a legitimate non-malicious use for shellcode (to test an intrusion detection system, say).

Several tools and libraries exist for making it easier to experiment with shellcode. A good example is the Metasploit Framework: a complete environment for writing, testing, and using exploit code. It is advertised by its creators as a platform for penetration testing, shellcode development, and vulnerability research.

There are several variations and derivatives of this technique. The shellcode could be stored in an environment variable, and execle could be used along with the overflowed buffer resulting in the return address pointing to an appropriate position in the environment.

Heap

If a buffer residing on the heap (such as a malloc'ed variable) is overflowable (for example, it is used in the program without proper bounds checking), it could result in an exploitable vulnerability. If a critical data structure (a filename to be overwritten or created, a structure containing user credentials, or a critical function pointer) is located on the heap after the "weak" buffer, it could be misused.
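As a sketch of this situation, consider a hypothetical heap-allocated record in which a critical field sits right after a fixed-size buffer. The structure, its field names, and the helper functions are all invented for this illustration. The setter checks the length before copying, which is exactly the bounds check whose absence would make the buffer "weak".

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: a buffer followed by a critical field. */
struct account {
    char name[16];    /* the potentially "weak" buffer */
    int  is_admin;    /* lives after the buffer in memory */
};

/* Allocate a zeroed record on the heap (NULL on failure). */
struct account *account_new(void)
{
    return calloc(1, sizeof(struct account));
}

/*
 * Set the name only if it fits, terminator included.
 * Returns 0 on success, -1 if the name was rejected.
 * An unchecked strcpy() here could spill into is_admin.
 */
int account_set_name(struct account *acct, const char *name)
{
    size_t len = strlen(name);

    if (len >= sizeof acct->name)
        return -1;

    memcpy(acct->name, name, len + 1);
    return 0;
}
```

A name of 16 or more characters is refused, so is_admin keeps whatever value the program itself gave it.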

It may be possible to misuse code that is already present in the address space of the program. You may already have useful strings in the program itself, or in libraries that it uses. Thus, an exploit could reroute control-flow to execute one or more functions from the C library, usually after arranging the appropriate parameters on the stack. If the called C function takes a pointer as an argument, the pointer's target might be corruptible.

Typical malloc implementations use the heap itself to store their book-keeping information (for example, the free block list). Note that this is similar in philosophy to a function's return address on the stack: control information lives alongside data, and can be overwritten. Thus, overflowing a malloc'ed buffer could be used to corrupt the memory management information (since it is kept next to the blocks used by the program).

It is possible to corrupt administrative data structures in certain malloc implementations by freeing memory multiple times. A widely reported example of this vulnerability was in the zlib compression library.
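One common defensive habit against double frees is to clear a pointer as soon as it has been freed, relying on the fact that free(NULL) is defined to do nothing. Here is a minimal sketch of that habit. The macro name FREE_AND_NULL and the session structure are hypothetical, and the idiom only helps when there are no other copies of the pointer elsewhere in the program.

```c
#include <stdlib.h>

/* Free a pointer and clear it, so that a second free of the same
   variable becomes a harmless free(NULL). */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Hypothetical structure owning two heap buffers. */
struct session {
    char *username;
    char *scratch;
};

/* Safe to call more than once on the same session. */
void session_cleanup(struct session *s)
{
    FREE_AND_NULL(s->username);
    FREE_AND_NULL(s->scratch);
}
```

Calling session_cleanup twice in a row, say from two different error paths, then leaves the allocator's book-keeping untouched the second time around.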

Incorrect Mathematics

Another category of attacks is those that cause integer overflow. Consider a program that uses an integer, arrived at after runtime calculations (based on user-input, say), to allocate some memory. If the user can cause the result of the calculations to be larger than what can be held correctly in the variable being used by the program, subsequent operations using that integer may not work correctly. Note that an underflow could be abused similarly. Consider the following code fragment:

    /* itest.c */
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int n;
        signed char c8;     /* overflows at 128 */
        unsigned char uc8;  /* overflows at 256 */

        if (argc < 2)       /* need one numeric argument */
            return 1;

        /* ... */
        n = atoi(argv[1]);
        c8 = n;
        uc8 = n;
        /* ... */

        printf(" signed: %d (%x)\n", c8, c8);
        printf("unsigned: %u (%x)\n", uc8, uc8);

        /* ... somepointer = malloc(somefunction(c8)); ... */
        return 0;
    }

Now consider:

    % ./itest 127
    signed: 127 (7f)
    unsigned: 127 (7f)
    % ./itest 255
    signed: -1 (ffffffff)
    unsigned: 255 (ff)
    % ./itest 256
    signed: 0 (0)
    unsigned: 0 (0)

Overflows, along with several size- and type-related subtleties that one must be aware of while using mathematical functions in programs, constitute a set of exploitable entities.
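Two simple checks address the cases discussed here: verifying that a value fits in a narrower type before assigning it, and verifying that a size computation cannot wrap around before handing it to malloc. The following is a sketch under those assumptions. The function names are hypothetical.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if n can be stored in a signed char unchanged. */
int fits_in_signed_char(int n)
{
    return n >= SCHAR_MIN && n <= SCHAR_MAX;
}

/*
 * Allocate room for count elements of the given size.
 * Returns NULL if count * size would wrap around size_t,
 * or if malloc itself fails.
 */
void *alloc_array(size_t count, size_t size)
{
    if (size != 0 && count > SIZE_MAX / size)
        return NULL;

    return malloc(count * size);
}
```

With these in place, a user-supplied 255 would be rejected before it silently became -1, and an absurdly large element count would yield NULL rather than a buffer far smaller than the program believes it to be.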

Format String Vulnerabilities

The omnipresent printf function uses the varargs (variable arguments) mechanism to accept any number of arguments. Sometimes, programmers pass untrusted strings to printf. Consider the following two apparently similar invocations:

    printf("%s", untrustedstring);
    printf(untrustedstring);        /* unsafe! */

The second invocation is unsafe because untrustedstring could contain printf format specifiers itself. Since there are no other arguments that would correspond to any format specifiers in the string, printf would simply try to use whatever is on the stack as arguments — it doesn't know any better.
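The safe form can be wrapped in a small helper so that the untrusted data can only ever appear as an argument, never as the format. This is a minimal sketch. The function name render_untrusted is hypothetical, and snprintf is used instead of printf only so that the result lands in a buffer where it can be inspected.

```c
#include <stdio.h>

/*
 * Hypothetical helper: copy an untrusted string into out using a
 * constant format string. Any '%' sequences inside the untrusted
 * string are treated as ordinary characters.
 * Returns what snprintf returns: the length the full output
 * would have had.
 */
int render_untrusted(char *out, size_t outsize, const char *untrusted)
{
    return snprintf(out, outsize, "%s", untrusted);
}
```

Passing the string "%x %x" to this helper produces those same five characters in the output buffer, and nothing is read from the stack on their behalf.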

For example, if you were to use a "%p" or "%x" specifier, printf would expect an argument, which isn't there in our function invocation. As far as printf is concerned, it would simply print the element from the stack where the argument normally would have been. In this fashion, you could examine stack memory.

Along similar lines, you could use the "%s" format specifier to read data from (reasonably) arbitrary memory addresses. The address to be read could be supplied using the format string itself.

It is usually possible (depending upon your printf implementation) to access a specific parameter, up to a certain number (again, implementation-dependent), on the stack directly from within a format string. Consider:

    % cat dpa.c
    #include <stdio.h>

    int main(void)
    {
        /* %4$s selects the fourth argument after the format string */
        printf("%4$s\n", "five", "four", "three", "two", "one");
        return 0;
    }
    % gcc -o dpa dpa.c
    % ./dpa
    two

This direct parameter access could be a simplifying factor in developing such attacks.

Another specifier, "%n", could be used to write arbitrary data to carefully-selected addresses. When "%n" is used, the number of characters written so far is stored into the integer indicated by the int * (or variant) pointer argument, without any argument conversion.
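To see what "%n" does when used as intended, consider the following sketch, in which the format string is a constant and the pointer argument is supplied explicitly. The function name is hypothetical. Note also that some C libraries restrict or disable "%n" as a hardening measure, so this is an illustration of the documented behavior rather than something guaranteed to run everywhere.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format "user=<name>;" into a scratch buffer
 * and report how many characters precede the ';'.
 * The count is stored by %n through the int pointer we pass in.
 */
int chars_before_semicolon(const char *name)
{
    char line[64];
    int written = 0;

    snprintf(line, sizeof line, "user=%s%n;", name, &written);
    return written;
}
```

For the name "root", the function returns 9, the length of "user=root". In an attack, the format string comes from the attacker, and so does the choice of where on the stack printf finds the "pointer" it writes through.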

Besides printf, there are several other related functions subject to the same vulnerabilities, such as: asprintf, fprintf, sprintf, snprintf, etc.

Other, non-printf family functions with similar issues include syslog, and the err/warn family of functions for displaying formatted error and warning messages.

Many Others

There are several other areas of interest (susceptible to format string, or other attacks) depending on the platform, the binary format involved, etc. For example:

    % cat d.c
    #include <stdio.h>

    void constructor(void) __attribute__ ((constructor));
    void destructor1(void) __attribute__ ((destructor));
    void destructor2(void) __attribute__ ((destructor));

    void constructor(void) { printf("constructor()\n"); }
    void destructor1(void) { printf("destructor1()\n"); }
    void destructor2(void) { printf("destructor2()\n"); }

    int main(void) { printf("main()\n"); return 0; }
    % gcc -o d d.c
    % ./d
    constructor()
    main()
    destructor1()
    destructor2()

Thus, constructors and destructors are automatically called just before main starts executing, and just before it exits, respectively. These functions live in their own sections: .ctors and .dtors, respectively, in 32-bit ELF (let us assume ELF on 32-bit Linux). In a Mach-O file (such as on Mac OS X), the corresponding sections are called LC_SEGMENT.__DATA.__mod_init_func and LC_SEGMENT.__DATA.__mod_term_func, respectively.

Now, the reasons that make this mechanism susceptible to attack are the following: these sections in an ELF file are writable, and moreover, even in the absence of any explicitly declared destructors, GCC puts empty .dtors and .ctors sections in an ELF file. Note that this is not the case for Mach-O binaries compiled with GCC.

LD_PRELOAD

The run-time link-editor on a typical Unix system honors several environment variables, one of which is LD_PRELOAD. This variable can contain a colon separated list of shared libraries, to be linked in before any other shared libraries. This feature is useful in many scenarios:

Now, while LD_PRELOAD is not honored for setuid programs (if it were, any user could run a setuid program, with a common function re-implemented in the preloaded library to exec() a shell), it could be a mechanism for mischief. For example, a virus could pollute a user's environment namespace (or maybe even the global namespace), to have a viral library preloaded. LD_PRELOAD was also used to exploit some network daemons that allowed their clients to transfer environment variables. In certain cases, a user could upload a malicious shared library to the machine on which the daemon was running, and thus could obtain the privileges of the daemon.
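A daemon that accepts environment variables from its clients, or any program about to launch a child on behalf of someone else, could defend itself by scrubbing the linker-related variables first. Here is a minimal sketch, assuming a POSIX system that provides unsetenv. The function name is hypothetical, and LD_LIBRARY_PATH is included as an additional assumption of this example, being another variable the run-time link-editor commonly honors.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>

/* Variables to remove. Only LD_PRELOAD is discussed in the text;
   LD_LIBRARY_PATH is an assumption added for illustration. */
static const char *const risky_vars[] = {
    "LD_PRELOAD",
    "LD_LIBRARY_PATH"
};

/*
 * Remove linker-related variables from the current environment.
 * Returns the number of variables removed, or -1 on failure.
 */
int scrub_linker_environment(void)
{
    size_t i;
    int removed = 0;

    for (i = 0; i < sizeof risky_vars / sizeof risky_vars[0]; i++) {
        if (getenv(risky_vars[i]) != NULL) {
            if (unsetenv(risky_vars[i]) != 0)
                return -1;
            removed++;
        }
    }
    return removed;
}
```

Such a function would be called before any exec family function, so that the child starts with an environment in which no library has been slipped in ahead of the legitimate ones.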

Defeating a restricted shell using LD_PRELOAD

Consider the example of a restricted shell (such as /usr/lib/rsh or /usr/bin/rksh on Solaris). The user of such a shell is subject to one or more restrictions such as:

Depending on the platform, and the naïveté or oversight of the administrators, it might be possible (but usually is not) to break out of such a shell trivially, say, by launching an unrestricted shell from within a text editor.

Another way out might be possible using LD_PRELOAD (I say "might" as I have not tested this in many years). Curiously, LD_PRELOAD is not a restricted variable in a restricted shell. If any dynamically linked program is in the path, write a few lines of code to replace execve(), create a shared library, and place it in the restricted account. The new code modifies/re-creates the environment pointer of the execve'd program (SHELL=/bin/sh, to begin with). Thereafter, it is possible to undo the restriction.
