
Software security requires increased diligence

28th July 2014
Nat Bowers

Analysing software can be challenging, but it is becoming ever more critical, particularly as security plays an increasingly important role in embedded devices.

As more functionality in embedded systems is provided by software, the risk of programming defects that introduce security vulnerabilities is increasing. The electronic systems in automobiles are at particular risk because cars are an especially juicy target for attackers. Recently, researchers demonstrated that it was relatively easy to find software security vulnerabilities in a late-model car, and were able to exploit them to remotely unlock the doors and even start the engine [1]. Until recently, embedded systems were not a common target of hackers, so awareness of the risk of insecure code among developers is low. Attackers are becoming more sophisticated and are targeting more embedded systems, so it is important that developers are educated about the risks.

Systems composed of code supplied by several different vendors are at particular risk. Research has shown that defects that give rise to security vulnerabilities proliferate at the boundaries between modules. Programmers can defend against such defects by treating inputs from potentially risky channels as hazardous until the validity of the data has been checked. In the parlance of secure programming, unchecked input values are said to be tainted. It can be difficult to check that a program handles tainted data properly because doing so involves tracking its flow through the structure of the code. This is tedious even for relatively small programs, and is generally infeasible to do manually for most real-world applications.

The biggest risk of using values read from a risky channel is that an attacker can use the channel to trigger a security vulnerability or cause the program to crash. The kinds of defects that can be triggered by tainted data include script injection, arithmetic overflow and path traversal. Many of the most damaging cyber-attacks of the last two decades have been caused by the infamous buffer overrun defect. Because this vulnerability is so pervasive, and because it illustrates the importance of taint analysis, it is worth explaining in some detail.

There are several ways in which a buffer overrun can be exploited by an attacker, but here we describe the classic case that makes it possible for the attacker to hijack the process and force it to run arbitrary code. In this case, the buffer is on the stack. Consider the following code:

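A minimal sketch of such code follows. The 100-character buffer, the CONFIG environment variable and the count variable come from the discussion below; the function name read_config and the value pointer are illustrative.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    void read_config(void)
    {
        char buf[100];                   /* fixed-size buffer on the stack       */
        int count;

        char *value = getenv("CONFIG");  /* taint source: an attacker may        */
        if (value == NULL)               /* control the CONFIG variable          */
            return;

        strcpy(buf, value);              /* taint sink: nothing checks that the  */
                                         /* value actually fits in buf           */
        count = (int)strlen(buf);
        printf("configuration is %d characters long\n", count);
    }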

In this example, the input from the outside world arrives through a call to getenv that retrieves the value of the environment variable named “CONFIG”. The programmer who wrote this code expected the value of the environment variable to fit in the buffer, but nothing checks that this is so. If the attacker has control over the value of that environment variable, then assigning a value of 100 or more characters will cause a buffer overrun, because strcpy copies the string’s terminating NUL along with its characters. Because buf is an automatic variable, which will be placed on the stack as part of the activation record for the procedure, any characters after the first 100 will be written to the parts of the program stack beyond the boundaries of buf.

The variable named count may be overwritten (depending on how the compiler chose to allocate space on the stack). If so, the value of that variable is under the control of the attacker. This is bad enough, but the real prize for the attacker is that the stack contains the address to which the program will jump once it has finished executing the procedure. To exploit this vulnerability, the attacker can set the environment variable to a specially crafted string that encodes a return address of his choosing. When the CPU gets to the end of the function, it will jump to that address instead of returning to the function’s caller.
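One plausible stack layout makes it clear why an over-long copy into buf can reach both count and the return address. This is a sketch only; the actual arrangement depends on the compiler, the optimisation level and the architecture.

    /*
     *  higher addresses
     *  +--------------------------+
     *  |  return address          |  <- replaced by part of the attacker's string
     *  +--------------------------+
     *  |  saved frame pointer     |
     *  +--------------------------+
     *  |  int count               |  <- may be overwritten first
     *  +--------------------------+
     *  |  char buf[100]           |  <- strcpy() starts writing here and grows
     *  +--------------------------+     towards higher addresses
     *  lower addresses
     */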

This example takes its input from the environment, but the code would be just as risky if the string were being read from another input source, such as the file system or a network channel. The riskiest input channels are those over which an attacker has control.

Taint sources, sinks and cleansers

In the terminology of taint analysis, a taint source is the location in the program where data is read from a risky source; in the above example, it is the call to getenv(). A taint sink is a location to which tainted data should not flow unless it has been checked for validity, such as the call to strcpy() in the example. Once a value has been checked, it is said to have been cleansed of the taint.
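As a sketch of what cleansing might look like for the earlier example, the tainted value can be rejected before it reaches the sink. The length check shown here is one possible validity test; a real program might also restrict the permitted characters. The function name is illustrative.

    #include <stdlib.h>
    #include <string.h>

    #define BUF_SIZE 100

    /* Returns 0 on success, -1 if the input was rejected. */
    int read_config_checked(char buf[BUF_SIZE])
    {
        const char *value = getenv("CONFIG");         /* taint source            */

        if (value == NULL || strlen(value) >= BUF_SIZE)
            return -1;                                /* reject over-long input  */

        strcpy(buf, value);                           /* the sink is now safe:   */
        return 0;                                     /* the value was cleansed  */
    }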

As mentioned above, most programs take input from many sources, and the environment in which the program will execute determines the level of risk associated with each source. A classification of taint sources might be the following:

  • The environment variables, as exemplified above.
  • File contents.
  • File metadata such as a file’s permissions or datestamps.
  • The network.
  • Network services, such as the results of a DNS query.
  • The system clock.
  • The registry, as found on Windows systems.

Of course, a program may have other kinds of potentially hazardous inputs. A program that reads input from a device with an infra-red sensor should probably treat that channel as dangerous. Security analysts talk about a program’s attack surface: the points of exposure to a potentially hostile attacker. To assess a program’s risk, it is useful to first know its attack surface, which corresponds closely to the program’s taint sources. Finding program errors that are sensitive to tainted values can be very challenging, so automation is the best approach.

Figure 1 - An example warning report for a buffer overrun

Taint analysis is a form of static analysis; Figure 1 shows an example warning report for a buffer overrun from the CodeSonar static-analysis tool. The report shows the path through the code that must be taken for the bug to trigger, with interesting points along the way highlighted. An explanation of what can go wrong is given at the point at which the overrun happens.

It can be difficult to track the flow of tainted data through a program because doing so involves tracking the value as it is copied from variable to variable, possibly across procedure boundaries and through several layers of indirection. Consider, for example, a program that reads a string from a risky network port. As strings in C are by convention managed through pointers, the analysis must track both the contents of the string and the value of all pointers that might refer to it. The characters themselves are said to be tainted, whereas the pointer is said to ‘point to taintedness’. If the contents of the string are copied, e.g. by using strcpy(), then the taintedness property is transferred to the new string. If the pointer is copied, then the points-to-taint property must be transferred to the new pointer.
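A short sketch may make the distinction concrete. Here recv() stands in for any read from a risky network channel; the variable names are illustrative.

    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>

    void handle_request(int sock)
    {
        char msg[256];
        char copy[256];

        ssize_t n = recv(sock, msg, sizeof(msg) - 1, 0);  /* taint source */
        if (n <= 0)
            return;
        msg[n] = '\0';     /* the characters in msg are now tainted           */

        char *p = msg;     /* p itself is a clean value, but it points        */
                           /* to taintedness                                  */
        char *q = p;       /* copying the pointer transfers points-to-taint   */

        strcpy(copy, q);   /* copying the characters transfers taintedness    */
                           /* from msg to copy                                */
    }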

Understanding taint flow

Taint can flow in unexpected ways through a program, so it is important to help programmers understand these channels. The location of taint sources and sinks can be visualised, and program elements involved in flows can be overlaid on top of a regular code view. This can help developers understand the risk of their code and aid them in deciding how best to change the code to shut down the vulnerability.

Figure 2 - Report of a buffer overrun vulnerability

Figure 2 shows a report of another buffer overrun vulnerability. In this example, first note the blue underlining on line 80. This indicates that the value of the variable pointed to by the parameter passed into the procedure is tainted by the file system. Although this may help a user understand the code, the most interesting parts of this warning are on lines 91 and 92. The underlining on line 91 indicates that the value returned by compute_pkgdatadir() is a pointer to data that is tainted by the environment. The call to strcpy() then copies that data into the local buffer named full_file_name (declared on line 84), which transfers the taintedness property into that buffer. Consequently, the red underlining on line 92 shows that the buffer has become tainted by a value from the environment.

The explanation for the buffer overrun confirms that the value returned by compute_pkgdatadir() is in fact a value retrieved from a call to getenv(). A user inspecting this code can thus see that there is a risk of a security vulnerability if an attacker can control the value of the environment variable. An alternative way of viewing the flow of taint through a program is a top-down view. An example is shown in Figure 3. In this example, the user has made use of the red colouration to identify a module containing taint sources. This is a reasonable approximation to the attack surface of the program. The code within that module is shown in the pane to the right; the underlining shows the variables that carry taint.

Figure 3 - An alternative way of viewing the taint flow through a program

Software that expects its inputs to be well formed and within reasonable ranges is inherently risky and prone to failure. In the worst case, bad data can lead to serious security vulnerabilities and crashes. Taint analysis is a technique that helps programmers understand how risky data can flow from one part of a program to another. Advanced static-analysis tools that perform taint analysis and present the results to the user can make it easier to understand a program’s attack surface and reduce the work involved in finding and fixing serious defects.

[1] Experimental Security Analysis of a Modern Automobile. Karl Koscher, Alexei Czeskis, Franziska Roesner, Shwetak Patel, Tadayoshi Kohno, Stephen Checkoway, Damon McCoy, Brian Kantor, Danny Anderson, Hovav Shacham, Stefan Savage. IEEE Symposium on Security and Privacy, Oakland, CA, May 16–19, 2010.
