CWE-195¶
Signed to Unsigned Conversion Error. [Improper-Control-Of-A-Resource-Through-Its-Lifetime]
Required inputs: IR, StaticSemanticAnalysis
It is dangerous to rely on implicit casts between signed and unsigned numbers because the result can take on an unexpected value and violate assumptions made by the program.
Often, functions will return negative values to indicate a failure. When the result of a function is to be used as a size parameter, using these negative return values can have unexpected results. For example, if negative size values are passed to the standard memory copy or allocation functions they will be implicitly cast to a large unsigned value. This may lead to an exploitable buffer overflow or underflow condition.
Demonstrative Examples
Example 1
In this example the variable amount can hold a negative value when it is returned. Because the function is declared to return an unsigned int, amount will be implicitly converted to unsigned.
Example Language:C
unsigned int readdata () {
int amount = 0;
...
if (result == ERROR)
amount = -1;
...
return amount;
}
If the error condition in the code above is met, then the return value of readdata() will be 4,294,967,295 on a system that uses 32-bit integers.
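The wrap-around described above can be reproduced in isolation. The following is a minimal sketch and not part of the CWE excerpt: to_unsigned_return mimics the implicit conversion at the return statement, and read_data_checked is a hypothetical alternative signature that keeps the status signed and hands the amount back through an out-parameter. Both names are assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Mimics Example 1: a signed value is converted at the return statement
 * because the function is declared to return unsigned int. */
unsigned int to_unsigned_return(int amount)
{
    return amount; /* implicit signed-to-unsigned conversion */
}

/* Hypothetical alternative: the status stays signed and the amount is
 * only written on success, so -1 never turns into a size. */
int read_data_checked(int simulated_result, unsigned int *amount_out)
{
    assert(amount_out != NULL);
    if (simulated_result < 0) {
        return -1; /* error: *amount_out is left untouched */
    }
    *amount_out = (unsigned int)simulated_result;
    return 0;
}
```

Calling to_unsigned_return(-1) yields UINT_MAX, which is 4,294,967,295 when unsigned int has 32 bits. With read_data_checked, a caller has to inspect the status before the amount can be used as a size.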
Example 2
In this example, depending on the return value of accessmainframe(), the variable amount can hold a negative value when it is returned. Because the function is declared to return an unsigned value, amount will be implicitly cast to an unsigned number.
Example Language:C
unsigned int readdata () {
int amount = 0;
...
amount = accessmainframe();
...
return amount;
}
If the return value of accessmainframe() is -1, then the return value of readdata() will be 4,294,967,295 on a system that uses 32-bit integers.
Example 3
The following code is intended to read an incoming packet from a socket and extract one or more headers.
Example Language:C
DataPacket *packet;
int numHeaders;
PacketHeader *headers;
sock=AcceptSocketConnection();
ReadPacket(packet, sock);
numHeaders =packet->headers;
if (numHeaders > 100) {
ExitError("too many headers!");
}
headers = malloc(numHeaders * sizeof(PacketHeader));
ParsePacketHeaders(packet, headers);
The code performs a check to make sure that the packet does not contain too many headers. However, numHeaders is defined as a signed int, so it could be negative. If the incoming packet specifies a value such as -3, then the malloc calculation will generate a negative number (say, -300 if each header can be a maximum of 100 bytes). When this result is provided to malloc(), it is first converted to a size_t type. This conversion then produces a large value such as 4294966996, which may cause malloc() to fail or to allocate an extremely large amount of memory (CWE-195). With the appropriate negative numbers, an attacker could trick malloc() into using a very small positive number, which then allocates a buffer that is much smaller than expected, potentially leading to a buffer overflow.
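One possible way to close this gap is to reject non-positive counts before the multiplication takes place. The sketch below assumes a hypothetical PacketHeader layout of 100 bytes and introduces the names MAX_HEADERS and alloc_headers for illustration only; it is not taken from the CWE excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_HEADERS 100 /* same limit as in the example above */

/* Hypothetical stand-in for the PacketHeader type of the example. */
typedef struct {
    char bytes[100];
} PacketHeader;

/* Returns a buffer for num_headers headers, or NULL if the count is
 * outside 1..MAX_HEADERS or the allocation fails. */
PacketHeader *alloc_headers(int num_headers)
{
    /* Sanity check: the largest permitted product fits into size_t. */
    assert(sizeof(PacketHeader) <= SIZE_MAX / MAX_HEADERS);

    if (num_headers <= 0 || num_headers > MAX_HEADERS) {
        return NULL; /* negative and zero counts never reach malloc() */
    }
    /* The count is known to be positive, so the conversion is lossless. */
    return malloc((size_t)num_headers * sizeof(PacketHeader));
}
```

Because the range check runs on the signed value before it is converted, a count such as -3 is rejected instead of being turned into a huge size_t.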
Example 4
This example processes user input comprised of a series of variable-length structures. The first 2 bytes of input dictate the size of the structure to be processed.
Example Language:C
char* processNext(char* strm) {
char buf[512];
short len = *(short*) strm;
strm += sizeof(len);
if (len <= 512) {
memcpy(buf, strm, len);
process(buf);
return strm + len;
}
else {
return -1;
}
}
The programmer has set an upper bound on the structure size: if it is larger than 512, the input will not be processed. The problem is that len is a signed short, so the check against the maximum structure length is done with signed values, but len is converted to an unsigned integer for the call to memcpy(), and the sign bit is extended, which results in a huge unsigned value. If len is negative, then it will appear that the structure has an appropriate size (the if branch will be taken), but the amount of memory copied by memcpy() will be quite large, and the attacker will be able to overflow the stack with data in strm.
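A lower bound on the length removes the problem described here. The following sketch is one way to write such a check; the names copy_next and BUF_SIZE are hypothetical, and the function returns NULL on invalid input instead of the -1 used in the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 512

/* Copies one length-prefixed record from strm into buf (BUF_SIZE bytes).
 * Returns a pointer past the record, or NULL if the length is invalid. */
const char *copy_next(const char *strm, char *buf)
{
    short len;

    assert(strm != NULL && buf != NULL);
    memcpy(&len, strm, sizeof len); /* avoids an unaligned short read */
    strm += sizeof len;

    if (len < 0 || len > BUF_SIZE) { /* reject negative lengths too */
        return NULL;
    }
    memcpy(buf, strm, (size_t)len); /* len is in 0..BUF_SIZE here */
    return strm + len;
}
```

The explicit cast to size_t happens only after len has been shown to be non-negative, so the conversion cannot change its value.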
Example 5
In the following example, it is possible to request that memcpy move a much larger segment of memory than assumed:
Example Language:C
int returnChunkSize(void *) {
/* if chunk info is valid, return the size of usable memory,
* else, return -1 to indicate an error
*/
...
}
int main() {
...
memcpy(destBuf, srcBuf, (returnChunkSize(destBuf)-1));
...
}
If returnChunkSize() happens to encounter an error it will return -1. Notice that the return value is not checked before the memcpy operation (CWE-252), so -1 can be passed as the size argument to memcpy() (CWE-805). Because memcpy() assumes that the value is unsigned, it will be interpreted as MAXINT-1 (CWE-195), and therefore will copy far more memory than is likely available to the destination buffer (CWE-787, CWE-788).
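Checking the signed result before it is used as a size avoids this chain of weaknesses. The sketch below is an illustration under assumptions: chunk_size_or_error is a hypothetical stand-in for returnChunkSize() that takes a simulated size, and copy_chunk is an invented helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for returnChunkSize(): returns the usable size,
 * or -1 on error. The "chunk info" is just a caller-supplied number. */
int chunk_size_or_error(int simulated_size)
{
    return simulated_size >= 0 ? simulated_size : -1;
}

/* Copies (chunk_size - 1) bytes only if chunk_size is a usable positive
 * length. Returns the number of bytes copied, or 0 if nothing was copied. */
size_t copy_chunk(void *dest, const void *src, int chunk_size)
{
    assert(dest != NULL && src != NULL);
    if (chunk_size < 1) {
        return 0; /* covers the -1 error value and a size of 0 */
    }
    size_t n = (size_t)chunk_size - 1;
    memcpy(dest, src, n);
    return n;
}
```

The subtraction is performed after the range check and after the conversion, so it can no longer produce a negative value that wraps around.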
Example 6
This example shows a typical attempt to parse a string with an error resulting from a difference in assumptions between the caller to a function and the function's action.
Example Language:C
int proc_msg(char *s, int msg_len)
{
// Note space at the end of the string - assume all strings have preamble with space
int pre_len = sizeof("preamble: ");
char buf[pre_len - msg_len];
// ... Do processing here if we get this far
}
char *s = "preamble: message\n";
char *sl = strchr(s, ':'); // Number of characters up to ':' (not including space)
int jnklen = sl == NULL ? 0 : sl - s; // If undefined pointer, use zero length
int ret_val = proc_msg (s, jnklen); // Violate assumption of preamble length, end up with negative value, blow out stack
The buffer length ends up being -1, resulting in a blown out stack. The space character after the colon is included in the function calculation, but not in the caller's calculation. This, unfortunately, is not usually so obvious but exists in an obtuse series of calculations.
Excerpts from CWE [https://cwe.mitre.org], Copyright (C) 2006-2026, the MITRE Corporation. See section 9.4. "3rd-Party Licenses" in the documentation for full details.
Possible Messages
| Key | Text | Severity | Disabled |
|---|---|---|---|
| cast_truncate | Conversion from signed to unsigned type. | None | False |
| cast_underflow | Conversion from signed to unsigned type. | None | False |
| certain_shift_amount_negative | Shift by a negative bit count (undefined behavior) | None | False |
| certain_shift_amount_too_large | Shift by the integer width or more (undefined behavior) | None | False |
| certain_shift_right_negative | Right shift with negative left-hand-side | None | False |
| unsigned_cast_underflow | Conversion from signed to unsigned type. | None | False |
Options¶
This rule shares the following common options: exclude_in_macros, exclude_messages_in_system_headers, excludes, extend_exclude_to_macro_invocations, includes, justification_checker, languages, post_processing, provider, report_at, severity
The following places define options that affect this rule: Stylechecks, Analysis-GlobalOptions
abstract_interpretation_maximal_tracked_array_index¶
abstract_interpretation_maximal_tracked_array_index : int = 10
The number of explicit indices in array expressions per routine tracked by the "symbolic expression analysis". For example, consider the following program.
extern signed char a[6];
int main()
{
if (a[2] < 0)
{
a[2]++;
}
if (a[3] < 0)
{
a[3]++;
}
if (a[4] < 0)
{
a[4]++;
}
return 0;
}
If the value of this option is set to 2, the first two array index expressions
encountered in the routine are tracked. Hence, the analysis can use the facts
a[2] < 0 and a[3] < 0 to infer that a[2]++
and a[3]++ do not overflow, but it will not track the third array
access in this routine.
A higher value of the option can cause more consumption of memory and time for the analysis.
abstract_interpretation_overflow¶
abstract_interpretation_overflow : bool = False
abstract_interpretation_overflow_unrolling_level¶
abstract_interpretation_overflow_unrolling_level : int = 0
check_signed¶
check_signed : bool = False
check_unsigned¶
check_unsigned : bool = True
suppress_well_defined_findings¶
suppress_well_defined_findings : SuppressionMode = 'NONE'
Some overflows have well-defined semantics in all C/C++ standard
versions. The typical example is UINT_MAX+1 which is
well-defined as 0 via wraparound. This differs from
INT_MAX+1 which is either undefined or implementation-defined
depending on the considered standard version. Most CPUs will compute
INT_MIN but this wraparound is not guaranteed by any C/C++
standard.
Both cases are overflows and are reported by this rule. However, one might want to suppress messages for the well-defined cases. To suppress these, activate this option.
Different C and C++ standard versions differ in what is well-defined, implementation-defined, or undefined. Luckily, if we only consider well-defined and do not discern between implementation-defined and undefined, we end up with only two groups: pre-C++20 and since-C++20.
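The difference between the two kinds of overflow can be illustrated with a short C sketch. The function names are hypothetical, and the sketch makes no claim about which messages the rule emits for this code; it only contrasts the well-defined unsigned case with a signed case that has to be guarded.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Unsigned arithmetic wraps modulo 2^N: UINT_MAX + 1 is 0 by definition. */
unsigned int wrapping_increment(unsigned int x)
{
    return x + 1u;
}

/* Signed overflow is not well-defined, so it is detected before the
 * addition instead of being computed. Returns false if x + 1 would
 * overflow; otherwise stores the result in *out and returns true. */
bool checked_increment(int x, int *out)
{
    assert(out != NULL);
    if (x == INT_MAX) {
        return false;
    }
    *out = x + 1;
    return true;
}
```

wrapping_increment(UINT_MAX) returns 0 on every conforming implementation, whereas checked_increment(INT_MAX, ...) refuses to compute a value at all.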
Option Types¶
These types are used by options listed above:
SuppressionMode¶
An enumeration.
NONE
Suppress nothing.
PRE_CPP2020
Suppress findings that are well-defined before C++20. These are:
- Over- and underflows of unsigned integers during addition, subtraction, and multiplication
- Conversions from unsigned to unsigned integers
- Wrap-around caused by left-shifting of unsigned integers
CPP2020
Suppress findings that are well-defined since C++20. These are:
- Over- and underflows of unsigned integers during addition, subtraction, and multiplication
- Conversions between signed and unsigned integers
- Wrap-around caused by left-shifting
- Shifting negative integers
Surprising mechanics of C++20 signed narrow integers
Since C++20, casts between signed and unsigned are defined as two's-complement wrap-around. Overflows of signed integers are still undefined behavior and are reported by this rule. However, due to integer promotion rules, certain expressions are computed using wider integer types, which can give the false impression that this is no longer the case, because no overflow findings are reported there.
Suppose that the code is compiled on a platform where short
is smaller than or equal to half the size of an int. Very
commonly the sizes are 2 and 4. This assumption is thus true for many
platforms.
In this case, narrow signed integer types such as short or
signed char are first implicitly promoted to int
before the arithmetic operation is executed. Because of this promotion, the
actual operation does not overflow and is thus well-defined. After the
operation, an implicit cast is performed to the narrower type. This cast is
well-defined in C++20 as wrapping around.
Consider the following snippet:
static_assert(sizeof(short) == 2);
static_assert(sizeof(int) == 4);
short a = 0x1000;
short b = 0x1001;
short c = a*b;
C++20 defines c as 0x1000. The reason is that a*b is implicitly
promoted to static_cast<int>(a)*static_cast<int>(b). After the promotion,
the multiplication does not overflow and yields a well-defined 0x1001000.
This number is then implicitly converted to short, which yields 0x1000
and is also a well-defined operation.
An analogous effect can be observed for signed short addition and
multiplication. Another effect is that it is well-defined to shift by up to
as many bits as int has even if the shifted integer has fewer
bits.
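The promotion effect can also be observed in C, which applies the same integer promotions. The following sketch is an illustration with hypothetical function names and assumes an int of at least 32 bits. Note one difference: in C, converting an out-of-range int back to short is implementation-defined, so the sketch masks the low 16 bits instead of relying on that conversion.

```c
#include <assert.h>
#include <limits.h>

#if INT_MAX < 0x7FFFFFFF
#error "This sketch assumes int has at least 32 bits."
#endif

/* Both operands are promoted to int before multiplying, so the product
 * is computed in int and does not overflow for 16-bit inputs. */
int promoted_product(short a, short b)
{
    return a * b;
}

/* Low 16 bits of a value: what a 16-bit wrap-around would keep. */
unsigned int low16(int value)
{
    return (unsigned int)value & 0xFFFFu;
}

/* The left operand is promoted to int, so a shift count larger than the
 * width of short is still in range for the promoted type. */
int promoted_shift(short s, int count)
{
    /* Restrict inputs so that the int result cannot overflow. */
    assert(count >= 0 && count < 31 && s >= 0 && s <= (INT_MAX >> count));
    return s << count;
}
```

With a = 0x1000 and b = 0x1001, promoted_product returns 0x1001000 and low16 of that value is 0x1000, matching the numbers in the snippet above. promoted_shift(1, 20) returns 0x100000 even though a short with 16 bits could not hold a 20-bit shift.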
DERIVE_FROM_IR
Derive the language version from the IR compilation flags and suppress findings accordingly.