CWE-195

Signed to Unsigned Conversion Error. [Improper-Control-Of-A-Resource-Through-Its-Lifetime]

Required inputs: IR, StaticSemanticAnalysis

The product uses a signed primitive and performs a cast to an unsigned primitive, which can produce an unexpected value if the value of the signed primitive can not be represented using an unsigned primitive.

It is dangerous to rely on implicit casts between signed and unsigned numbers because the result can take on an unexpected value and violate assumptions made by the program.

Often, functions will return negative values to indicate a failure. When the result of a function is to be used as a size parameter, using these negative return values can have unexpected results. For example, if negative size values are passed to the standard memory copy or allocation functions they will be implicitly cast to a large unsigned value. This may lead to an exploitable buffer overflow or underflow condition.

Demonstrative Examples
Example 1

In this example the variable amount can hold a negative value when it is returned. Because the function is declared to return an unsigned int, amount will be implicitly converted to unsigned.

Example Language:C
    unsigned int readdata () {
        int amount = 0;
        ...
        if (result == ERROR)
        amount = -1;
        ...
        return amount;
    }

If the error condition in the code above is met, then the return value of readdata() will be 4,294,967,295 on a system that uses 32-bit integers.

Example 2

In this example, depending on the return value of accecssmainframe(), the variable amount can hold a negative value when it is returned. Because the function is declared to return an unsigned value, amount will be implicitly cast to an unsigned number.

Example Language:C
    unsigned int readdata () {
        int amount = 0;
        ...
        amount = accessmainframe();
        ...
        return amount;
    }

If the return value of accessmainframe() is -1, then the return value of readdata() will be 4,294,967,295 on a system that uses 32-bit integers.

Example 3

The following code is intended to read an incoming packet from a socket and extract one or more headers.

Example Language:C
    DataPacket *packet;
    int numHeaders;
    PacketHeader *headers;

    sock=AcceptSocketConnection();
    ReadPacket(packet, sock);
    numHeaders =packet->headers;

    if (numHeaders > 100) {
        ExitError("too many headers!");
    }
    headers = malloc(numHeaders * sizeof(PacketHeader);
    ParsePacketHeaders(packet, headers);

The code performs a check to make sure that the packet does not contain too many headers. However, numHeaders is defined as a signed int, so it could be negative. If the incoming packet specifies a value such as -3, then the malloc calculation will generate a negative number (say, -300 if each header can be a maximum of 100 bytes). When this result is provided to malloc(), it is first converted to a size_t type. This conversion then produces a large value such as 4294966996, which may cause malloc() to fail or to allocate an extremely large amount of memory (CWE-195). With the appropriate negative numbers, an attacker could trick malloc() into using a very small positive number, which then allocates a buffer that is much smaller than expected, potentially leading to a buffer overflow.

Example 4

This example processes user input comprised of a series of variable-length structures. The first 2 bytes of input dictate the size of the structure to be processed.

Example Language:C
    char* processNext(char* strm) {
        char buf[512];
        short len = *(short*) strm;
        strm += sizeof(len);
        if (len <= 512) {
            memcpy(buf, strm, len);
            process(buf);
            return strm + len;
        }
        else {
            return -1;
        }
    }

The programmer has set an upper bound on the structure size: if it is larger than 512, the input will not be processed. The problem is that len is a signed short, so the check against the maximum structure length is done with signed values, but len is converted to an unsigned integer for the call to memcpy() and the negative bit will be extended to result in a huge value for the unsigned integer. If len is negative, then it will appear that the structure has an appropriate size (the if branch will be taken), but the amount of memory copied by memcpy() will be quite large, and the attacker will be able to overflow the stack with data in strm.

Example 5

In the following example, it is possible to request that memcpy move a much larger segment of memory than assumed:

Example Language:C
    int returnChunkSize(void *) {
        /* if chunk info is valid, return the size of usable memory,

        * else, return -1 to indicate an error

        */
        ...
    }
    int main() {
        ...
        memcpy(destBuf, srcBuf, (returnChunkSize(destBuf)-1));
        ...
    }

If returnChunkSize() happens to encounter an error it will return -1. Notice that the return value is not checked before the memcpy operation (CWE-252), so -1 can be passed as the size argument to memcpy() (CWE-805). Because memcpy() assumes that the value is unsigned, it will be interpreted as MAXINT-1 (CWE-195), and therefore will copy far more memory than is likely available to the destination buffer (CWE-787, CWE-788).

Example 6

This example shows a typical attempt to parse a string with an error resulting from a difference in assumptions between the caller to a function and the function's action.

Example Language:Cint proc_msg(char *s, int msg_len) (Unsupported language for documentation only)
    {

    // Note space at the end of the string - assume all strings have preamble with space
    int pre_len = sizeof("preamble: ");
    char buf[pre_len - msg_len];
    ... Do processing here if we get this far
    }
    char *s = "preamble: message\n";
    char *sl = strchr(s, ':');        // Number of characters up to ':' (not including space)
    int jnklen = sl == NULL ? 0 : sl - s;    // If undefined pointer, use zero length
    int ret_val = proc_msg ("s",  jnklen);    // Violate assumption of preamble length, end up with negative value, blow out stack

The buffer length ends up being -1, resulting in a blown out stack. The space character after the colon is included in the function calculation, but not in the caller's calculation. This, unfortunately, is not usually so obvious but exists in an obtuse series of calculations.

Excerpts from CWE [https://cwe.mitre.org], Copyright (C) 2006-2026, the MITRE Corporation. See section 9.4. "3rd-Party Licenses" in the documentation for full details.

Possible Messages

Key

Text

Severity

Disabled

cast_truncate

Conversion from signed to unsigned type.

None

False

cast_underflow

Conversion from signed to unsigned type.

None

False

certain_shift_amount_negative

Shift by a negative bit count (undefined behavior)

None

False

certain_shift_amount_too_large

Shift by the integer width or more (undefined behavior)

None

False

certain_shift_right_negative

Right shift with negative left-hand-side

None

False

unsigned_cast_underflow

Conversion from signed to unsigned type.

None

False

Options

abstract_interpretation_maximal_tracked_array_index

abstract_interpretation_maximal_tracked_array_index : int = 10

The number of explicit indices in array expressions per routine tracked by the "symbolic expression analysis". For example, consider the following program.

extern signed char a[6];
int main()
{
    if (a[2] < 0)
    {
        a[2]++;
    }
    if (a[3] < 0)
    {
        a[3]++;
    }
    if (a[4] < 0)
    {
        a[4]++;
    }
    return 0;
}

If the value of this option is set to 2, the first two array index expressions encountered in the routine are tracked. Hence, the analysis can use the facts a[2] < 0 and a[3] < 0 to infer that a[2]++ and a[3]++ do not overflow, but it will not track the third array access in this routine.

A higher value of the option can cause more consumption of memory and time for the analysis.

 

abstract_interpretation_overflow

abstract_interpretation_overflow : bool = False

Use abstract-interpretation-based "symbolic expression analysis" as additional postprocessing step.
 

abstract_interpretation_overflow_unrolling_level

abstract_interpretation_overflow_unrolling_level : int = 0

How many levels of conditions are traversed to compute additional constraints for the "symbolic expression analysis".
 

check_signed

check_signed : bool = False

Whether issues for signed integer operations should be reported. For casts including implicit conversions, the target type of the cast is used.
 

check_unsigned

check_unsigned : bool = True

Whether wrap-around for unsigned integer operations should be reported. For casts including implicit conversions, the target type of the cast is used.
 

suppress_well_defined_findings

suppress_well_defined_findings : SuppressionMode = 'NONE'

Some overflows have well-defined semantics in all C/C++ standard versions. The typical example is UINT_MAX+1 which is well-defined as 0 via wraparound. This differs from INT_MAX+1 which is either undefined or implementation-defined depending on the considered standard version. Most CPUs will compute INT_MIN but this wraparound is not guaranteed by any C/C++ standard.

Both cases are overflows and are reported by this rule. However, one might want to suppress messages for the well-defined cases. To suppress these activate this option.

Different C and C++ standard versions differ in what is well-defined, implementation-defined, or undefined. Luckily, if we only consider well-defined and do not discern between implementation-defined and undefined, we end up with only two groups: pre-C++20 and since-C++20.

 

Option Types

These types are used by options listed above:

SuppressionMode

An enumeration.
 

NONE

Suppress nothing.

PRE_CPP2020

Suppress findings that are well-defined before C++20. These are:

  • Over- and underflows of unsigned integers during addition, subtraction, and multiplication
  • Conversions from unsigned to unsigned integers
  • Wrap-around caused by left-shifting of unsigned integer

CPP2020

Suppress findings that are well-defined since C++20. These are:

  • Over- and underflows of unsigned integers during addition, subtraction, and multiplication
  • Conversions between signed and unsigned integers
  • Wrap-around caused by left-shifting
  • Shifting negative integers

Surprising mechanics of C++20 signed narrow integers

Since C++20, casts between signed and unsigned are defined as two-complement wrap-around. Overflows of signed integers are still undefined behavior and are reported by this rule. But, due to integer promotion rules, certain expressions are computed using wider integer types, which can lead to the false impression that this is no longer the case, because no overflow findings are reported there.

Suppose, that the code is compiled on a platform where short is smaller than or equal to half the size of an int. Very commonly the sizes are 2 and 4. This assumption is thus true for many platforms.

In this case, narrow signed integer types such as short or signed char are first implicitly promoted to int before the arithmetic operation is executed. Because of this promotion, the actual operation does not overflow and is thus well-defined. After the operation, an implicit cast is performed to the narrower type. This cast is well-defined in C++20 as wrapping around.

Consider the following snippet:

    static_assert(sizeof(short) == 2);
    static_assert(sizeof(int) == 4);
    short a = 0x1000;
    short b = 0x1001;
    short c = a*b;
C++20 defines c as 0x1000. The reason is that a*b is implicitly promoted to static_cast<int> (a)*static_cast<int>(b). After the promotion, the multiplication does not overflow and yields a well-defined 0x1001000. This number is then implicitly cast to 0x1000 which is also a well-defined operation.

An analogous effect can be observed for signed short addition and multiplication. Another effect is that it is well-defined to shift by up to as many bits as int has even if the shifted integer has fewer bits.

DERIVE_FROM_IR

Derive the language version from the IR compilation flags and suppress findings accordingly.