CertC++-INT31¶
Ensure that integer conversions do not result in lost or misinterpreted data
Required inputs: IR, StaticSemanticAnalysis
Integer conversions, both implicit and explicit (using a cast), must be guaranteed not to result in lost or misinterpreted data. This rule is particularly true for integer values that originate from untrusted sources and are used in any of the following ways:
- Integer operands of any pointer arithmetic, including array indexing
- The assignment expression for the declaration of a variable length array
- The postfix expression preceding square brackets [] or the expression in square brackets [] of a subscripted designation of an element of an array object
- Function arguments of type size_t or rsize_t (for example, an argument to a memory allocation function)
This rule also applies to arguments passed to the following library functions that are converted to unsigned char:
- memset()
- memset_s()
- fprintf() and related functions (For the conversion specifier c, if no l length modifier is present, the int argument is converted to an unsigned char, and the resulting character is written.)
- fputc()
- ungetc()
- memchr()
and to arguments to the following library functions that are converted to char:
- strchr()
- strrchr()
- All of the functions listed in <ctype.h>
The only integer type conversions that are guaranteed to be safe for all data values and all possible conforming implementations are conversions of an integral value to a wider type of the same signedness. The C Standard, subclause 6.3.1.3 [ ISO/IEC 9899:2011], says
When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
Typically, converting an integer to a smaller type results in truncation of the high-order bits.
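The modular rule for unsigned target types can be illustrated with a minimal C sketch. The function names are hypothetical and exist only for this illustration; the asserted results follow from the quoted subclause rather than from any particular platform.

```c
#include <assert.h>
#include <limits.h>

/* Conversion to unsigned int reduces the value modulo UINT_MAX + 1. */
unsigned int to_unsigned(int value) {
  return (unsigned int)value;
}

/* Conversion to unsigned char reduces the value modulo UCHAR_MAX + 1. */
unsigned char to_uchar(int value) {
  return (unsigned char)value;
}

/* Usage sketch: both results are fixed by the C Standard. */
void modular_conversion_demo(void) {
  assert(to_unsigned(-1) == UINT_MAX);
  assert(to_uchar(-1) == UCHAR_MAX);
}
```

Although these results are well defined, they still misrepresent the original value, which is why the rule asks for a range check before such a conversion.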
Noncompliant Code Example (Unsigned to Signed)
Type range errors, including loss of data (truncation) and loss of sign (sign errors), can occur when converting from a value of an unsigned integer type to a value of a signed integer type. This noncompliant code example results in a truncation error on most implementations:
#include <limits.h>

void func(void) {
  unsigned long int u_a = ULONG_MAX;
  signed char sc;
  sc = (signed char)u_a; /* Cast eliminates warning */
  /* ... */
}
Compliant Solution (Unsigned to Signed)
Validate ranges when converting from an unsigned type to a signed type. This
compliant solution can be used to convert a value of
unsigned long int type to a value of
signed char type:
#include <limits.h>

void func(void) {
  unsigned long int u_a = ULONG_MAX;
  signed char sc;
  if (u_a <= SCHAR_MAX) {
    sc = (signed char)u_a; /* Cast eliminates warning */
  } else {
    /* Handle error */
  }
}
Noncompliant Code Example (Signed to Unsigned)
Type range errors, including loss of data (truncation) and loss of sign (sign errors), can occur when converting from a value of a signed type to a value of an unsigned type. This noncompliant code example results in a negative number being misinterpreted as a large positive number.
#include <limits.h>

void func(signed int si) {
  /* Cast eliminates warning */
  unsigned int ui = (unsigned int)si;
  /* ... */
}
/* ... */
func(INT_MIN);
Compliant Solution (Signed to Unsigned)
Validate ranges when converting from a signed type to an unsigned type. This
compliant solution converts a value of a
signed int type to a value of an
unsigned int type:
#include <limits.h>

void func(signed int si) {
  unsigned int ui;
  if (si < 0) {
    /* Handle error */
  } else {
    ui = (unsigned int)si; /* Cast eliminates warning */
  }
  /* ... */
}
/* ... */
func(INT_MIN + 1);
Subclause 6.2.5, paragraph 9, of the C Standard [ ISO/IEC 9899:2011] provides the necessary guarantees to ensure this solution works on a conforming implementation:
The range of nonnegative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the representation of the same value in each type is the same.
Noncompliant Code Example (Signed, Loss of Precision)
A loss of data (truncation) can occur when converting from a value of a signed integer type to a value of a signed type with less precision. This noncompliant code example results in a truncation error on most implementations:
#include <limits.h>

void func(void) {
  signed long int s_a = LONG_MAX;
  signed char sc = (signed char)s_a; /* Cast eliminates warning */
  /* ... */
}
Compliant Solution (Signed, Loss of Precision)
Validate ranges when converting from a signed type to a signed type with less
precision. This compliant solution converts a value of a
signed long int type to a value of a
signed char type:
#include <limits.h>

void func(void) {
  signed long int s_a = LONG_MAX;
  signed char sc;
  if ((s_a < SCHAR_MIN) || (s_a > SCHAR_MAX)) {
    /* Handle error */
  } else {
    sc = (signed char)s_a; /* Use cast to eliminate warning */
  }
  /* ... */
}
Conversions from a value of a signed integer type to a value of a signed integer type with less precision require that both the upper and lower bounds be checked.
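The two-sided check can also be packaged as a reusable helper. The following is a minimal sketch; the name long_to_schar and its bool-plus-out-parameter interface are assumptions made for illustration and are not part of the CERT rule text.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores in into *out and returns true only when the
   value fits in signed char; otherwise leaves *out untouched. */
bool long_to_schar(long in, signed char *out) {
  if (in < SCHAR_MIN || in > SCHAR_MAX) {
    return false; /* Out of range: the caller must handle the error */
  }
  *out = (signed char)in;
  return true;
}

/* Usage sketch: exercises the accepting and the rejecting path. */
void long_to_schar_demo(void) {
  signed char sc = 0;
  assert(long_to_schar(42L, &sc) && sc == 42);
  assert(!long_to_schar(LONG_MAX, &sc) && sc == 42);
}
```

Returning a status instead of a sentinel value keeps every signed char value usable as a legitimate result.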
Noncompliant Code Example (Unsigned, Loss of Precision)
A loss of data (truncation) can occur when converting from a value of an unsigned integer type to a value of an unsigned type with less precision. This noncompliant code example results in a truncation error on most implementations:
#include <limits.h>

void func(void) {
  unsigned long int u_a = ULONG_MAX;
  unsigned char uc = (unsigned char)u_a; /* Cast eliminates warning */
  /* ... */
}
Compliant Solution (Unsigned, Loss of Precision)
Validate ranges when converting a value of an unsigned integer type to a value
of an unsigned integer type with less precision. This compliant solution
converts a value of an
unsigned long int type to a value of an
unsigned char type:
#include <limits.h>

void func(void) {
  unsigned long int u_a = ULONG_MAX;
  unsigned char uc;
  if (u_a > UCHAR_MAX) {
    /* Handle error */
  } else {
    uc = (unsigned char)u_a; /* Cast eliminates warning */
  }
  /* ... */
}
Conversions from unsigned types with greater precision to unsigned types with less precision require only the upper bounds to be checked.
Noncompliant Code Example (time_t Return Value)
The
time() function returns the value
(time_t)(-1) to indicate that the calendar time is not
available. The C Standard requires that the
time_t type is only a real type capable of representing
time. (The integer and real floating types are collectively called real types.)
It is left to the implementor to decide the best real type to use to represent
time. If
time_t is implemented as an unsigned integer type with less
precision than a signed
int, the return value of
time() will never compare equal to the integer literal
-1.
#include <time.h>

void func(void) {
  time_t now = time(NULL);
  if (now != -1) {
    /* Continue processing */
  }
}
Compliant Solution (time_t Return Value)
To ensure the comparison is properly performed, the return value of
time() should be compared against
-1 cast to type
time_t:
#include <time.h>

void func(void) {
  time_t now = time(NULL);
  if (now != (time_t)-1) {
    /* Continue processing */
  }
}
This solution is in accordance with
INT18-C.
Evaluate integer expressions in a larger size before comparing or assigning to
that size. Note that
(time_t)-1 also complies with INT31-C-EX3.
Noncompliant Code Example (memset())
For historical reasons, certain C Standard functions accept an argument of type
int and convert it to either
unsigned char or plain
char. This conversion can result in unexpected behavior if the
value cannot be represented in the smaller type. The second argument to
memset() is an example; it indicates what byte to store in the
range of memory indicated by the first and third arguments. If the second
argument is outside the range of a
signed char or plain
char, then its higher order bits will typically be truncated.
Consequently, this noncompliant solution unexpectedly sets all elements in the
array to 0, rather than 4096:
#include <string.h>
#include <stddef.h>

int *init_memory(int *array, size_t n) {
  return memset(array, 4096, n);
}
Compliant Solution (memset())
In general, the
memset() function should not be used to initialize an integer
array unless it is to set or clear all the bits, as in this compliant solution:
#include <string.h>
#include <stddef.h>

int *init_memory(int *array, size_t n) {
  return memset(array, 0, n);
}
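When every element really must hold a nonzero value such as 4096, an explicit loop avoids the byte-wise conversion altogether. The following is a minimal sketch; the function name fill_ints is hypothetical, and here n is assumed to count elements rather than bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Sets each of the n elements of array to value and returns array.
   Assumption: n is an element count, not a size in bytes. */
int *fill_ints(int *array, size_t n, int value) {
  for (size_t i = 0; i < n; ++i) {
    array[i] = value;
  }
  return array;
}

/* Usage sketch: every element holds the full int value afterwards. */
void fill_ints_demo(void) {
  int buffer[4];
  fill_ints(buffer, 4, 4096);
  assert(buffer[0] == 4096 && buffer[3] == 4096);
}
```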
Exceptions
INT31-C-EX1: The C Standard defines minimum ranges
for standard integer types. For example, the minimum range for an object of
type
unsigned short int is 0 to 65,535, whereas the minimum range for
int is -32,767 to +32,767. Consequently, it is not always possible
to represent all possible values of an
unsigned short int as an
int. However, on the IA-32 architecture, for example, the actual
integer range is from -2,147,483,648 to +2,147,483,647, meaning that it is
quite possible to represent all the values of an
unsigned short int as an
int for this architecture. As a result, it is not necessary to
provide a test for this conversion on IA-32. It is not possible to make
assumptions about conversions without knowing the precision of the underlying
types. If these tests are not provided, assumptions concerning precision must
be clearly documented, as the resulting code cannot be safely ported to a
system where these assumptions are invalid. A good way to document these
assumptions is to use static assertions. (See
DCL03-C.
Use a static assertion to test the value of a constant expression.)
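Such a documented assumption might look like the following minimal sketch. The function name ushort_to_int is hypothetical; static_assert is the C11 macro from <assert.h>.

```c
#include <assert.h>
#include <limits.h>

/* Documents the assumption that every unsigned short value fits in int.
   Compilation fails on platforms where this does not hold. */
static_assert(USHRT_MAX <= INT_MAX,
              "unsigned short values must be representable as int");

/* No runtime range check is needed because of the static assertion above. */
int ushort_to_int(unsigned short value) {
  return (int)value;
}
```

On a platform where int has only the minimum range, the translation unit is rejected at compile time instead of silently converting incorrectly.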
INT31-C-EX2: Conversion from any integer type with a value
between
SCHAR_MIN and
UCHAR_MAX to a character type is permitted provided the value
represents a character and not an integer.
Conversions to unsigned character types are well defined by C to have modular
behavior. A character's value is not misinterpreted by the loss of sign or
conversion to a negative number. For example, the Euro symbol
€ is sometimes represented by bit pattern
0x80, which can have the numerical value 128 or, as an 8-bit two's complement
signed value, -128, depending on the signedness of the type.
Conversions to signed character types are more problematic. The C Standard, subclause 6.3.1.3, paragraph 3 [ ISO/IEC 9899:2011], says, regarding conversions
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
Furthermore, subclause 6.2.6.2, paragraph 2, says, regarding integer modifications
If the sign bit is one, the value shall be modified in one of the following ways:
- the corresponding value with sign bit 0 is negated (sign and magnitude);
- the sign bit has the value -(2^M) (two's complement);
- the sign bit has the value -(2^M - 1) (ones' complement).
Which of these applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two), or with sign bit and all value bits 1 (for ones' complement), is a trap representation or a normal value. [See note.]
NOTE: Two's complement is shorthand for "radix complement in radix 2." Ones' complement is shorthand for "diminished radix complement in radix 2."
Consequently, the standard allows for this code to trap:
int i = 128; /* 1000 0000 in binary */
assert(SCHAR_MAX == 127);
signed char c = i; /* can trap */
However, platforms where this code traps or produces an unexpected value are rare. According to The New C Standard: An Economic and Cultural Commentary by Derek Jones [ Jones 2008],
Implementations with such trap representations are thought to have existed in the past. Your author was unable to locate any documents describing such processors.
INT31-C-EX3: ISO C, section 7.27.2.4, paragraph 3 says:
The time function returns the implementation's best approximation to the current calendar time.
The value (time_t) (-1) is returned if the calendar time is not available.
If
time_t is an unsigned type, then the expression
((time_t) (-1)) is guaranteed to yield a large positive value.
Therefore, conversion of a negative compile-time constant to an unsigned value with the same or larger width is permitted by this rule. This exception does not apply to conversion of unsigned to signed values, nor does it apply if the resulting value would undergo truncation.
Risk Assessment
Integer truncation errors can lead to buffer overflows and the execution of arbitrary code by an attacker.
| Rule | Severity | Likelihood | Remediation Cost | Priority | Level |
|---|---|---|---|---|---|
| INT31-C | High | Probable | High | P6 | L2 |
Related Guidelines
| Taxonomy | Taxonomy item | Relationship |
|---|---|---|
| CERT C | DCL03-C. Use a static assertion to test the value of a constant expression | Prior to 2018-01-12: CERT: Unspecified Relationship |
| CERT C | INT18-C. Evaluate integer expressions in a larger size before comparing or assigning to that size | Prior to 2018-01-12: CERT: Unspecified Relationship |
| CERT C | FIO34-C. Distinguish between characters read from a file and EOF or WEOF | Prior to 2018-01-12: CERT: Unspecified Relationship |
| CERT Oracle Secure Coding Standard for Java | NUM12-J. Ensure conversions of numeric types to narrower types do not result in lost or misinterpreted data | Prior to 2018-01-12: CERT: Unspecified Relationship |
| ISO/IEC TR 24772:2013 | Numeric Conversion Errors [FLC] | Prior to 2018-01-12: CERT: Unspecified Relationship |
| MISRA C:2012 | Rule 10.1 (required) | Prior to 2018-01-12: CERT: Unspecified Relationship |
| MISRA C:2012 | Rule 10.3 (required) | Prior to 2018-01-12: CERT: Unspecified Relationship |
| MISRA C:2012 | Rule 10.4 (required) | Prior to 2018-01-12: CERT: Unspecified Relationship |
| MISRA C:2012 | Rule 10.6 (required) | Prior to 2018-01-12: CERT: Unspecified Relationship |
| MISRA C:2012 | Rule 10.7 (required) | Prior to 2018-01-12: CERT: Unspecified Relationship |
| CWE 2.11 | CWE-192, Integer Coercion Error | 2017-07-17: CERT: Exact |
| CWE 2.11 | CWE-197, Numeric Truncation Error | 2017-06-14: CERT: Rule subset of CWE |
| CWE 2.11 | CWE-681, Incorrect Conversion between Numeric Types | 2017-07-17: CERT: Rule subset of CWE |
| CWE 2.11 | CWE-704 | 2017-07-17: CERT: Rule subset of CWE |
Bibliography
| [ Dowd 2006] | Chapter 6, "C Language Issues" ("Type Conversions," pp. 223-270) |
| [ ISO/IEC 9899:2011] | 6.3.1.3, "Signed and Unsigned Integers" |
| [ Jones 2008] | Section 6.2.6.2, "Integer Types" |
| [ Seacord 2013b] | Chapter 5, "Integer Security" |
| [ Viega 2005] | Section 5.2.9, "Truncation Error"; Section 5.2.10, "Sign Extension Error"; Section 5.2.11, "Signed to Unsigned Conversion Error"; Section 5.2.12, "Unsigned to Signed Conversion Error" |
| [ Warren 2002] | Chapter 2, "Basics" |
| [ xorl 2009] | "CVE-2009-1376: Pidgin MSN SLP Integer Truncation" |
Possible Messages
| Key | Text | Severity | Disabled |
|---|---|---|---|
| cafe_message | {} | None | False |
| cast_overflow | Cast on result of arithmetic computation may cause overflow | None | False |
| cast_truncate | Cast may truncate value | None | False |
| cast_underflow | Cast on result of arithmetic computation may cause overflow | None | False |
| certain_shift_amount_negative | Shift by a negative bit count (undefined behavior) | None | False |
| certain_shift_amount_too_large | Shift by the integer width or more (undefined behavior) | None | False |
| certain_shift_right_negative | Right shift with negative left-hand-side | None | False |
| incompatible_arg | Argument {} of call to ‘{}’ should fit into integer range {} to {}. | None | False |
| static_cast_overflow | Cast on result of arithmetic computation may cause overflow | None | False |
| static_cast_underflow | Cast on result of arithmetic computation may cause underflow | None | False |
| static_cast_underflow_minus_1 | Casting -1 to an unsigned type causes underflow | None | False |
| static_overflow | Arithmetic computation may cause overflow | None | False |
| static_underflow | Arithmetic computation may cause underflow | None | False |
| unsigned_cast_overflow | Cast on result of arithmetic computation may cause wrap-around | None | False |
| unsigned_cast_underflow | Cast on result of arithmetic computation may cause wrap-around | None | False |
Options¶
This rule shares the following common options: exclude_in_macros, exclude_messages_in_system_headers, excludes, extend_exclude_to_macro_invocations, includes, justification_checker, languages, post_processing, provider, report_at, severity
The following places define options that affect this rule: Stylechecks, Analysis-GlobalOptions
abstract_interpretation_maximal_tracked_array_index¶
abstract_interpretation_maximal_tracked_array_index : int = 10
The number of explicit indices in array expressions per routine tracked by the "symbolic expression analysis". For example, consider the following program.
extern signed char a[6];

int main()
{
  if (a[2] < 0)
  {
    a[2]++;
  }
  if (a[3] < 0)
  {
    a[3]++;
  }
  if (a[4] < 0)
  {
    a[4]++;
  }
  return 0;
}
If the value of this option is set to 2, the first two array index expressions
encountered in the routine are tracked. Hence, the analysis can use the facts
a[2] < 0 and a[3] < 0 to infer that a[2]++
and a[3]++ do not overflow, but it will not track the third array
access in this routine.
A higher value of the option can cause more consumption of memory and time for the analysis.
abstract_interpretation_overflow¶
abstract_interpretation_overflow : bool = False
abstract_interpretation_overflow_unrolling_level¶
abstract_interpretation_overflow_unrolling_level : int = 0
check_signed¶
check_signed : bool = True
check_unsigned¶
check_unsigned : bool = True
message_predicate¶
message_predicate : typing.Callable[[Cafe_Message], bool] | None = None
A predicate that returns True for messages to report.
relevant_expressions¶
relevant_expressions
Which (const / constant) expressions should be considered.
Type: RelevantExpressions
Default:
'const_expressions_only'
Note: this is only relevant for the purely static parts of the analysis. The StaticSemanticAnalysis-based checks for runtime errors will be performed independently.
reported_messages¶
reported_messages : set[int] | None = {514}
reported_severities¶
reported_severities : set[str] = {'error', 'remark', 'warning'}
routines_and_limits¶
routines_and_limits
Names of routines relevant to this check, together with the (zero based) parameter index and the expected value range (min, max). Can use None if upper or lower limit is not needed.
Type: dict[bauhaus.analysis.config.QualifiedName, dict[int, typing.Tuple[int, int]]]
Default:
{ 'memset': { 1: (0, 255) } }
suppress_well_defined_findings¶
suppress_well_defined_findings
Type: SuppressionMode
Default:
'NONE'
Some overflows have well-defined semantics in all C/C++ standard
versions. The typical example is UINT_MAX+1 which is
well-defined as 0 via wraparound. This differs from
INT_MAX+1 which is either undefined or implementation-defined
depending on the considered standard version. Most CPUs will compute
INT_MIN but this wraparound is not guaranteed by any C/C++
standard.
Both cases are overflows and are reported by this rule. However, one might want to suppress messages for the well-defined cases. To suppress these, activate this option.
Different C and C++ standard versions differ in what is well-defined, implementation-defined, or undefined. Luckily, if we only consider well-defined and do not discern between implementation-defined and undefined, we end up with only two groups: pre-C++20 and since-C++20.
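The difference between the two kinds of overflow can be sketched in C as follows. The helper names are hypothetical, and the sketch only illustrates the language semantics described above, not how the analysis itself works.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Unsigned addition wraps modulo UINT_MAX + 1, so UINT_MAX + 1 is
   well-defined and yields 0. */
unsigned int unsigned_increment(unsigned int value) {
  return value + 1u;
}

/* INT_MAX + 1 is not well-defined, so the signed case is guarded:
   returns false instead of performing the overflowing addition. */
bool checked_signed_increment(int value, int *result) {
  if (value == INT_MAX) {
    return false;
  }
  *result = value + 1;
  return true;
}

/* Usage sketch for both helpers. */
void overflow_demo(void) {
  int r = 0;
  assert(unsigned_increment(UINT_MAX) == 0u);
  assert(!checked_signed_increment(INT_MAX, &r));
}
```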
use_error_number¶
use_error_number : bool = False
use_rule_severity¶
use_rule_severity : bool = True
Option Types¶
These types are used by options listed above:
RelevantExpressions¶
An enumeration.
none
No (additional) checks for overflows in const-expressions or compile time constant expressions.
const_expressions_only
Whether the analysis should statically check const-expressions (i.e., const variables and literals) that might have been reduced to a literal during compilation.
const_and_compile_time_constant
Whether the analysis should statically check const-expressions (i.e., const variables and literals) as well as compile-time constant expressions (i.e., preprocessor defines, constexprs or literals) that might have been reduced to a literal during compilation.
SuppressionMode¶
An enumeration.
NONE
Suppress nothing.
PRE_CPP2020
Suppress findings that are well-defined before C++20. These are:
- Over- and underflows of unsigned integers during addition, subtraction, and multiplication
- Conversions from unsigned to unsigned integers
- Wrap-around caused by left-shifting of unsigned integer
CPP2020
Suppress findings that are well-defined since C++20. These are:
- Over- and underflows of unsigned integers during addition, subtraction, and multiplication
- Conversions between signed and unsigned integers
- Wrap-around caused by left-shifting
- Shifting negative integers
Surprising mechanics of C++20 signed narrow integers
Since C++20, casts between signed and unsigned are defined as two's complement wrap-around. Overflows of signed integers are still undefined behavior and are reported by this rule. But, due to integer promotion rules, certain expressions are computed using wider integer types, which can lead to the false impression that this is no longer the case, because no overflow findings are reported there.
Suppose that the code is compiled on a platform where short
is smaller than or equal to half the size of an int. Very
commonly the sizes are 2 and 4. This assumption is thus true for many
platforms.
In this case, narrow signed integer types such as short or
signed char are first implicitly promoted to int
before the arithmetic operation is executed. Because of this promotion, the
actual operation does not overflow and is thus well-defined. After the
operation, an implicit cast is performed to the narrower type. This cast is
well-defined in C++20 as wrapping around.
Consider the following snippet:
static_assert(sizeof(short) == 2);
static_assert(sizeof(int) == 4);
short a = 0x1000;
short b = 0x1001;
short c = a*b;
C++20 defines c as 0x1000. The reason is that
a*b is implicitly promoted to static_cast<int>(a)*static_cast<int>(b).
After the promotion, the
multiplication does not overflow and yields a well-defined
0x1001000. This number is then implicitly cast to
0x1000 which is also a well-defined operation.
An analogous effect can be observed for signed short addition and
multiplication. Another effect is that it is well-defined to shift by up to
as many bits as int has even if the shifted integer has fewer
bits.
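The promotion step can also be observed in C with a minimal sketch. The function names are hypothetical, and the sketch assumes a 16-bit short and a 32-bit int. Because converting an out-of-range value to a signed type is implementation-defined in C, the narrowing step here targets unsigned short, where the conversion is modular.

```c
#include <assert.h>
#include <limits.h>

/* Assumption for this sketch: 16-bit short and 32-bit int. */
static_assert(sizeof(short) * CHAR_BIT == 16, "sketch assumes 16-bit short");
static_assert(sizeof(int) * CHAR_BIT == 32, "sketch assumes 32-bit int");

/* Both operands are promoted to int before the multiplication, so under
   the assumption above the product cannot overflow. */
int promoted_product(short a, short b) {
  return a * b;
}

/* Keeps the low 16 bits of the product. Conversion to an unsigned type is
   modular, so this step is well-defined. */
unsigned short low_16_bits(int product) {
  return (unsigned short)product;
}

/* Usage sketch with the operands from the snippet above. */
void promotion_demo(void) {
  assert(promoted_product(0x1000, 0x1001) == 0x1001000);
  assert(low_16_bits(0x1001000) == 0x1000);
}
```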
DERIVE_FROM_IR
Derive the language version from the IR compilation flags and suppress findings accordingly.