CWE-787

Out-of-bounds Write. [Memory-Buffer-Errors, Improper-Control-Of-A-Resource-Through-Its-Lifetime, Top25-2024-2]

Required inputs: IR, StaticSemanticAnalysis

The product writes data past the end, or before the beginning, of the intended buffer. Typically, this can result in corruption of data, a crash, or code execution. The product may modify an index or perform pointer arithmetic that references a memory location that is outside of the boundaries of the buffer. A subsequent write operation then produces undefined or unexpected results.

Demonstrative Examples

Example 1

The following code attempts to save four different identification numbers into an array.

Example Language:C
    int id_sequence[3];

    /* Populate the id array. */

    id_sequence[0] = 123;
    id_sequence[1] = 234;
    id_sequence[2] = 345;
    id_sequence[3] = 456;

Since the array holds only three elements, the valid indices are 0 to 2, so the assignment to id_sequence[3] is out of bounds.
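A minimal corrected sketch, assuming the intent really is to store four IDs: size the array for all of them and bound the loop by that same constant, so no index can reach the element count. The helper `populate_ids` and the macro `ID_COUNT` are hypothetical names introduced for illustration.

```c
#include <stddef.h>

/* Hypothetical constant: one slot per ID that must be stored. */
#define ID_COUNT 4

/* Hypothetical helper: copies the four IDs into an array of ID_COUNT ints.
   The loop bound is the array length, so i only takes the values 0..3. */
void populate_ids(int id_sequence[ID_COUNT]) {
    static const int ids[ID_COUNT] = {123, 234, 345, 456};
    for (size_t i = 0; i < ID_COUNT; i++) {
        id_sequence[i] = ids[i];
    }
}
```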

Example 2

In the following code, it is possible to request that memcpy move a much larger segment of memory than assumed:

Example Language:C
    int returnChunkSize(void *chunk) {
        /* if chunk info is valid, return the size of usable memory,
         * else, return -1 to indicate an error
         */
        ...
    }
    int main() {
        ...
        memcpy(destBuf, srcBuf, (returnChunkSize(destBuf)-1));
        ...
    }

If returnChunkSize() encounters an error, it returns -1. The return value is not checked before the memcpy() call (CWE-252), so returnChunkSize(destBuf)-1, which is then -2, can be passed as the size argument to memcpy() (CWE-805). Because memcpy() takes its size as an unsigned size_t, -2 is converted to SIZE_MAX-1 (CWE-195), and memcpy() therefore attempts to copy far more memory than the destination buffer can hold (CWE-787, CWE-788).
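A sketch of the missing check, under stated assumptions: because the body of returnChunkSize() is elided, the hypothetical wrapper `copy_chunk` receives its result as the `chunk_size` parameter, and also takes the source length (`srcLen`, an assumption) so the copy cannot read past the source either. The key point is that the signed value is validated before it is ever converted to size_t.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: copies at most (chunk_size - 1) bytes, where
   chunk_size is the value a returnChunkSize()-style routine returned.
   Returns 0 on success, -1 if chunk_size is an error code or zero. */
int copy_chunk(void *destBuf, const void *srcBuf, size_t srcLen, int chunk_size) {
    if (chunk_size < 1) {
        return -1;  /* reject -1 (error) and 0, which would underflow to a huge size_t */
    }
    size_t n = (size_t)chunk_size - 1;  /* safe: chunk_size - 1 >= 0 here */
    if (n > srcLen) {
        n = srcLen;  /* never read past the end of the source */
    }
    memcpy(destBuf, srcBuf, n);
    return 0;
}
```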

Example 3

This code takes an IP address from the user and verifies that it is well formed. It then looks up the hostname and copies it into a buffer.

Example Language:C
    void host_lookup(char *user_supplied_addr){
        struct hostent *hp;
        in_addr_t addr;
        char hostname[64];
        in_addr_t inet_addr(const char *cp);

        /* routine that ensures user_supplied_addr is in the right format for conversion */

        validate_addr_form(user_supplied_addr);
        addr = inet_addr(user_supplied_addr);
        hp = gethostbyaddr(&addr, sizeof(struct in_addr), AF_INET);
        strcpy(hostname, hp->h_name);
    }

This function allocates a 64-byte buffer to store the hostname. However, there is no guarantee that the hostname, including its terminating NUL, fits in 64 bytes. If an attacker specifies an address that resolves to a very long hostname, the function may overwrite sensitive data or even relinquish control flow to the attacker.

Note that this example also contains an unchecked return value (CWE-252) that can lead to a NULL pointer dereference (CWE-476).
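One way to address both problems is sketched below. The helper `copy_hostname` is hypothetical: it checks the result of the lookup for NULL before dereferencing it and refuses to copy a name that, together with its NUL, does not fit in the destination. Inside host_lookup() it would be called as `copy_hostname(hostname, sizeof hostname, hp)` in place of the strcpy(), with the error path left to the caller.

```c
#include <netdb.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies hp->h_name into dst only if the lookup
   succeeded and the whole name, plus its NUL, fits in dst_size bytes.
   Returns 0 on success, -1 otherwise (dst is then left unchanged). */
int copy_hostname(char *dst, size_t dst_size, const struct hostent *hp) {
    if (hp == NULL || hp->h_name == NULL || dst_size == 0) {
        return -1;  /* failed lookup: avoid the NULL dereference (CWE-476) */
    }
    size_t len = strlen(hp->h_name);
    if (len >= dst_size) {
        return -1;  /* name too long: refuse instead of overflowing (CWE-787) */
    }
    memcpy(dst, hp->h_name, len + 1);  /* len + 1 includes the terminating NUL */
    return 0;
}
```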

Example 4

This code applies an encoding procedure to an input string and stores it into a buffer.

Example Language:C
    char * copy_input(char *user_supplied_string){
        int i, dst_index;
        char *dst_buf = (char*)malloc(4*sizeof(char) * MAX_SIZE);
        if ( MAX_SIZE <= strlen(user_supplied_string) ){
            die("user string too long, die evil hacker!");
        }
        dst_index = 0;
        for ( i = 0; i < strlen(user_supplied_string); i++ ){
            if( '&' == user_supplied_string[i] ){
                dst_buf[dst_index++] = '&';
                dst_buf[dst_index++] = 'a';
                dst_buf[dst_index++] = 'm';
                dst_buf[dst_index++] = 'p';
                dst_buf[dst_index++] = ';';
            }
            else if ('<' == user_supplied_string[i] ){
                /* encode to &lt; */
            }
            else dst_buf[dst_index++] = user_supplied_string[i];
        }
        return dst_buf;
    }

The programmer attempts to encode the ampersand character in the user-controlled string. However, the length of the string is validated before the encoding procedure is applied. Furthermore, the programmer assumes encoding expansion will only expand a given character by a factor of 4, while the encoding of the ampersand expands by 5. As a result, when the encoding procedure expands the string it is possible to overflow the destination buffer if the attacker provides a string of many ampersands.
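A minimal sketch of a bounded encoder, assuming the same two replacements as the example ("&amp;" for '&' and "&lt;" for '<', the latter taken from the comment in the original). The hypothetical `encode_input` sizes the buffer from the actual input length times the worst-case expansion of 5, adds one byte for the NUL terminator that the original never writes, and checks the size computation and the allocation.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Longest replacement produced below: "&amp;" is 5 bytes. */
#define MAX_EXPANSION 5

/* Hypothetical bounded variant of copy_input(). Returns a NUL-terminated
   heap string the caller must free, or NULL on size overflow or malloc failure. */
char *encode_input(const char *user_supplied_string) {
    size_t len = strlen(user_supplied_string);
    if (len > (SIZE_MAX - 1) / MAX_EXPANSION) {
        return NULL;  /* len * MAX_EXPANSION + 1 would overflow size_t */
    }
    char *dst_buf = malloc(len * MAX_EXPANSION + 1);
    if (dst_buf == NULL) {
        return NULL;
    }
    size_t dst_index = 0;
    for (size_t i = 0; i < len; i++) {
        const char *rep = NULL;
        if (user_supplied_string[i] == '&') {
            rep = "&amp;";
        } else if (user_supplied_string[i] == '<') {
            rep = "&lt;";
        }
        if (rep != NULL) {
            size_t n = strlen(rep);  /* at most MAX_EXPANSION */
            memcpy(dst_buf + dst_index, rep, n);
            dst_index += n;
        } else {
            dst_buf[dst_index++] = user_supplied_string[i];
        }
    }
    dst_buf[dst_index] = '\0';  /* dst_index <= len * MAX_EXPANSION, so this is in bounds */
    return dst_buf;
}
```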

Example 5

In the following C/C++ code, a utility function is used to trim trailing whitespace from a character string. The function copies the input string to a local character string and uses a while statement to remove the trailing whitespace by moving backward through the string and overwriting whitespace with a NUL character.

Example Language:C
    char* trimTrailingWhitespace(char *strMessage, int length) {
        char *retMessage;
        char *message = malloc(sizeof(char)*(length+1));

        // copy input string to a temporary string
        int index;
        for (index = 0; index < length; index++) {
            message[index] = strMessage[index];
        }
        message[index] = '\0';

        // trim trailing whitespace
        int len = index-1;
        while (isspace(message[len])) {
            message[len] = '\0';
            len--;
        }

        // return string without trailing whitespace
        retMessage = message;
        return retMessage;
    }

However, this function can cause a buffer underwrite if the input string consists entirely of whitespace. In that case the while loop keeps decrementing len past 0, so it calls isspace() on message[-1], an address before the start of the buffer, and writes a NUL there if that byte happens to look like whitespace. The same happens immediately when length is 0, because len then starts at -1.
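A corrected sketch, assuming the caller frees the returned string. The signature is adjusted for illustration (`const char *` input, `size_t` length). The loop tests the count of kept characters before it reads anything, so an empty or all-whitespace input stops at 0 instead of stepping to message[-1]; the cast to unsigned char keeps isspace() within its defined domain.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of strMessage[0..length) without trailing
   whitespace, or NULL if malloc fails. The caller frees the result. */
char *trimTrailingWhitespace(const char *strMessage, size_t length) {
    char *message = malloc(length + 1);
    if (message == NULL) {
        return NULL;
    }
    memcpy(message, strMessage, length);
    message[length] = '\0';

    size_t len = length;  /* number of characters still kept */
    while (len > 0 && isspace((unsigned char)message[len - 1])) {
        message[--len] = '\0';  /* len - 1 >= 0 here, so the write is in bounds */
    }
    return message;
}
```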

Example 6

The following code gets a user-specified number of widgets, making sure that the user does not request too many, and allocates an array of pointers for that number of widgets. It then initializes the elements of the array using InitializeWidget(). Because the number of widgets can vary for each request, the code inserts a NULL pointer to mark the end of the list.

Example Language:C
    int i;
    unsigned int numWidgets;
    Widget **WidgetList;

    numWidgets = GetUntrustedSizeValue();
    if ((numWidgets == 0) || (numWidgets > MAX_NUM_WIDGETS)) {
        ExitError("Incorrect number of widgets requested!");
    }
    WidgetList = (Widget **)malloc(numWidgets * sizeof(Widget *));
    printf("WidgetList ptr=%p\n", WidgetList);
    for(i=0; i<numWidgets; i++) {
        WidgetList[i] = InitializeWidget();
    }
    WidgetList[numWidgets] = NULL;
    showWidgets(WidgetList);

However, this code contains an off-by-one calculation error (CWE-193). It allocates exactly enough space for the requested number of widgets but not for the terminating NULL pointer, so the allocated buffer is smaller than it is supposed to be (CWE-131). As a result, assigning NULL to WidgetList[numWidgets] is an out-of-bounds write (CWE-787) for every accepted value of numWidgets, including MAX_NUM_WIDGETS. Depending on the environment and compilation settings, this could cause memory corruption.
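A sketch of the corrected allocation. `Widget` is declared as an opaque type and `MAX_NUM_WIDGETS` is given an assumed value of 100, since the original defines neither; `alloc_widget_list` is a hypothetical helper. It reserves numWidgets + 1 slots so that index numWidgets, where the NULL terminator goes, lies inside the allocation. The caller would then fill slots 0 to numWidgets-1 with InitializeWidget() as before.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct Widget Widget;  /* opaque stand-in for the example's Widget type */

#define MAX_NUM_WIDGETS 100    /* assumed value; the original does not define it */

/* Returns a list with room for numWidgets widget pointers plus the
   terminating NULL, or NULL if the count is rejected or malloc fails. */
Widget **alloc_widget_list(unsigned int numWidgets) {
    if (numWidgets == 0 || numWidgets > MAX_NUM_WIDGETS) {
        return NULL;
    }
    /* + 1 reserves the slot for the NULL terminator (the CWE-193/CWE-131 fix) */
    Widget **list = malloc(((size_t)numWidgets + 1) * sizeof *list);
    if (list == NULL) {
        return NULL;
    }
    for (size_t i = 0; i <= numWidgets; i++) {
        list[i] = NULL;  /* every index up to and including numWidgets is in bounds */
    }
    return list;
}
```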

Example 7

The following is an example of code that may result in a buffer underwrite. This code is attempting to replace the substring "Replace Me" in destBuf with the string stored in srcBuf. It does so by using the function strstr(), which returns a pointer to the found substring in destBuf. Using pointer arithmetic, the starting index of the substring is found.

Example Language:C
    int main() {
        ...
        char *result = strstr(destBuf, "Replace Me");
        int idx = result - destBuf;
        strcpy(&destBuf[idx], srcBuf);
        ...
    }

In the case where the substring is not found in destBuf, strstr() will return NULL, causing the pointer arithmetic to be undefined, potentially setting the value of idx to a negative number. If idx is negative, this will result in a buffer underwrite of destBuf.
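A sketch that performs the NULL check before any pointer arithmetic. Because the original elides how destBuf is declared, the hypothetical `replace_marker` takes its capacity as an extra parameter `destSize` and also verifies that srcBuf fits after the marker. Like the original strcpy(), it replaces everything from the marker to the end of the string.

```c
#include <string.h>

/* Overwrites destBuf from the first "Replace Me" onward with srcBuf.
   destSize is the total capacity of destBuf (an assumption for this sketch).
   Returns 0 on success, -1 if the marker is absent or srcBuf does not fit. */
int replace_marker(char *destBuf, size_t destSize, const char *srcBuf) {
    char *result = strstr(destBuf, "Replace Me");
    if (result == NULL) {
        return -1;  /* marker not found: never subtract from a NULL pointer */
    }
    size_t idx = (size_t)(result - destBuf);  /* result points into destBuf, so idx >= 0 */
    if (strlen(srcBuf) >= destSize - idx) {
        return -1;  /* srcBuf plus its NUL would run past the end of destBuf */
    }
    strcpy(&destBuf[idx], srcBuf);
    return 0;
}
```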

Possible Messages

Key	Text	Severity	Disabled
arithmetic_out_of_bounds	Pointer arithmetic on {node0} might create pointer outside array bounds of {name0}	None	False
out_of_bounds	Access into array is out of bounds	None	False
possible_indirect_out_of_bounds	Pointer-indirect access through {node0} might be out of bounds accessing {name0}	None	False
possible_out_of_bounds	Access into array might be out of bounds	None	False
possible_write_beyond_argument	Call to {} might result in a write access beyond the bounds of argument {}, since argument {} might be too large.	None	False
undereferenced_arithmetic_out_of_bounds	Pointer arithmetic on {node0} might create pointer one past the end of {name0} (but not dereferenced)	None	False
undereferenced_out_of_bounds	Access is one past the end of the array (but not dereferenced)	None	False
undereferenced_possible_indirect_out_of_bounds	Pointer-indirect access through {node0} might be one past the end accessing {name0} (but not dereferenced)	None	False
undereferenced_possible_out_of_bounds	Access might be one past the end of the array (but not dereferenced)	None	False

Options

This rule shares the following common options: exclude_in_macros, exclude_messages_in_system_headers, excludes, extend_exclude_to_macro_invocations, includes, justification_checker, languages, post_processing, provider, report_at, severity
The following places define options that affect this rule: Stylechecks, Analysis-GlobalOptions

abstract_interpretation_out_of_bounds

abstract_interpretation_out_of_bounds : bool = False

Use an additional "symbolic expression analysis" as a post-processing step. This can remove false positives but may increase analysis time. The option is automatically active if the option StaticSemanticAnalysis/performance.general.enhanced_analysis is active.

concat_operations

concat_operations

Type: dict[bauhaus.analysis.config.QualifiedName, typing.Tuple[int, int]]

Default:
{
   'strcat': (0, 1)
}

Names of buffer-concatenation functions that are relevant as call targets for this check, each mapped to a pair of zero-based argument positions (as in the default entry for strcat(dest, src)): the argument pointing to the destination buffer, and the argument referencing the buffer to be appended to the end of the destination buffer.

copy_operations

copy_operations

Type: dict[bauhaus.analysis.config.QualifiedName, typing.Tuple[int, int]]

Default:
{
   'strcpy': (0, 1)
}

Names of buffer-copy functions that are relevant as call targets for this check, each mapped to the zero-based positions of the destination argument and the source argument of the copy operation.

delimiter_of_arguments

delimiter_of_arguments

Type: dict[bauhaus.analysis.config.QualifiedName, set[int]]

Default:
{
   'strcat': {0, 1},
   'strchr': {0},
   'strcmp': {0, 1},
   'strcoll': {0, 1},
   'strcpy': {1},
   'strcspn': {0, 1},
   'strlen': {0},
   'strpbrk': {0, 1},
   'strrchr': {0},
   'strspn': {0, 1},
   'strstr': {0, 1},
   'strtok': {0, 1}
}

Names of functions that are relevant as call targets for this check, each mapped to the set of zero-based parameter positions whose referenced buffers should be checked for proper null termination.
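By the default entries above, the argument of strlen() and the source argument of strcpy() are expected to be NUL-terminated. A common way to violate that is strncpy(), which writes no terminator when the source fills all n bytes. The hypothetical helper below is a sketch of the usual remedy: write the terminator explicitly before passing the buffer on. Whether the analysis recognizes this particular pattern is not stated here.

```c
#include <string.h>

/* Copies at most dst_size - 1 characters of src into dst, always
   terminates dst, and returns the resulting length. */
size_t copy_and_measure(char *dst, size_t dst_size, const char *src) {
    if (dst_size == 0) {
        return 0;  /* no room even for the terminator */
    }
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';  /* strncpy may have left dst unterminated */
    return strlen(dst);        /* safe: dst is now NUL-terminated within dst_size */
}
```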

exclude_very_high_indices

exclude_very_high_indices : bool = True

Enables a heuristic for detecting false positives: when the index used for an array access is very high compared with the array's size, the finding is assumed to be a false positive.

exclude_warnings_for_unknown_arguments

exclude_warnings_for_unknown_arguments : bool = False

Exclude warnings for cases where nothing at all is known about the arguments of an operation, for example because they are return values of external routines.

ignore_calls_in_functions

ignore_calls_in_functions : set[bauhaus.analysis.config.QualifiedName] = set()

Qualified names of function definitions in which calls to relevant functions are ignored for this check.

report_unbounded_arrays

report_unbounded_arrays : bool = False

If true, accesses into arrays with an unknown bound are reported as potentially outside the allowed range. This affects arrays such as extern char buf[];.
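A sketch of code this option concerns: at the point of use, `shared_buf` has an incomplete type, so its bound cannot be derived from the declaration. One common convention, shown here with hypothetical names, is to export the size alongside the array and check against it. Both the declarations and the definitions appear in one block only so that it compiles on its own; in a real project they would sit in a header and another translation unit. Whether the analysis takes such a guard into account is not stated in this documentation.

```c
#include <stddef.h>

/* Normally in a header: the array's bound is unknown where it is used. */
extern char shared_buf[];
extern const size_t shared_buf_size;

/* Writes c at index i only if i is within the exported size. */
int put_char(size_t i, char c) {
    if (i >= shared_buf_size) {
        return -1;  /* would be out of bounds */
    }
    shared_buf[i] = c;
    return 0;
}

/* Normally in another translation unit: the definitions complete the type. */
char shared_buf[16];
const size_t shared_buf_size = sizeof shared_buf;
```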

report_undereferenced_one_past_the_end

report_undereferenced_one_past_the_end : bool = False

If true, report accesses one past the end of an array even if there is no dereference of the resulting pointer.
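The C standard allows forming a pointer one past the last element of an array, as long as it is never dereferenced, which is presumably why such accesses are reported only when this option is true (compare the undereferenced_* messages above). The hypothetical function below uses that idiom: `&values[4]` serves purely as a loop end marker and is only compared, never read or written.

```c
/* Sums a local array using a one-past-the-end pointer as the loop bound. */
int sum_values(void) {
    int values[4] = {1, 2, 3, 4};
    int total = 0;
    /* &values[4] is one past the end: valid to form and compare, never dereferenced */
    for (const int *p = values; p != &values[4]; p++) {
        total += *p;  /* p is always in 0..3 when dereferenced */
    }
    return total;
}
```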

report_unknown_index

report_unknown_index : bool = False

If false, do not report possible out-of-bounds findings for which the analysis could not infer any restricting information about the array index (this can exclude both false positives and true findings).

Axivion Suite 7.12.2-public
