CWE-119

Improper Restriction of Operations within the Bounds of a Memory Buffer. [Improper-Control-Of-A-Resource-Through-Its-Lifetime, Top25-2024-20]

Required inputs: IR, StaticSemanticAnalysis

The product performs operations on a memory buffer, but it can read from or write to a memory location that is outside of the intended boundary of the buffer.

Certain languages allow direct addressing of memory locations and do not automatically ensure that these locations are valid for the memory buffer that is being referenced. This can cause read or write operations to be performed on memory locations that may be associated with other variables, data structures, or internal program data.

As a result, an attacker may be able to execute arbitrary code, alter the intended control flow, read sensitive information, or cause the system to crash.

Demonstrative Examples
Example 1

This example takes an IP address from a user, verifies that it is well formed and then looks up the hostname and copies it into a buffer.

Example Language:C
    void host_lookup(char *user_supplied_addr){
        struct hostent *hp;
        in_addr_t *addr;
        char hostname[64];
        in_addr_t inet_addr(const char *cp);

        /*routine that ensures user_supplied_addr is in the right format for conversion */

        validate_addr_form(user_supplied_addr);
        addr = inet_addr(user_supplied_addr);
        hp = gethostbyaddr( addr, sizeof(struct in_addr), AF_INET);
        strcpy(hostname, hp->h_name);
    }

This function allocates a buffer of 64 bytes to store the hostname, but there is no guarantee that the hostname will fit into 64 bytes. If an attacker specifies an address that resolves to a very long hostname, the function may overwrite sensitive data or even relinquish control flow to the attacker.

Note that this example also contains an unchecked return value (CWE-252) that can lead to a NULL pointer dereference (CWE-476).
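
One way to avoid the overflow is to bound the copy by the size of the destination and to treat a hostname that does not fit as an error. The following is a minimal sketch under that assumption, not CWE's own remediation; the helper name copy_hostname is hypothetical. In host_lookup, the caller would additionally have to check that gethostbyaddr did not return NULL before passing hp->h_name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst, whose capacity is dst_size
 * bytes including the terminating NUL. Returns 0 on success and -1 if
 * an argument is invalid or src does not fit into dst. */
int copy_hostname(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    if (dst == NULL || src == NULL || dst_size == 0) {
        return -1;
    }
    len = strlen(src);
    if (len >= dst_size) {
        dst[0] = '\0';            /* leave dst as a valid empty string */
        return -1;
    }
    memcpy(dst, src, len + 1);    /* len + 1 also copies the NUL */
    return 0;
}
```

A call such as copy_hostname(hostname, sizeof hostname, hp->h_name) would then replace the strcpy call, with the return value checked by the caller.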

Example 2

This example applies an encoding procedure to an input string and stores it into a buffer.

Example Language:C
    char * copy_input(char *user_supplied_string){
        int i, dst_index;
        char *dst_buf = (char*)malloc(4*sizeof(char) * MAX_SIZE);
        if ( MAX_SIZE <= strlen(user_supplied_string) ){
            die("user string too long, die evil hacker!");
        }
        dst_index = 0;
        for ( i = 0; i < strlen(user_supplied_string); i++ ){
            if( '&' == user_supplied_string[i] ){
                dst_buf[dst_index++] = '&';
                dst_buf[dst_index++] = 'a';
                dst_buf[dst_index++] = 'm';
                dst_buf[dst_index++] = 'p';
                dst_buf[dst_index++] = ';';
            }
            else if ('<' == user_supplied_string[i] ){
                /* encode to &lt; */
            }
            else dst_buf[dst_index++] = user_supplied_string[i];
        }
        return dst_buf;
    }

The programmer attempts to encode the ampersand character in the user-controlled string, but the length of the string is validated before the encoding procedure is applied. Furthermore, the programmer assumes that encoding expands a given character by a factor of at most 4, while the encoding of the ampersand expands it by a factor of 5. As a result, when the encoding procedure expands the string, it is possible to overflow the destination buffer if the attacker provides a string of many ampersands.
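
A possible repair is to size the destination for the worst case, in which every input character expands to the five characters of "&amp;". The sketch below covers only the ampersand branch of Example 2 and is an illustration under that assumption; the function name encode_ampersands and the max_len parameter (standing in for MAX_SIZE) are hypothetical. Unlike the original, it also terminates the result with a NUL byte.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical rewrite of the '&' branch only: returns a newly allocated
 * string in which every '&' is replaced by "&amp;", or NULL if src is
 * NULL, src is too long, or the allocation fails. The caller frees the
 * result. */
char *encode_ampersands(const char *src, size_t max_len)
{
    static const char amp[] = "&amp;";
    const size_t amp_len = sizeof amp - 1;    /* 5 characters */
    size_t len, i, out = 0;
    char *dst;

    if (src == NULL) {
        return NULL;
    }
    len = strlen(src);
    /* Reject over-long input and guard the size computation below. */
    if (len >= max_len || len > (SIZE_MAX - 1) / amp_len) {
        return NULL;
    }
    /* Worst case: every input character expands to amp_len bytes. */
    dst = malloc(len * amp_len + 1);
    if (dst == NULL) {
        return NULL;
    }
    for (i = 0; i < len; i++) {
        if (src[i] == '&') {
            memcpy(dst + out, amp, amp_len);
            out += amp_len;
        } else {
            dst[out++] = src[i];
        }
    }
    dst[out] = '\0';
    return dst;
}
```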

Example 3

The following example asks a user for an offset into an array to select an item.

Example Language:C
    int main (int argc, char **argv) {
        char *items[] = {"boat", "car", "truck", "train"};
        int index = GetUntrustedOffset();
        printf("You selected %s\n", items[index-1]);
    }

The programmer allows the user to specify which element in the list to select, but an attacker can provide an out-of-bounds offset, resulting in a buffer over-read (CWE-126).
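
The selection can be validated against both ends of the array before it is used as an index. The following sketch assumes the 1-based selection of the example above; the helper name select_item is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: map a 1-based selection onto items[0..count-1].
 * Returns NULL when the selection is out of range. */
const char *select_item(const char *const items[], size_t count, int selection)
{
    if (selection < 1 || (size_t)selection > count) {
        return NULL;
    }
    return items[selection - 1];
}
```

The caller prints the item only if the returned pointer is not NULL and reports an error otherwise.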

Example 4

In the following code, the method retrieves a value from an array at a specific array index location that is given as an input parameter to the method.

Example Language:C
    int getValueFromArray(int *array, int len, int index) {
        int value;

        // check that the array index is less than the maximum
        // length of the array
        if (index < len) {
            // get the value at the specified index of the array
            value = array[index];
        }
        // if array index is invalid then output error message
        // and return value indicating error
        else {
            printf("Value is: %d\n", array[index]);
            value = -1;
        }

        return value;
    }

However, this method only verifies that the given array index is less than the maximum length of the array but does not check for the minimum value (CWE-839). This allows a negative value to be accepted as the input array index, which results in an out-of-bounds read (CWE-125) and may allow access to sensitive memory. The input array index should be checked to verify that it is within the maximum and minimum range required for the array (CWE-129). In this example, the if statement should be modified to include a minimum range check, as shown below.

Example Language:C
    ...

    // check that the array index is within the correct
    // range of values for the array
    if (index >= 0 && index < len) {

    ...

Example 5

Windows provides the _mbs family of functions to perform various operations on multibyte strings. When these functions are passed a malformed multibyte string, such as a string containing a valid leading byte followed by a single null byte, they can read or write past the end of the string buffer, causing a buffer overflow. The following functions all pose a risk of buffer overflow: _mbsinc, _mbsdec, _mbsncat, _mbsncpy, _mbsnextc, _mbsnset, _mbsrev, _mbsset, _mbsstr, _mbstok, _mbccpy, _mbslen.

Excerpts from CWE [https://cwe.mitre.org], Copyright (C) 2006-2026, the MITRE Corporation. See section 9.4. "3rd-Party Licenses" in the documentation for full details.

Possible Messages

| Key | Text | Severity | Disabled |
|---|---|---|---|
| arithmetic_out_of_bounds | Pointer arithmetic on {node0} might create pointer outside array bounds of {name0} | None | False |
| assigned_to_pointer_to_const | Assigning the address of a partially initialized variable to some pointer-to-const | None | False |
| double_free | Dynamic memory released here was already released earlier | None | False |
| out_of_bounds | Access into array is out of bounds | None | False |
| pass_as_pointer_to_const_param | Passing uninitialized variable by pointer as function parameter with pointer-to-const type | None | False |
| possible_double_free | Dynamic memory released here possibly already released earlier | None | False |
| possible_indirect_out_of_bounds | Pointer-indirect access through {node0} might be out of bounds accessing {name0} | None | False |
| possible_invalid_call_argument | Call to {} with string buffer argument {} that possibly has no valid null delimiter character. | None | False |
| possible_out_of_bounds | Access into array might be out of bounds | None | False |
| possible_return_value_uninit | Function return value is potentially not initialized | None | False |
| possible_uninit | Use of possibly uninitialized variable | None | False |
| possible_use_after_free | Dynamic memory possibly used after it was previously released | None | False |
| possible_write_beyond_argument | Call to {} might result in a write access beyond the bounds of argument {}, since argument {} might be too large. | None | False |
| possibly_initialized | Use of possibly uninitialized variable (previous call {node0} might have initialized the variable) | None | False |
| return_value_uninit | Function return value is not initialized | None | False |
| undereferenced_arithmetic_out_of_bounds | Pointer arithmetic on {node0} might create pointer one past the end of {name0} (but not dereferenced) | None | False |
| undereferenced_out_of_bounds | Access is one past the end of the array (but not dereferenced) | None | False |
| undereferenced_possible_indirect_out_of_bounds | Pointer-indirect access through {node0} might be one past the end accessing {name0} (but not dereferenced) | None | False |
| undereferenced_possible_out_of_bounds | Access might be one past the end of the array (but not dereferenced) | None | False |
| uninit | Use of uninitialized variable | None | False |
| use_after_free | Dynamic memory used after it was previously released | None | False |

Options

abstract_interpretation_out_of_bounds

abstract_interpretation_out_of_bounds : bool = False

Use additional "symbolic expression analysis" as postprocessing step. This can remove false positives, but might require more time. Option is automatically active if option StaticSemanticAnalysis/performance.general.enhanced_analysis is active.
 

additional_local_array_check

additional_local_array_check : bool = True

Invoke an additional analysis that tries to remove false positives involving accesses to local array variables and in particular their initialization. The analysis attempts to report only the first use of an uninitialized value. Consider e.g. the following example:
    int example()
    {
        int a[10];
        int b[20];
        int uninit_var;
        for (int i = 0; i < 10; ++i)
        {
L1:         a[i] = uninit_var; // use of uninit_var reported
            b[i] = i;
        }
        int result = a[3]; // not reported, since already reported at L1
        result += b[15]; // reported; b[] is not (completely) initialized
        return result;
    }
    
 

assume_globals_are_initialized

assume_globals_are_initialized : bool = True

Whether global and local static variables should be treated as initialized (as specified by the language).
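
The language guarantee referred to here is that C objects with static storage duration are zero-initialized when they have no explicit initializer. A small sketch of the two kinds of variables the option covers (all names are hypothetical):

```c
/* File-scope object without an initializer: zero-initialized by the
 * language before the program starts. */
int global_counter;

int next_id(void)
{
    /* Local static without an initializer: also starts at zero. */
    static int last_id;

    last_id += 1;
    global_counter += 1;
    return last_id;
}
```

With the option at its default, reads of global_counter and last_id are treated as reads of initialized variables.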
 

check_array_access_with_unknown_index

check_array_access_with_unknown_index : bool = False

Whether array accesses like a[i] with non-literal index i should be checked as well.
 

concat_operations

concat_operations

Type: dict[bauhaus.analysis.config.QualifiedName, typing.Tuple[int, int]]

Default:

{
   'strcat': (0, 1)
}
Names of buffer-concatenating functions that are relevant call targets for this check, each mapped to the position of the argument pointing to the destination buffer and the position of the argument referencing the buffer that is appended to the end of the destination buffer.
 

copy_operations

copy_operations

Type: dict[bauhaus.analysis.config.QualifiedName, typing.Tuple[int, int]]

Default:

{
   'strcpy': (0, 1)
}
Names of buffer copy functions that are relevant call targets for this check, each mapped to the positions of the destination argument and the source argument of the buffer copy operation.
 

delimiter_of_arguments

delimiter_of_arguments

Type: dict[bauhaus.analysis.config.QualifiedName, set[int]]

Default:

{
   'strcat': {0, 1},
   'strchr': {0},
   'strcmp': {0, 1},
   'strcoll': {0, 1},
   'strcpy': {1},
   'strcspn': {0, 1},
   'strlen': {0},
   'strpbrk': {0, 1},
   'strrchr': {0},
   'strspn': {0, 1},
   'strstr': {0, 1},
   'strtok': {0, 1}
}
Names of functions that are relevant call targets for this check, each mapped to the positions of the parameters whose referenced buffers are checked for proper termination by a null terminator.
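
A buffer without a null terminator typically arises when exactly as many characters are copied as the buffer can hold, for example with strncpy. The sketch below illustrates the problem and one way to avoid it; it is an assumption about the kind of code this option is aimed at, not a statement of what the analysis reports, and both helper names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if buf[0..size-1] contains a NUL byte. */
int is_terminated(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}

/* Hypothetical helper: copies at most size - 1 characters of src and
 * always terminates dst, so dst is safe to pass to strlen and friends. */
void copy_terminated(char *dst, size_t size, const char *src)
{
    if (size == 0) {
        return;
    }
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';
}
```

Passing a buffer for which is_terminated returns 0 to a function such as strlen reads past the end of the buffer.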
 

exclude_from_pointer_to_const_param_check

exclude_from_pointer_to_const_param_check : set[bauhaus.analysis.config.QualifiedName] = {'__builtin_object_size'}

Names of routines whose parameters should be excluded from the check for passing uninitialized variables by pointer as parameter with pointer-to-const type.
 

exclude_very_high_indices

exclude_very_high_indices : bool = True

Enables a heuristic to detect false positives: when the index used for an array access is very high compared to the array's size, the finding is assumed to be a false positive.
 

exclude_warnings_for_unknown_arguments

exclude_warnings_for_unknown_arguments : bool = False

Exclude warnings for cases where nothing at all is known about the arguments of an operation, caused e.g. by using return values of external routines.
 

functions_with_ignored_deallocators

functions_with_ignored_deallocators : set[str] = set()

Set of functions (given by their qualified name) where all deallocators are ignored. For these functions, the check will never report a use-after-free. It will also assume that these functions never create freed pointers, neither by return value, out param, nor by modifying global state.
 

ignore_calls_in_functions

ignore_calls_in_functions : set[bauhaus.analysis.config.QualifiedName] = set()

Qualified names of function definitions in which calls to relevant functions are ignored for this check.
 

report_freed_this_at_call

report_freed_this_at_call : bool = False

This option controls findings when a freed pointer is used in C++ to call a non-static member function. When set to true, the use at the call is reported directly. When false, the analysis waits for an actual dereference of the this pointer inside the callee and reports only that dereference.
 

report_read_pointer_args_in_calls_to_undefined

report_read_pointer_args_in_calls_to_undefined : bool = True

Report when freed pointers are passed to undefined (external) functions.
 

report_unbounded_arrays

report_unbounded_arrays : bool = False

If true, accesses into arrays with unknown bound are reported as being potentially outside the allowed range. This affects arrays like extern char buf[];.
 

report_undereferenced_one_past_the_end

report_undereferenced_one_past_the_end : bool = False

If true, report accesses one past the end of an array even if there is no dereference of the resulting pointer.
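
C permits forming a pointer one past the last element of an array as long as that pointer is not dereferenced, and the idiom is common as a loop bound. The following sketch shows such a use; whether the analysis reports this particular function when the option is enabled is an assumption, and the function name is hypothetical.

```c
#include <stddef.h>

/* Sums count elements starting at values. */
int sum(const int *values, size_t count)
{
    /* One past the end: used only for comparison, never dereferenced. */
    const int *end = values + count;
    const int *p;
    int total = 0;

    for (p = values; p != end; ++p) {
        total += *p;
    }
    return total;
}
```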
 

report_unknown_index

report_unknown_index : bool = False

If false, do not report possible out-of-bound findings for which the analysis was not able to infer any restricting information about the array index (this can lead to excluding both false positives and true findings).
 

resources

resources

Type: set[str]

Default: {'C++ArrayHeapMemory', 'C++HeapMemory', 'CudaAsyncMemory', 'CudaDeviceMemory', 'CudaDriverAsyncMemory', 'CudaHostMemory', 'CudaManagedMemory', 'FileHandle', 'HeapMemory', 'UniquePtrHeapMemory'}

Set of resources to be checked (selection of rules in the Resources group).
 

track_conditional_initialization

track_conditional_initialization : bool = True

Whether higher precision should be used to eliminate cases where the initialization and the access are controlled by conditions in a way that the variable access is only executed when the initialization was executed. Requires more memory and runtime but can eliminate some false positives.
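
The pattern described here can be sketched as follows: the variable is assigned and read under the same condition, so the read is never reached without the assignment. The function name is hypothetical, and the example only illustrates the code shape, not the analysis itself.

```c
int scaled_or_zero(int have_value, int raw)
{
    int value;    /* deliberately left without an initializer */

    if (have_value) {
        value = raw * 2;
    }

    /* have_value is not modified between the two conditions */

    if (have_value) {
        return value;    /* only reached when value was assigned above */
    }
    return 0;
}
```

An analysis that does not correlate the two conditions would see a path on which value is read without being assigned.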
 

use_semantic_analysis

use_semantic_analysis : bool = True

When enabled, use semantic analysis. Otherwise filter uninitialized variable messages from the compiler.
 

witness_paths

witness_paths : bool = True

Whether witness paths should be determined and included in the issue.
 

writing_into_pointer_to_const

writing_into_pointer_to_const

Type: dict[bauhaus.analysis.config.QualifiedName, int]

Default:

{
   'cudaMemcpyToSymbol': 0
}
Names of routines, each mapped to the index (starting at 0) of a parameter that is declared as pointer-to-const but whose pointee the routine nevertheless writes to.