CUDASecurity-ARR01¶
Do not form or use out-of-bounds array subscripts for CUDA kernels
Required inputs: IR, StaticSemanticAnalysis
This rule prevents accessing array elements beyond their allocated bounds within CUDA kernels.
Scope: Host, Device.
Audience: CUDA C++, CUDA Libraries.
Category: Required.
Hardware Applicability: All Compute Capabilities.
Rationale
Out-of-bounds array access in CUDA kernels can lead to undefined behavior, memory corruption, segmentation faults, or silent data corruption. The rule checks that array subscripts are properly validated against array dimensions before use, especially when using thread indices (threadIdx, blockIdx, blockDim, gridDim) to calculate array positions.
Example 1 (Good)
__global__ void vectorAdd(float* a, float* b, float* c, int n)
{
int idx = blockIdx.x * blockDim.x + threadIdx.x;
// Proper bounds checking before array access
if (idx >= 0 && idx < n)
{
c[idx] = a[idx] + b[idx];
}
}
int main()
{
const int N = 1000;
float *d_a, *d_b, *d_c;
// Allocate GPU memory
cudaMalloc(&d_a, N * sizeof(float));
cudaMalloc(&d_b, N * sizeof(float));
cudaMalloc(&d_c, N * sizeof(float));
// Launch kernel with proper grid/block sizing
int blockSize = 256;
int numBlocks = (N + blockSize - 1) / blockSize;
vectorAdd<<>>(d_a, d_b, d_c, N);
cudaFree(d_a);
cudaFree(d_b);
cudaFree(d_c);
return 0;
}
Example 2 (Bad)
__global__ void vectorAdd(float* a, float* b, float* c, int n)
{
int idx = blockIdx.x * blockDim.x + threadIdx.x;
// BAD: No bounds checking - may access beyond array limits
c[idx] = a[idx] + b[idx];
}
__global__ void processArray(int* data, int size)
{
int tid = threadIdx.x;
// BAD: Assumes array has at least 1024 elements
data[tid + 1024] = data[tid] * 2;
// BAD: Using hardcoded offset without validation
if (tid > 0)
{
data[tid - 1] = data[tid + 512]; // May exceed bounds
}
}
int main()
{
const int N = 1000;
float *d_a, *d_b, *d_c;
int* d_data;
cudaMalloc(&d_a, N * sizeof(float));
cudaMalloc(&d_b, N * sizeof(float));
cudaMalloc(&d_c, N * sizeof(float));
cudaMalloc(&d_data, 500 * sizeof(int)); // Only 500 elements allocated
// BAD: Grid might create more threads than array elements
int blockSize = 256;
int numBlocks = 8; // Could launch 2048 threads for 1000 elements
vectorAdd<<>>(d_a, d_b, d_c, N);
// BAD: Kernel assumes 1024+ elements but only 500 allocated
processArray<<<1, 256>>>(d_data, 500);
cudaFree(d_a);
cudaFree(d_b);
cudaFree(d_c);
cudaFree(d_data);
return 0;
}
Excerpt from NVIDIA CUDA C++ Guidelines for robust and safety-critical programming, Version 3.0.1, Copyright (C) 2018-2023 NVIDIA Corporation.Possible Messages
Key |
Text |
Severity |
Disabled |
|---|---|---|---|
arithmetic_out_of_bounds |
Pointer arithmetic on {node0} might create pointer outside array bounds of {name0} |
None |
False |
out_of_bounds |
Access into array is out of bounds |
None |
False |
out_of_bounds_nonarray |
Pointer arithmetic on {node0} possibly operates on non-array target {name0} |
None |
False |
out_of_bounds_nonarray_arithmetic |
Pointer arithmetic on {node0} possibly creates pointer outside of non-array target {name0} |
None |
False |
out_of_bounds_nonarray_arithmetic_undereferenced |
Pointer arithmetic on {node0} possibly creates pointer one past non-array target {name0} (but not dereferenced) |
None |
False |
possible_indirect_out_of_bounds |
Pointer-indirect access through {node0} might be out of bounds accessing {name0} |
None |
False |
possible_out_of_bounds |
Access into array might be out of bounds |
None |
False |
undereferenced_arithmetic_out_of_bounds |
Pointer arithmetic on {node0} might create pointer one past the end of {name0} (but not dereferenced) |
None |
False |
undereferenced_out_of_bounds |
Access is one past the end of the array (but not dereferenced) |
None |
False |
undereferenced_possible_indirect_out_of_bounds |
Pointer-indirect access through {node0} might be one past the end accessing {name0} (but not dereferenced) |
None |
False |
undereferenced_possible_out_of_bounds |
Access might be one past the end of the array (but not dereferenced) |
None |
False |
Options¶
This rule shares the following common options: exclude_in_macros, exclude_messages_in_system_headers, excludes, extend_exclude_to_macro_invocations, includes, justification_checker, languages, post_processing, provider, report_at, severity
The following places define options that affect this rule: Stylechecks, Analysis-GlobalOptions
abstract_interpretation_out_of_bounds¶
abstract_interpretation_out_of_bounds : bool = False
distinguish_nonarray_access_kinds¶
distinguish_nonarray_access_kinds : bool = False
out_of_bounds_nonarray. If true, it differentiates
these findings using the message kinds
out_of_bounds_nonarray_arithmetic (for cases where possibly an invalid
pointer is formed), out_of_bounds_nonarray_arithmetic_undereferenced
(pointer one past target object is formed, but not dereferenced), and
out_of_bounds_nonarray (remaining cases where valid pointers are
formed).
exclude_very_high_indices¶
exclude_very_high_indices : bool = True
nonarray_only_report_invalid_pointers¶
nonarray_only_report_invalid_pointers : bool = False
out_of_bounds_nonarray (if option
distinguish_nonarray_access_kinds
is false) or messages of kind out_of_bounds_nonarray_arithmetic
(otherwise).
report_unbounded_arrays¶
report_unbounded_arrays : bool = False
extern char buf[];.
report_undereferenced_one_past_the_end¶
report_undereferenced_one_past_the_end : bool = False
report_unknown_index¶
report_unknown_index : bool = False