CUDA-1.1¶
Use CUDA after host program initiation and before host program termination
Required inputs: IR
CUDA 1.1 [device-dependent] Use CUDA after host program initiation and before host program termination
Perform the following operations only after host program initiation and before host program termination
(not before or after main):
- Call device-dependent CUDA runtime or driver interfaces.
- Launch CUDA kernels.
- ODR-use a managed storage duration object (e.g.
__managed__, CUDA C++ only).
Most CUDA runtime and driver interfaces are device-dependent, with some exceptions such as:
cudaGetErrorStringandcudaGetErrorName.cuGetErrorStringandcuGetErrorName.cuDriverGetVersion.
Scope: Host.
Audience: CUDA C++, CUDA Libraries.
Category: Mandatory.
Hardware Applicability: All Compute Capabilities.
Rationale
Most CUDA runtime and driver interfaces are dependent on the existence of valid internal device state, which is lazily initialized during host program initiation and destroyed during host program termination. The CUDA runtime and driver cannot detect if this device state is invalid, so using any of these interfaces (implicitly or explicitly) during program initiation or termination will result in undefined behavior. This means that most uses of the CUDA runtime or driver in the global constructors or destructors of C++ classes in the host program is undefined behavior.
Example 1 (Bad)
# include <cstdio> # include <cassert> # include <cuda.h> __global__ void hello_world() { printf("hello world\n"); } struct launch_hello_world { __host__ launch_hello_world() { hello_world<<<1, 1>>>(); cudaError_t const error0 = cudaGetLastError(); assert(cudaSuccess == error0); } __host__ ~launch_hello_world() { hello_world<<<1, 1>>>(); cudaError_t const error0 = cudaGetLastError(); assert(cudaSuccess == error0); } }; // This object's constructor is run during program initiation and attempts to // launch a kernel, which is invalid. Likewise, the object's destructor is run // during program termination and attempts to launch a kernel, which is invalid. launch_hello_world launcher; int main() {}
Example 2 (Bad)
# include <cassert> # include <cuda.h> struct device_storage { char* ptr; __host__ device_storage(int64_t bytes) { cudaError_t const error = cudaMalloc(&ptr, bytes); assert(cudaSuccess == error); } __host__ ~device_storage() { cudaError_t const error = cudaFree(ptr); assert(cudaSuccess == error); } }; // This object's constructor is run during program initiation and attempts to // call an initialization-dependent CUDA runtime function, which is invalid. // Likewise, the object's destructor is run during program termination and // attempts to call an initialization-dependent CUDA runtime function, which is // invalid. device_storage scratch(128); int main() {}
Example 3 (Bad)
__managed__ int32_t global_object_count = 0; struct A { __host__ A() { ++global_object_count; } __host__ ~A() { --global_object_count; } }; // This object's constructor is run during program initiation and attempts to // increment a `__managed__` object, which is invalid. Likewise, the object's // destructor is run during program termination and attempts to decrement a // `__managed__` object, which is invalid. A a; int main() {}Excerpt from NVIDIA CUDA C++ Guidelines for robust and safety-critical programming, Version 3.0.1, Copyright (C) 2018-2023 NVIDIA Corporation.
Possible Messages
Key |
Text |
Severity |
Disabled |
|---|---|---|---|
cuda_api_outside_main |
Calling a device-dependent CUDA runtime or driver interface in initialization or finalization |
None |
False |
cuda_managed_outside_main |
CUDA managed object odr-used in initialization or finalization |
None |
False |
kernel_launch_outside_main |
CUDA kernel launch in initialization or finalization |
None |
False |
Options¶
This rule shares the following common options: exclude_in_macros, exclude_messages_in_system_headers, excludes, extend_exclude_to_macro_invocations, includes, justification_checker, languages, post_processing, provider, report_at, severity
The following places define options that affect this rule: Stylechecks, Analysis-GlobalOptions
allowed_cuda_functions¶
allowed_cuda_functions
Names of functions in the CUDA API which can be called during global initialization and finalization.Type: set[bauhaus.analysis.config.QualifiedName]
Default:
{'cuDriverGetVersion', 'cuGetErrorName', 'cuGetErrorString', 'cudaGetErrorName', 'cudaGetErrorString'}
report_at_call¶
report_at_call : bool = False