CUDA-1.4¶
Check for errors after launching a kernel by calling cudaGetLastError
Required inputs: IR
CUDA 1.4 [error.kernel-launch] Check for errors after launching a kernel by calling cudaGetLastError
After launching a kernel with the CUDA C++ kernel launch syntax, call cudaGetLastError and check
whether its return value indicates that a synchronous error occurred before the next CUDA API call.
Scope: Host, Device.
Audience: CUDA C++.
Category: Advisory.
Hardware Applicability: All Compute Capabilities.
Rationale
CUDA C++ kernel launches return void, but reports errors by setting the CUDA global error state, which
can be checked with cudaGetLastError. There are two classes of errors that an interface can return:
- Synchronous errors, which are the immediate result of the call to the interface that returned the error, and
- Asynchronous errors, which are the result of some unspecified previous asynchronous operation.
Asynchronous errors are "sticky", meaning that they will continue to be returned by subsequent calls to the CUDA runtime or driver. Error codes report the class of error that occurred, but do not contain any information about which operation caused the error. If a program fails to check whether a kernel launch returned an error, either:
- No diagnostic will be produced for the error if the error was synchronous.
- A subsequent interface call will return the error if the error was asynchronous.
This may make it unclear where the asynchronous error originated from.
Example 1 (Bad)
# include <cassert> # include "testTerminate.h" __global__ void fail() {} __global__ void pass() {} int main() { fail<<<0, 1>>>(); // Fails synchronously due to invalid grid dimensions. pass<<<1, 0>>>(); // Succeeds, overwriting the previously error which was // synchronous and thus is not "sticky". cudaError_t const error0 = cudaGetLastError(); testTerminate(error0); // This synchronization succeeds. The failure of the first kernel is silently // ignored. cudaError_t const error1 = cudaDeviceSynchronize(); assert(cudaSuccess == error1); }
Example 2 (Good)
# include <cassert> # include "testTerminate.h" __global__ void fail() {} __global__ void pass() {} int main() { // Fails synchronously due to invalid grid dimensions. fail<<<0, 1>>>(); // The kernel launch failure is detected by checking the CUDA global error // state. cudaError_t const error0 = cudaGetLastError(); testTerminate(error0); pass<<<1, 0>>>(); // Succeeds, overwriting the previously error which was // synchronous and thus is not "sticky". cudaError_t const error1 = cudaGetLastError(); testTerminate(error1); cudaError_t const error2 = cudaDeviceSynchronize(); assert(cudaSuccess == error2); }Excerpt from NVIDIA CUDA C++ Guidelines for robust and safety-critical programming, Version 3.0.1, Copyright (C) 2018-2023 NVIDIA Corporation.
Possible Messages
Key |
Text |
Severity |
Disabled |
|---|---|---|---|
missing_error_check_after_kernel |
Missing check for errors (by calling cudaGetLastError) after launching a kernel |
None |
False |
Options¶
This rule shares the following common options: exclude_in_macros, exclude_messages_in_system_headers, excludes, extend_exclude_to_macro_invocations, includes, justification_checker, languages, post_processing, provider, report_at, severity
The following places define options that affect this rule: Stylechecks, Analysis-GlobalOptions
error_checking_functions¶
error_checking_functions : set[bauhaus.analysis.config.FunctionName] = {'cudaGetLastError', 'cudaPeekAtLastError'}