CUDA-1.4

Check for errors after launching a kernel by calling cudaGetLastError

Required inputs: IR

CUDA 1.4 [error.kernel-launch] Check for errors after launching a kernel by calling cudaGetLastError

After launching a kernel with the CUDA C++ kernel launch syntax, call cudaGetLastError and check whether its return value indicates that a synchronous error occurred before the next CUDA API call.

Scope: Host, Device.
Audience: CUDA C++.
Category: Advisory.
Hardware Applicability: All Compute Capabilities.
Rationale

CUDA C++ kernel launches return void, but reports errors by setting the CUDA global error state, which can be checked with cudaGetLastError. There are two classes of errors that an interface can return:

  • Synchronous errors, which are the immediate result of the call to the interface that returned the error, and
  • Asynchronous errors, which are the result of some unspecified previous asynchronous operation.

Asynchronous errors are "sticky", meaning that they will continue to be returned by subsequent calls to the CUDA runtime or driver. Error codes report the class of error that occurred, but do not contain any information about which operation caused the error. If a program fails to check whether a kernel launch returned an error, either:

  • No diagnostic will be produced for the error if the error was synchronous.
  • A subsequent interface call will return the error if the error was asynchronous.

This may make it unclear where the asynchronous error originated from.

Example 1 (Bad)
# include <cassert>
# include "testTerminate.h"

__global__ void fail() {}

__global__ void pass() {}

int main() {
  fail<<<0, 1>>>(); // Fails synchronously due to invalid grid dimensions.
  pass<<<1, 0>>>(); // Succeeds, overwriting the previously error which was
// synchronous and thus is not "sticky".
  cudaError_t const error0 = cudaGetLastError();
  testTerminate(error0);
  // This synchronization succeeds. The failure of the first kernel is silently
  // ignored.
  cudaError_t const error1 = cudaDeviceSynchronize();
  assert(cudaSuccess == error1);
}
Example 2 (Good)
# include <cassert>
# include "testTerminate.h"

__global__ void fail() {}

__global__ void pass() {}

int main() {
  // Fails synchronously due to invalid grid dimensions.
  fail<<<0, 1>>>();
  // The kernel launch failure is detected by checking the CUDA global error
  // state.
  cudaError_t const error0 = cudaGetLastError();
  testTerminate(error0);
  pass<<<1, 0>>>(); // Succeeds, overwriting the previously error which was
// synchronous and thus is not "sticky".
  cudaError_t const error1 = cudaGetLastError();
  testTerminate(error1);
  cudaError_t const error2 = cudaDeviceSynchronize();
  assert(cudaSuccess == error2);
}
Excerpt from NVIDIA CUDA C++ Guidelines for robust and safety-critical programming, Version 3.0.1, Copyright (C) 2018-2023 NVIDIA Corporation.

Possible Messages

Key

Text

Severity

Disabled

missing_error_check_after_kernel

Missing check for errors (by calling cudaGetLastError) after launching a kernel

None

False

Options

error_checking_functions

error_checking_functions : set[bauhaus.analysis.config.FunctionName] = {'cudaGetLastError', 'cudaPeekAtLastError'}

Names of functions that are sufficient to check for errors after launching a kernel.