CUDA-1.1

Use CUDA after host program initiation and before host program termination

Required inputs: IR

CUDA 1.1 [device-dependent] Use CUDA after host program initiation and before host program termination

Perform the following operations only after host program initiation and before host program termination (not before or after main):

  • Call device-dependent CUDA runtime or driver interfaces.
  • Launch CUDA kernels.
  • ODR-use a managed storage duration object (e.g. __managed__, CUDA C++ only).

Most CUDA runtime and driver interfaces are device-dependent, with some exceptions such as:

  • cudaGetErrorString and cudaGetErrorName.
  • cuGetErrorString and cuGetErrorName.
  • cuDriverGetVersion.
Scope: Host.
Audience: CUDA C++, CUDA Libraries.
Category: Mandatory.
Hardware Applicability: All Compute Capabilities.
Rationale

Most CUDA runtime and driver interfaces are dependent on the existence of valid internal device state, which is lazily initialized during host program initiation and destroyed during host program termination. The CUDA runtime and driver cannot detect if this device state is invalid, so using any of these interfaces (implicitly or explicitly) during program initiation or termination will result in undefined behavior. This means that most uses of the CUDA runtime or driver in the global constructors or destructors of C++ classes in the host program is undefined behavior.

Example 1 (Bad)
# include <cstdio>
# include <cassert>
# include <cuda.h>

__global__ void hello_world() {
  printf("hello world\n");
}

struct launch_hello_world {
  __host__ launch_hello_world() {
    hello_world<<<1, 1>>>();
    cudaError_t const error0 = cudaGetLastError();
    assert(cudaSuccess == error0);
  }
  __host__ ~launch_hello_world() {
    hello_world<<<1, 1>>>();
    cudaError_t const error0 = cudaGetLastError();
    assert(cudaSuccess == error0);
  }
};
// This object's constructor is run during program initiation and attempts to
// launch a kernel, which is invalid. Likewise, the object's destructor is run
// during program termination and attempts to launch a kernel, which is invalid.
launch_hello_world launcher;

int main() {}
Example 2 (Bad)
# include <cassert>
# include <cuda.h>

struct device_storage {
  char* ptr;
  __host__ device_storage(int64_t bytes) {
    cudaError_t const error = cudaMalloc(&ptr, bytes);
    assert(cudaSuccess == error);
  }
  __host__ ~device_storage() {
    cudaError_t const error = cudaFree(ptr);
    assert(cudaSuccess == error);
  }
};
// This object's constructor is run during program initiation and attempts to
// call an initialization-dependent CUDA runtime function, which is invalid.
// Likewise, the object's destructor is run during program termination and
// attempts to call an initialization-dependent CUDA runtime function, which is
// invalid.
device_storage scratch(128);

int main() {}
Example 3 (Bad)
__managed__ int32_t global_object_count = 0;

struct A {
  __host__ A() {
    ++global_object_count;
  }
  __host__ ~A() {
    --global_object_count;
  }
};
// This object's constructor is run during program initiation and attempts to
// increment a `__managed__` object, which is invalid. Likewise, the object's
// destructor is run during program termination and attempts to decrement a
// `__managed__` object, which is invalid.
A a;

int main() {}
Excerpt from NVIDIA CUDA C++ Guidelines for robust and safety-critical programming, Version 3.0.1, Copyright (C) 2018-2023 NVIDIA Corporation.

Possible Messages

Key

Text

Severity

Disabled

cuda_api_outside_main

Calling a device-dependent CUDA runtime or driver interface in initialization or finalization

None

False

cuda_managed_outside_main

CUDA managed object odr-used in initialization or finalization

None

False

kernel_launch_outside_main

CUDA kernel launch in initialization or finalization

None

False

Options

allowed_cuda_functions

allowed_cuda_functions

Type: set[bauhaus.analysis.config.QualifiedName]

Default: {'cuDriverGetVersion', 'cuGetErrorName', 'cuGetErrorString', 'cudaGetErrorName', 'cudaGetErrorString'}

Names of functions in the CUDA API which can be called during global initialization and finalization.
 

report_at_call

report_at_call : bool = False

If set to True, the error is reported at the call-sites of routines using CUDA (e.g., at the global variables where some constructor/destructor is called) rather than at the use itself.