CUDADirective-1.5¶

Callbacks should not wait for other callbacks

Required inputs: IR

CUDA DIRECTIVE 1.5 [synchronize.callback] Callbacks should not wait for other callbacks

In callbacks which happen in stream order, e.g. those created via cudaStreamAddCallback, cuStreamAddCallback, cudaLaunchHostFunc, cuLaunchHostFunc, cudaGraphAddHostNode, or cuGraphAddHostNode, do not wait for a condition that will be satisfied by another callback.

Scope: Host.

Audience: CUDA C++, CUDA Libraries.

Hardware Applicability: All Compute Capabilities.

Rationale

Callbacks registered with CUDA which happen in stream order are executed in unspecified threads of execu- tion which are owned by the CUDA runtime and driver. There execution may be serialized, even if there is not a mandated order between them. Therefore, a callback blocking on another callback may never complete, because the callback it is waiting for may not start executing until the blocking callback completes.

Example 1 (Bad)

# include <cassert>
# include <atomic>
# include <memory>
# include <cuda_runtime.h>
auto make_stream() {
  auto deleter = [] (CUstream_st* ptr) {
    cudaError_t const error0 = cudaStreamDestroy(ptr);
    assert(cudaSuccess == error0);
  };
  CUstream_st* raw_stream = nullptr;
  cudaError_t const error1 = cudaStreamCreate(&raw_stream);
  assert(cudaSuccess == error1);
  return std::unique_ptr<CUstream_st, decltype(deleter)>(raw_stream, deleter);
}
std::atomic<bool> flag{false};

__host__ void poll(void*) {
  while (!flag.load(std::memory_order_acquire))
    ;
}

__host__ void signal(void*) {
  flag.store(true, std::memory_order_release);
}

int main() {
  auto stream0 = make_stream();
  // Callback waits for a global object to change status, which may never change
  // since the signal is in a different callback.
  cudaError_t const error0 = cudaLaunchHostFunc(stream0.get(), &poll, nullptr);
  assert(cudaSuccess == error0);
  auto stream1 = make_stream();
  // call back changes global object status.
  cudaError_t const error1 = cudaLaunchHostFunc(stream1.get(), &signal, nullptr);
  assert(cudaSuccess == error1);
  cudaError_t const error2 = cudaDeviceSynchronize();
  assert(cudaSuccess == error2);
}

Possible Messages

Key	Text	Severity	Disabled
synchronization_primitive_used	{} synchronization primitive used in a CUDA callback	None	False

Options¶

This rule shares the following common options: exclude_in_macros, exclude_messages_in_system_headers, excludes, extend_exclude_to_macro_invocations, includes, justification_checker, languages, post_processing, provider, report_at, severity
The following places define options that affect this rule: Stylechecks, Analysis-GlobalOptions

functions¶

functions

Type: set[bauhaus.analysis.config.QualifiedName]

Default: {'cuGraphAddHostNode', 'cuLaunchHostFunc', 'cuStreamAddCallback', 'cudaGraphAddHostNode', 'cudaLaunchHostFunc', 'cudaStreamAddCallback'}

Functions that enqueue callbacks (on a stream, in a graph, …).

synchronization_types¶

synchronization_types

Type: set[bauhaus.analysis.config.QualifiedName]

Default: {'cnd_t', 'mtx_t', 'std::atomic', 'std::barrier', 'std::condition_variable', 'std::condition_variable_any', 'std::future', 'std::jthread', 'std::latch', 'std::mutex', 'std::recursive_mutex', 'std::recursive_timed_mutex', 'std::shared_future', 'std::thread', 'std::timed_mutex', 'thrd_t'}

Types that are typically used for blocking waits on a condition.

Axivion Suite 7.12.2-public

Navigation