CUDADirective-1.5¶
Callbacks should not wait for other callbacks
Required inputs: IR
CUDA DIRECTIVE 1.5 [synchronize.callback] Callbacks should not wait for other callbacks
In callbacks which happen in stream order, e.g. those created via cudaStreamAddCallback,
cuStreamAddCallback, cudaLaunchHostFunc, cuLaunchHostFunc, cudaGraphAddHostNode, or
cuGraphAddHostNode, do not wait for a condition that will be satisfied by another callback.
Scope: Host.
Audience: CUDA C++, CUDA Libraries.
Hardware Applicability: All Compute Capabilities.
Rationale
Callbacks registered with CUDA which happen in stream order are executed in unspecified threads of execu- tion which are owned by the CUDA runtime and driver. There execution may be serialized, even if there is not a mandated order between them. Therefore, a callback blocking on another callback may never complete, because the callback it is waiting for may not start executing until the blocking callback completes.
Example 1 (Bad)
# include <cassert> # include <atomic> # include <memory> # include <cuda_runtime.h> auto make_stream() { auto deleter = [] (CUstream_st* ptr) { cudaError_t const error0 = cudaStreamDestroy(ptr); assert(cudaSuccess == error0); }; CUstream_st* raw_stream = nullptr; cudaError_t const error1 = cudaStreamCreate(&raw_stream); assert(cudaSuccess == error1); return std::unique_ptr<CUstream_st, decltype(deleter)>(raw_stream, deleter); } std::atomic<bool> flag{false}; __host__ void poll(void*) { while (!flag.load(std::memory_order_acquire)) ; } __host__ void signal(void*) { flag.store(true, std::memory_order_release); } int main() { auto stream0 = make_stream(); // Callback waits for a global object to change status, which may never change // since the signal is in a different callback. cudaError_t const error0 = cudaLaunchHostFunc(stream0.get(), &poll, nullptr); assert(cudaSuccess == error0); auto stream1 = make_stream(); // call back changes global object status. cudaError_t const error1 = cudaLaunchHostFunc(stream1.get(), &signal, nullptr); assert(cudaSuccess == error1); cudaError_t const error2 = cudaDeviceSynchronize(); assert(cudaSuccess == error2); }Excerpt from NVIDIA CUDA C++ Guidelines for robust and safety-critical programming, Version 3.0.1, Copyright (C) 2018-2023 NVIDIA Corporation.
Possible Messages
Key |
Text |
Severity |
Disabled |
|---|---|---|---|
synchronization_primitive_used |
{} synchronization primitive used in a CUDA callback |
None |
False |
Options¶
This rule shares the following common options: exclude_in_macros, exclude_messages_in_system_headers, excludes, extend_exclude_to_macro_invocations, includes, justification_checker, languages, post_processing, provider, report_at, severity
The following places define options that affect this rule: Stylechecks, Analysis-GlobalOptions
functions¶
functions
Functions that enqueue callbacks (on a stream, in a graph, …).Type: set[bauhaus.analysis.config.QualifiedName]
Default:
{'cuGraphAddHostNode', 'cuLaunchHostFunc', 'cuStreamAddCallback', 'cudaGraphAddHostNode', 'cudaLaunchHostFunc', 'cudaStreamAddCallback'}
synchronization_types¶
synchronization_types
Types that are typically used for blocking waits on a condition.Type: set[bauhaus.analysis.config.QualifiedName]
Default:
{'cnd_t', 'mtx_t', 'std::atomic', 'std::barrier', 'std::condition_variable', 'std::condition_variable_any', 'std::future', 'std::jthread', 'std::latch', 'std::mutex', 'std::recursive_mutex', 'std::recursive_timed_mutex', 'std::shared_future', 'std::thread', 'std::timed_mutex', 'thrd_t'}