CUDA-1.2
Synchronize with all devices before termination
Required inputs: IR
CUDA 1.2 [synchronize.termination] Synchronize with all devices before termination
Before the start of normal host program termination, synchronize with all outstanding work on all devices.
Scope: Host.
Audience: CUDA C++, CUDA Libraries.
Category: Advisory.
Hardware Applicability: All Compute Capabilities.
Rationale
Many CUDA interfaces are asynchronous and are evaluated concurrently with the host program on devices.
The CUDA runtime and driver do not implicitly synchronize with any outstanding work on any devices
during host program termination.
If a CUDA C++ program fails to synchronize with all outstanding work on all devices before normal host
program termination (returning from main, calling std::abort, etc.), no diagnostic will be produced for
any asynchronous errors that occur during the execution of that work.
Example 1 (Bad)
    #include <cassert>

    __global__ void fail() { *(int*)0 = 0; }

    int main() {
      fail<<<1, 1>>>();

      // This call succeeds, because the kernel launch did not have a synchronous
      // error.
      cudaError_t const error0 = cudaGetLastError();
      assert(cudaSuccess == error0);

      // The invalid write in the kernel leads to an asynchronous error which is
      // silently ignored, because the program never synchronizes with the device.
    }
Example 2 (Good)
    #include <cassert>
    #include <cuda.h>
    #include <stdio.h>

    __global__ void fail(int n, float m, float *y) { y[n] = m; }

    int main() {
      fail<<<1, 1>>>(1000, 0, NULL);

      // This call succeeds, because the kernel launch did not have a synchronous
      // error.
      cudaError_t const error1 = cudaGetLastError();
      if(error1 != cudaSuccess) { printf("%s", "fail<<<1, 1>>>(1000, 0, d_y);"); }

      // This call returns the sticky error from the kernel launch.
      cudaError_t const error2 = cudaDeviceSynchronize();
      assert(error2 == cudaSuccess);
    }
Example 3 (Bad)
    #include <cassert>
    #include "testTerminate.h"

    __global__ void fail(int* i) { *i = 17; }

    int main() {
      // Writing through the null pointer is an illegal memory access on the
      // device; it surfaces only as an asynchronous error.
      fail<<<1, 1>>>(nullptr);

      cudaError_t const error0 = cudaGetLastError();
      testTerminate(error0);

      // The invalid write in the kernel leads to an asynchronous error which is
      // silently ignored, because the program never synchronizes with the device.
    }
Example 4 (Good)
    #include <cassert>
    #include "testTerminate.h"

    __global__ void fail(int* i) { *i = 17; }

    int main() {
      fail<<<1, 1>>>(nullptr);

      // This call succeeds, because the kernel launch did not have a synchronous
      // error.
      cudaError_t const error0 = cudaGetLastError();
      testTerminate(error0);

      cudaError_t const error1 = cudaDeviceSynchronize();
      assert(cudaSuccess == error1);
    }

Excerpt from NVIDIA CUDA C++ Guidelines for robust and safety-critical programming, Version 3.0.1, Copyright (C) 2018-2023 NVIDIA Corporation.
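The "Good" examples above call cudaDeviceSynchronize once, which waits only for the device that is current on the calling host thread. The rule text asks for synchronization with all devices, so a multi-GPU program would need to visit each device in turn. The following is a minimal sketch of that idea and is not part of the NVIDIA excerpt. The helper name synchronizeAllDevices is hypothetical; the runtime calls it uses (cudaGetDeviceCount, cudaGetDevice, cudaSetDevice, cudaDeviceSynchronize) are standard CUDA runtime API functions.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper (an assumption for illustration, not a CUDA API):
// waits for all outstanding work on every visible device and returns the
// first error encountered, or cudaSuccess if every device finished cleanly.
cudaError_t synchronizeAllDevices() {
  int deviceCount = 0;
  cudaError_t status = cudaGetDeviceCount(&deviceCount);
  if (status != cudaSuccess) { return status; }

  // Remember the current device so the caller's state is left unchanged.
  int previousDevice = 0;
  status = cudaGetDevice(&previousDevice);
  if (status != cudaSuccess) { return status; }

  cudaError_t firstError = cudaSuccess;
  for (int device = 0; device < deviceCount; ++device) {
    status = cudaSetDevice(device);
    if (status == cudaSuccess) {
      // Blocks until all previously issued work on this device has completed
      // and reports any asynchronous error from that work.
      status = cudaDeviceSynchronize();
    }
    if (status != cudaSuccess && firstError == cudaSuccess) {
      firstError = status;
    }
  }

  // Restore the original device; report a failure here only if nothing
  // failed earlier.
  status = cudaSetDevice(previousDevice);
  if (status != cudaSuccess && firstError == cudaSuccess) {
    firstError = status;
  }
  return firstError;
}
```

A program would call this helper as its last CUDA operation before returning from main or before calling one of the configured terminating functions, and check the returned value. One trade-off of this sketch is that selecting and synchronizing every visible device may initialize a context on devices the program never used; a program that tracks which devices it actually launched work on could restrict the loop to those.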
Possible Messages
| Key | Text | Severity | Disabled |
|---|---|---|---|
| cuda_synchronize_termination | Synchronize with all devices before termination | None | False |
Options
This rule shares the following common options: exclude_in_macros, exclude_messages_in_system_headers, excludes, extend_exclude_to_macro_invocations, includes, justification_checker, languages, post_processing, provider, report_at, severity
The following places define options that affect this rule: Stylechecks, Analysis-GlobalOptions
terminate_functions
A table providing (globbing patterns for) header file names and functions declared therein that should be treated as terminating the program for the purposes of this rule.
Type: dict[bauhaus.analysis.config.FileGlobPattern, list[bauhaus.analysis.config.GlobPattern]]
Default:
    {
        '*abort.h': ['abort'],
        '*stdlib.h': ['exit', 'abort', 'quick_exit', '_Exit'],
        '*terminate.h': ['terminate'],
        'exception': ['terminate']
    }