CUDA-1.14¶
A implicit thread block synchronization function should be called by all active threads
Required inputs: IR, StaticSemanticAnalysis
CUDA 1.14 [synchronize.active_threads] A implicit thread block synchronization function should be called by all active threads
An implicit thread block synchronization function is a synchronization function that has predefined behavior which cannot be changed by the programmer. The functions are defined to work on all active threads and must be called by all active threads or no threads; otherwise, the behavior is undefined. The following are implicit thread block synchronization functions:
__syncthreads,__syncthreads_count,__syncthreads_and, and__syncthreads_or.- All variations of
this_thread_block().sync().
Scope: Device.
Audience: CUDA C++.
Category: Mandatory.
Hardware Applicability: All Compute Capabilities.
Rationale
The thread block synchronization functions ensure that all active threads have reached a given point in the program. All global and shared memory accesses made by these threads before the synchronization are guaranteed to be visible to all of the threads in the block after the call. Any active thread that skips the synchronization point violates the memory guarantees and may keep the call from returning.
Example 1 (Bad)
__global__ void kernel() { for(unsigned i = 0; i < blockDim.x; ++i) { if (i == threadIdx.x) __syncthreads(); // Non-compliant: synchronization function called with // a CTA divergent condition. } }
Example 2 (Bad)
__global__ void kernel(int flag) { if (flag) goto end; // Non-compliant: goto bypasses a synchronization function. __syncthreads(); end: return; }
Example 3 (Bad)
__global__ void kernel(int flag) { if (flag) __syncthreads(); // Ok, condition is not CTA-divergent. if (blockIdx.x < flag) __syncthreads(); // Ok, condition is not CTA-divergent. if (threadIdx.x < 2) __syncthreads(); // Non-conformant: condition is CTA-divergent. __syncthreads(); if (threadIdx.x < flag) __syncthreads(); // Ok, condition is potentially CTA-divergent but device- // threads are either participating or terminating since // this is the last statement of the kernel. }
Example 4 (Bad)
# include <cooperative_groups.h> using namespace cooperative_groups; __device__ void kernel(int flag) { thread_block g = this_thread_block(); if (flag) g.sync(); // Ok, condition is not CTA-divergent. if (blockIdx.x < flag) g.sync(); // Ok, condition is not CTA-divergent. if (g.thread_rank() < 2) g.sync(); // Nonconfromant: divergent code. g.sync(); if (threadIdx.x < flag) g.sync(); // Ok, condition is potentially CTA-divergent but device- // threads are either participating or terminating since // this is the last statement of the kernel. }Excerpt from NVIDIA CUDA C++ Guidelines for robust and safety-critical programming, Version 3.0.1, Copyright (C) 2018-2023 NVIDIA Corporation.
Possible Messages
Key |
Text |
Severity |
Disabled |
|---|---|---|---|
thread_synchronization_in_divergent_region |
Implicit thread block synchronization function is called in possibly divergent code. |
None |
False |
Options¶
This rule shares the following common options: exclude_in_macros, exclude_messages_in_system_headers, excludes, extend_exclude_to_macro_invocations, includes, justification_checker, languages, post_processing, provider, report_at, severity
The following places define options that affect this rule: Stylechecks, Analysis-GlobalOptions
synchronization_functions¶
synchronization_functions
Names of synchronization functions for the purposes of this rule. Qualified names can be used.Type: set[bauhaus.analysis.config.QualifiedName]
Default:
{'__syncthreads', '__syncthreads_and', '__syncthreads_count', '__syncthreads_or', 'cooperative_groups::__v1::thread_block::sync'}