CUDA-1.14¶

A implicit thread block synchronization function should be called by all active threads

Required inputs: IR, StaticSemanticAnalysis

CUDA 1.14 [synchronize.active_threads] A implicit thread block synchronization function should be called by all active threads

An implicit thread block synchronization function is a synchronization function that has predefined behavior which cannot be changed by the programmer. The functions are defined to work on all active threads and must be called by all active threads or no threads; otherwise, the behavior is undefined. The following are implicit thread block synchronization functions:

__syncthreads, __syncthreads_count, __syncthreads_and, and __syncthreads_or.
All variations of this_thread_block().sync().

Scope: Device.

Audience: CUDA C++.

Category: Mandatory.

Hardware Applicability: All Compute Capabilities.

Rationale

The thread block synchronization functions ensure that all active threads have reached a given point in the program. All global and shared memory accesses made by these threads before the synchronization are guaranteed to be visible to all of the threads in the block after the call. Any active thread that skips the synchronization point violates the memory guarantees and may keep the call from returning.

Example 1 (Bad)

__global__ void kernel() {
  for(unsigned i = 0; i < blockDim.x; ++i) {
    if (i == threadIdx.x)
      __syncthreads(); // Non-compliant: synchronization function called with
// a CTA divergent condition.
  }
}

Example 2 (Bad)

__global__ void kernel(int flag) {
  if (flag)
    goto end; // Non-compliant: goto bypasses a synchronization function.
  __syncthreads();
end:
  return;
}

Example 3 (Bad)

__global__ void kernel(int flag) {
  if (flag)
    __syncthreads(); // Ok, condition is not CTA-divergent.
  if (blockIdx.x < flag)
    __syncthreads(); // Ok, condition is not CTA-divergent.
  if (threadIdx.x < 2)
    __syncthreads(); // Non-conformant: condition is CTA-divergent.
  __syncthreads();
  if (threadIdx.x < flag)
    __syncthreads(); // Ok, condition is potentially CTA-divergent but device-
// threads are either participating or terminating since
// this is the last statement of the kernel.
}

Example 4 (Bad)

# include <cooperative_groups.h>

using namespace cooperative_groups;

__device__ void kernel(int flag) {
  thread_block g = this_thread_block();
  if (flag)
    g.sync(); // Ok, condition is not CTA-divergent.
  if (blockIdx.x < flag)
    g.sync(); // Ok, condition is not CTA-divergent.
  if (g.thread_rank() < 2)
    g.sync(); // Nonconfromant: divergent code.
  g.sync();
  if (threadIdx.x < flag)
    g.sync(); // Ok, condition is potentially CTA-divergent but device-
// threads are either participating or terminating since
// this is the last statement of the kernel.
}

Possible Messages

Key	Text	Severity	Disabled
thread_synchronization_in_divergent_region	Implicit thread block synchronization function is called in possibly divergent code.	None	False

Options¶

This rule shares the following common options: exclude_in_macros, exclude_messages_in_system_headers, excludes, extend_exclude_to_macro_invocations, includes, justification_checker, languages, post_processing, provider, report_at, severity
The following places define options that affect this rule: Stylechecks, Analysis-GlobalOptions

synchronization_functions¶

synchronization_functions

Type: set[bauhaus.analysis.config.QualifiedName]

Default: {'__syncthreads', '__syncthreads_and', '__syncthreads_count', '__syncthreads_or', 'cooperative_groups::__v1::thread_block::sync'}

Names of synchronization functions for the purposes of this rule. Qualified names can be used.

Axivion Suite 7.12.2-public

Navigation