CollectivesRules

Rules targeting CUDA collectives

Nested Rules

CUDA-2.1

Only include device-threads participating in a warp collective operation in the mask parameter
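
As an illustration of this rule, the sketch below computes the mask with `__ballot_sync` while the warp is still converged, then passes that mask to a shuffle reached only by the voting lanes. The kernel and helper names are hypothetical, and the launch is assumed to be a single block of exactly 32 threads.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: lanes holding a positive value receive the value of the
// lowest-numbered such lane; all other lanes skip the shuffle entirely.
// Assumes a launch with exactly one full warp (32 threads).
__global__ void broadcastFirstPositive(const int* in, int* out)
{
    int v = in[threadIdx.x];
    // All 32 lanes are still converged here, so the full mask is valid.
    unsigned mask = __ballot_sync(0xffffffffu, v > 0);
    if (v > 0) {
        // Only lanes recorded in `mask` reach this shuffle, so `mask`
        // (not 0xffffffff) is the value to pass.
        int leader = __ffs(mask) - 1;  // lowest participating lane
        out[threadIdx.x] = __shfl_sync(mask, v, leader);
    } else {
        out[threadIdx.x] = 0;
    }
}

// Hypothetical host helper: returns true when the device result matches.
bool runBroadcastFirstPositive()
{
    int *in = nullptr, *out = nullptr;
    if (cudaMallocManaged(&in, 32 * sizeof(int)) != cudaSuccess) return false;
    if (cudaMallocManaged(&out, 32 * sizeof(int)) != cudaSuccess) { cudaFree(in); return false; }
    for (int i = 0; i < 32; ++i) in[i] = (i % 2 == 0) ? -1 : i;  // odd lanes are positive
    broadcastFirstPositive<<<1, 32>>>(in, out);
    bool ok = cudaDeviceSynchronize() == cudaSuccess;
    for (int i = 0; ok && i < 32; ++i)
        ok = out[i] == ((i % 2 == 0) ? 0 : in[1]);  // lane 1 is the lowest positive lane
    cudaFree(in);
    cudaFree(out);
    return ok;
}

int main()
{
    std::printf("%s\n", runBroadcastFirstPositive() ? "ok" : "mismatch");
    return 0;
}
```

Passing `0xffffffff` inside the branch would name lanes that never execute the shuffle, which is the situation the rule is meant to flag.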

CUDA-2.2

Use a power of 2 width less than or equal to the warp size with warp collective shuffle operations
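
A minimal sketch of a conforming `width` argument: the warp is split into segments of 8 lanes (a power of 2 no larger than the warp size of 32), and each segment is reduced independently with `__shfl_down_sync`. The names `segmentSum`, `kWidth`, and `runSegmentSum` are hypothetical, and a single block of 32 threads is assumed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kWidth = 8;  // power of 2 and <= warp size (32)

// Hypothetical kernel: sums each group of kWidth consecutive lanes.
// Assumes a launch with exactly one full warp (32 threads).
__global__ void segmentSum(const int* in, int* out)
{
    int v = in[threadIdx.x];
    // Tree reduction confined to each kWidth-lane segment.
    for (int offset = kWidth / 2; offset > 0; offset /= 2)
        v += __shfl_down_sync(0xffffffffu, v, offset, kWidth);
    // The first lane of each segment holds that segment's total.
    if (threadIdx.x % kWidth == 0)
        out[threadIdx.x / kWidth] = v;
}

// Hypothetical host helper: compares against sums computed on the host.
bool runSegmentSum()
{
    const int segments = 32 / kWidth;
    int *in = nullptr, *out = nullptr;
    if (cudaMallocManaged(&in, 32 * sizeof(int)) != cudaSuccess) return false;
    if (cudaMallocManaged(&out, segments * sizeof(int)) != cudaSuccess) { cudaFree(in); return false; }
    for (int i = 0; i < 32; ++i) in[i] = i;
    segmentSum<<<1, 32>>>(in, out);
    bool ok = cudaDeviceSynchronize() == cudaSuccess;
    for (int s = 0; ok && s < segments; ++s) {
        int expected = 0;
        for (int i = 0; i < kWidth; ++i) expected += in[s * kWidth + i];
        ok = out[s] == expected;
    }
    cudaFree(in);
    cudaFree(out);
    return ok;
}

int main()
{
    std::printf("%s\n", runSegmentSum() ? "ok" : "mismatch");
    return 0;
}
```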

CUDA-2.3

All involved threads should be included in the warp collective
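
One way to satisfy this rule, sketched here under assumptions, is to keep every lane of the warp in the collective and move the bounds check onto the operand: out-of-range lanes contribute the neutral element 0, so the full mask names exactly the threads that call the shuffle. The names `warpSum` and `runWarpSum` are hypothetical, and the block size is assumed to be a multiple of 32.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: adds the first n elements of `in` into *out.
// Assumes blockDim.x is a multiple of the warp size, so every warp is full.
__global__ void warpSum(const int* in, int n, int* out)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    // No early return: lanes past the end stay in the collective with value 0.
    int v = (idx < n) ? in[idx] : 0;
    for (int offset = warpSize / 2; offset > 0; offset /= 2)
        v += __shfl_down_sync(0xffffffffu, v, offset);
    // Lane 0 of each warp holds that warp's partial sum.
    if (threadIdx.x % warpSize == 0)
        atomicAdd(out, v);
}

// Hypothetical host helper: n is deliberately not a multiple of 32.
bool runWarpSum()
{
    const int n = 50;
    int *in = nullptr, *out = nullptr;
    if (cudaMallocManaged(&in, n * sizeof(int)) != cudaSuccess) return false;
    if (cudaMallocManaged(&out, sizeof(int)) != cudaSuccess) { cudaFree(in); return false; }
    int expected = 0;
    for (int i = 0; i < n; ++i) { in[i] = i + 1; expected += in[i]; }
    *out = 0;
    warpSum<<<1, 64>>>(in, n, out);
    bool ok = cudaDeviceSynchronize() == cudaSuccess && *out == expected;
    cudaFree(in);
    cudaFree(out);
    return ok;
}

int main()
{
    std::printf("%s\n", runWarpSum() ? "ok" : "mismatch");
    return 0;
}
```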

CUDA-2.4

All threads must execute the same __syncwarp() in convergence
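
The following sketch shows one reading of this rule: the divergent writes happen inside the branches, but the single `__syncwarp()` sits after the branches have rejoined, so every lane executes the same call. The names `exchangeWarp` and `runExchangeWarp` are hypothetical, and a single block of 32 threads is assumed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each lane writes to shared memory in a divergent
// branch, then reads the slot written by the mirrored lane.
// Assumes a launch with exactly one full warp (32 threads).
__global__ void exchangeWarp(const int* in, int* out)
{
    __shared__ int buf[32];
    int lane = threadIdx.x;
    if (lane % 2 == 0)
        buf[lane] = in[lane];
    else
        buf[lane] = -in[lane];
    // One __syncwarp() outside both branches: all 32 lanes execute this same
    // call, which also orders the shared-memory writes before the reads.
    __syncwarp();
    out[lane] = buf[31 - lane];
}

// Hypothetical host helper: recomputes the expected exchange on the host.
bool runExchangeWarp()
{
    int *in = nullptr, *out = nullptr;
    if (cudaMallocManaged(&in, 32 * sizeof(int)) != cudaSuccess) return false;
    if (cudaMallocManaged(&out, 32 * sizeof(int)) != cudaSuccess) { cudaFree(in); return false; }
    for (int i = 0; i < 32; ++i) in[i] = i;
    exchangeWarp<<<1, 32>>>(in, out);
    bool ok = cudaDeviceSynchronize() == cudaSuccess;
    for (int i = 0; ok && i < 32; ++i) {
        int src = 31 - i;
        int expected = (src % 2 == 0) ? in[src] : -in[src];
        ok = out[i] == expected;
    }
    cudaFree(in);
    cudaFree(out);
    return ok;
}

int main()
{
    std::printf("%s\n", runExchangeWarp() ? "ok" : "mismatch");
    return 0;
}
```

Placing a separate `__syncwarp()` inside each branch instead would have the two halves of the warp execute different calls, which is the pattern this rule targets.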

Options