CUDASecurity-CON01

Prevent data races when accessing bit-fields from multiple threads

Required inputs: IR

When accessing a bit-field, a thread may inadvertently access a separate bit-field in adjacent memory. This is because compilers are required to store multiple adjacent bit-fields in one storage unit whenever they fit. Consequently, data races may exist not just on a bit-field accessed by multiple threads but also on other bit-fields sharing the same byte or word. A similar problem is discussed in CON43-C, "Do not allow data races in multithreaded code", but the issue described by this rule can be harder to diagnose because it may not be obvious that the same memory location is being modified by multiple threads.

One approach for preventing data races in concurrent programming is to use a mutex. When properly observed by all threads, a mutex can provide safe and secure access to a shared object. However, mutexes provide no guarantees with regard to other objects that might be accessed when the mutex is not controlled by the accessing thread. Unfortunately, there is no portable way to determine which adjacent bit-fields may be stored along with the desired bit-field.

Another approach is to insert a non-bit-field member between any two bit-fields to ensure that each bit-field is the only one accessed within its storage unit. This technique places every bit-field in its own memory location, so accessing one bit-field never touches another.
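The separation technique can be sketched as follows. This is a minimal illustration, assuming the C11 memory model; the struct names, the pad member, and the setter functions are hypothetical and chosen only for this example. The second struct uses a zero-length bit-field, which the C Standard note quoted later in this rule also lists as a valid separator.

```c
/* Sketch: each bit-field is isolated in its own memory location. */
struct separated_flags {
  unsigned int flag1 : 2;
  unsigned char pad;        /* hypothetical non-bit-field separator */
  unsigned int flag2 : 2;
};

/* Alternative sketch: a zero-length bit-field also separates the two. */
struct zero_width_separated_flags {
  unsigned int flag1 : 2;
  unsigned int : 0;         /* ends the current run of bit-fields */
  unsigned int flag2 : 2;
};

static struct separated_flags sep_flags;

/* Each setter writes only its own bit-field's memory location. */
static void set_flag1(unsigned int value) { sep_flags.flag1 = value; }
static void set_flag2(unsigned int value) { sep_flags.flag2 = value; }
```

The cost of this design is extra padding: the struct typically grows compared with the packed layout, which is the trade-off for not needing a lock around each flag.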

Noncompliant Code Example (Bit-field)

Adjacent bit-fields may be stored in a single memory location. Consequently, modifying adjacent bit-fields in different threads is undefined behavior, as shown in this noncompliant code example:

struct multi_threaded_flags {
  unsigned int flag1 : 2;
  unsigned int flag2 : 2;
};

struct multi_threaded_flags flags;

int thread1(void *arg) {
  flags.flag1 = 1;
  return 0;
}

int thread2(void *arg) {
  flags.flag2 = 2;
  return 0;
}

The C Standard, 3.14, paragraph 3 [ISO/IEC 9899:2011], states:

NOTE 2 A bit-field and an adjacent non-bit-field member are in separate memory locations. The same applies to two bit-fields, if one is declared inside a nested structure declaration and the other is not, or if the two are separated by a zero-length bit-field declaration, or if they are separated by a non-bit-field member declaration. It is not safe to concurrently update two non-atomic bit-fields in the same structure if all members declared between them are also (non-zero-length) bit-fields, no matter what the sizes of those intervening bit-fields happen to be.

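For example, the following instruction sequence is possible. Each thread works on its own copy of register 0, and thread 2's final store overwrites thread 1's update to flag1 with the stale value that thread 2 loaded earlier: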

Thread 1: register 0 = flags
Thread 1: register 0 &= ~mask(flag1)
Thread 2: register 0 = flags
Thread 2: register 0 &= ~mask(flag2)
Thread 1: register 0 |= 1 << shift(flag1)
Thread 1: flags = register 0
Thread 2: register 0 |= 2 << shift(flag2)
Thread 2: flags = register 0
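The interleaving above can be replayed deterministically in a single thread. The following is a minimal sketch, assuming a hypothetical layout with flag1 in bits 0-1 and flag2 in bits 2-3 (the real layout is implementation-defined). FLAG_MASK, the shift constants, and replay_lost_update are illustrative names, not part of the rule.

```c
/* Hypothetical bit layout; actual bit-field layout is implementation-defined. */
enum { FLAG1_SHIFT = 0, FLAG2_SHIFT = 2 };
#define FLAG_MASK(shift) (0x3u << (shift))

/* Replays the racy interleaving on a plain word and returns the result.
   r1 and r2 model each thread's private copy of "register 0". */
static unsigned int replay_lost_update(unsigned int flags) {
  unsigned int r1, r2;

  r1 = flags;                      /* Thread 1: register 0 = flags */
  r1 &= ~FLAG_MASK(FLAG1_SHIFT);   /* Thread 1: clear flag1 bits */
  r2 = flags;                      /* Thread 2: register 0 = flags (stale) */
  r2 &= ~FLAG_MASK(FLAG2_SHIFT);   /* Thread 2: clear flag2 bits */
  r1 |= 1u << FLAG1_SHIFT;         /* Thread 1: flag1 = 1 */
  flags = r1;                      /* Thread 1: store */
  r2 |= 2u << FLAG2_SHIFT;         /* Thread 2: flag2 = 2 */
  flags = r2;                      /* Thread 2: store overwrites flag1 */
  return flags;
}
```

Starting from a word of 0, the replay leaves flag2 set to 2 but flag1 still 0, because thread 2 writes back the copy it loaded before thread 1's store.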

Compliant Solution (Bit-field, C11, Mutex)

This compliant solution protects all accesses of the flags with a mutex, thereby preventing any data races:

#include <threads.h>
 
struct multi_threaded_flags {
  unsigned int flag1 : 2;
  unsigned int flag2 : 2;
};

struct mtf_mutex {
  struct multi_threaded_flags s;
  mtx_t mutex;
};

struct mtf_mutex flags;

int thread1(void *arg) {
  if (thrd_success != mtx_lock(&flags.mutex)) {
    /* Handle error */
  }
  flags.s.flag1 = 1;
  if (thrd_success != mtx_unlock(&flags.mutex)) {
    /* Handle error */
  }
  return 0;
}

int thread2(void *arg) {
  if (thrd_success != mtx_lock(&flags.mutex)) {
    /* Handle error */
  }
  flags.s.flag2 = 2;
  if (thrd_success != mtx_unlock(&flags.mutex)) {
    /* Handle error */
  }
  return 0;
}

Compliant Solution (C11)

In this compliant solution, two threads simultaneously modify two distinct non-bit-field members of a structure. Because the members occupy different bytes in memory, no concurrency protection is required.

struct multi_threaded_flags {
  unsigned char flag1;
  unsigned char flag2;
};

struct multi_threaded_flags flags;

int thread1(void *arg) {
  flags.flag1 = 1;
  return 0;
}

int thread2(void *arg) {
  flags.flag2 = 2;
  return 0;
}

Unlike C99, C11 explicitly defines a memory location and provides the following note in subclause 3.14.2 [ISO/IEC 9899:2011]:

NOTE 1 Two threads of execution can update and access separate memory locations without interfering with each other.

It is almost certain that flag1 and flag2 are stored in the same word. With a compiler that conforms to C99 or earlier, if the two assignments interleave so that each thread loads the word before the other thread stores it, only one of the flags may be set as intended. The other flag will retain its previous value because both members are represented by the same word, which on such a platform is the smallest unit the processor can work on. Before the changes made to the C Standard for C11, there was no guarantee that these flags could be modified concurrently.

Risk Assessment

Although the race window is narrow, an assignment or an expression can evaluate improperly because of misinterpreted data, resulting in a corrupted running state or unintended information disclosure.

Rule      Severity   Likelihood   Remediation Cost   Priority   Level
CON32-C   Medium     Probable     Medium             P8         L2

Bibliography

[ISO/IEC 9899:2011] 3.14, "Memory Location"

Excerpt from NVIDIA CUDA C++ Guidelines for robust and safety-critical programming, Version 3.0.1, Copyright (C) 2018-2023 NVIDIA Corporation.

Possible Messages

Key             Text                                                                  Severity   Disabled
data-race       Prevent data races when accessing bit-fields from multiple threads.   None       False
threads-write   Prevent data races when writing bit-fields from multiple threads.     None       False
write-race      Prevent data races when writing bit-fields from multiple threads.     None       False

Options

enter_critical_functions

enter_critical_functions : set[bauhaus.analysis.config.QualifiedName] = {'std::lock_guard::lock_guard', 'std::mutex::lock'}

Set of function names to enter a critical region.
 

enter_critical_macros

enter_critical_macros : set[bauhaus.analysis.config.MacroName] = set()

Set of macro names to enter a critical region (macros must expand to asm() statement).
 

exit_critical_functions

exit_critical_functions : set[bauhaus.analysis.config.QualifiedName] = {'std::lock_guard::~lock_guard', 'std::mutex::unlock'}

Set of function names to exit a critical region.
 

exit_critical_macros

exit_critical_macros : set[bauhaus.analysis.config.MacroName] = set()

Set of macro names to exit a critical region (macros must expand to asm() statement).
 

inspect_pointers

inspect_pointers : bool = False

Whether pointer targets should be inspected to detect more global variable uses.
 

nested_critical_regions

nested_critical_regions : bool = True

If set to true, critical regions nest; if set to false, a single exit-critical-region terminates all open critical regions.
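
The difference between the two settings can be pictured with a small depth counter. This is only a hypothetical model of the semantics described above, not the analysis tool's implementation; region_tracker and its functions are names invented for this sketch.

```c
#include <stdbool.h>

/* Hypothetical model of critical-region tracking. */
struct region_tracker {
  unsigned int depth;  /* number of currently open critical regions */
  bool nested;         /* models the nested_critical_regions option */
};

static void enter_critical(struct region_tracker *t) {
  t->depth++;
}

static void exit_critical(struct region_tracker *t) {
  if (t->depth == 0) {
    return;            /* nothing open; ignore the unmatched exit */
  }
  /* Nested: close one region. Not nested: close all open regions. */
  t->depth = t->nested ? t->depth - 1 : 0;
}

static bool in_critical(const struct region_tracker *t) {
  return t->depth > 0;
}
```

With nesting enabled, two enters followed by one exit still count as inside a critical region; with nesting disabled, the same sequence leaves no region open.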
 

report_read_races

report_read_races : bool = False

Whether potentially conflicting read accesses (R/R) should be reported, too.
 

synchronizing_routines

synchronizing_routines : set[bauhaus.analysis.config.QualifiedName] = {'__syncthreads', 'cooperative_groups::__v1::thread_block::sync', 'cuda::barrier::arrive_and_wait', 'cuda::barrier::wait'}

Calls to these routines will be considered synchronization points. Usually, global memory is written before such a point and only safely read afterwards.