CertC-ARR38

Guarantee that library functions do not form invalid pointers

Required inputs: IR, StaticSemanticAnalysis

C library functions that make changes to arrays or objects take at least two arguments: a pointer to the array or object and an integer indicating the number of elements or bytes to be manipulated. For the purposes of this rule, the element count of a pointer is the size of the object to which it points, expressed by the number of elements that are valid to access. Supplying arguments to such a function might cause the function to form a pointer that does not point into or just past the end of the object, resulting in undefined behavior.

Annex J of the C Standard [ISO/IEC 9899:2011] states that it is undefined behavior if the "pointer passed to a library function array parameter does not have a value such that all address computations and object accesses are valid." (See undefined behavior 109.)

In the following code,

int arr[5];
int *p = arr;

unsigned char *p2 = (unsigned char *)arr;
unsigned char *p3 = (unsigned char *)(arr + 2);
void *p4 = arr;

the element count of the pointer p is sizeof(arr) / sizeof(arr[0]), that is, 5. The element count of the pointer p2 is sizeof(arr), that is, 20, on implementations where sizeof(int) == 4. The element count of the pointer p3 is 12 on implementations where sizeof(int) == 4, because p3 points two elements past the start of the array arr, leaving 3 elements of 4 bytes each. Because p4 is a void *, its element count is computed as though it were an unsigned char *, so it is the same as that of p2.
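The element counts above can be checked mechanically. The following is a minimal sketch, not part of the rule or of any analyzer; the macro ELEMENT_COUNT and the function byte_element_count are hypothetical names chosen for illustration. The macro yields the element count of an array object, and the function yields the element count of a byte pointer into an object, given a pointer just past the object's end.

```c
#include <stddef.h>

/* Element count of an array object: elements, not bytes.
   Only valid for arrays; applied to a pointer it gives a wrong result (ARR01-C). */
#define ELEMENT_COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical helper: element count of a byte pointer p into an object
   whose end (one past its last byte) is end, i.e., the number of bytes
   that can still be accessed through p. */
static size_t byte_element_count(const void *p, const void *end) {
  return (size_t)((const unsigned char *)end - (const unsigned char *)p);
}
```

For int arr[5], ELEMENT_COUNT(arr) is 5, byte_element_count(arr, arr + 5) equals sizeof(arr), and byte_element_count(arr + 2, arr + 5) equals 3 * sizeof(int), which is 12 when sizeof(int) == 4.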

Pointer + Integer

The following standard library functions take a pointer argument and a size argument, with the constraint that the pointer must point to a valid memory object of at least the number of elements indicated by the size argument.

fgets()             fgetws()            mbstowcs() [1]      wcstombs() [1]
mbrtoc16() [2]      mbrtoc32() [2]      mbsrtowcs() [1]     wcsrtombs() [1]
mbtowc() [2]        mbrtowc() [2]       mblen()             mbrlen()
memchr()            wmemchr()           memset()            wmemset()
strftime()          wcsftime()          strxfrm() [1]       wcsxfrm() [1]
strncat() [2]       wcsncat() [2]       snprintf()          vsnprintf()
swprintf()          vswprintf()         setvbuf()           tmpnam_s()
snprintf_s()        sprintf_s()         vsnprintf_s()       vsprintf_s()
gets_s()            getenv_s()          wctomb_s()          mbstowcs_s() [3]
wcstombs_s() [3]    memcpy_s() [3]      memmove_s() [3]     strncpy_s() [3]
strncat_s() [3]     strtok_s() [2]      strerror_s()        strnlen_s()
asctime_s()         ctime_s()           snwprintf_s()       swprintf_s()
vsnwprintf_s()      vswprintf_s()       wcsncpy_s() [3]     wmemcpy_s() [3]
wmemmove_s() [3]    wcsncat_s() [3]     wcstok_s() [2]      wcsnlen_s()
wcrtomb_s()         mbsrtowcs_s() [3]   wcsrtombs_s() [3]   memset_s() [4]

[1] Takes two pointers and an integer, but the integer specifies the element count only of the output buffer, not of the input buffer.
[2] Takes two pointers and an integer, but the integer specifies the element count only of the input buffer, not of the output buffer.
[3] Takes two pointers and two integers; each integer corresponds to the element count of one of the pointers.
[4] Takes a pointer and two size-related integers; the first specifies the number of bytes available in the buffer, and the second specifies the number of bytes to write within the buffer.

For calls that take a pointer and an integer size, the given size should not be greater than the element count of the pointer.
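The following sketch illustrates the rule for a call that writes through a pointer that does not point to the start of its object. It assumes a hypothetical helper, append_text(), that appends to a null-terminated string stored in a buffer of buf_size bytes. The size passed to snprintf() is the element count of buf + used, which is buf_size - used rather than buf_size; the memchr() call is likewise bounded by the element count of buf.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: appends text to the string already stored in buf,
   which has room for buf_size bytes in total. Returns snprintf()'s result,
   or -1 if buf holds no null terminator within its buf_size bytes. */
static int append_text(char *buf, size_t buf_size, const char *text) {
  /* memchr() examines no more than the element count of buf. */
  const char *nul = memchr(buf, '\0', buf_size);
  if (nul == NULL) {
    return -1;
  }
  size_t used = (size_t)(nul - buf);
  /* buf + used has an element count of buf_size - used, not buf_size. */
  return snprintf(buf + used, buf_size - used, "%s", text);
}
```

Passing buf_size unchanged together with buf + used would let snprintf() form pointers up to used bytes past the end of buf.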

Noncompliant Code Example (Element Count)

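In this noncompliant code example, the incorrect element count is used in a call to wmemcpy(). The sizeof operator returns the size expressed in bytes, but wmemcpy() expects a count of wchar_t elements. On an implementation where sizeof(wchar_t) == 4, for example, sizeof(w_str) is 48, so wmemcpy() copies 48 wide characters, reading past the end of the 12-element array w_str and writing past the end of the 32-element array w_buffer.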

#include <string.h>
#include <wchar.h>
 
static const char str[] = "Hello world";
static const wchar_t w_str[] = L"Hello world";
void func(void) {
  char buffer[32];
  wchar_t w_buffer[32];
  memcpy(buffer, str, sizeof(str)); /* Compliant */
  wmemcpy(w_buffer, w_str, sizeof(w_str)); /* Noncompliant */
}
Compliant Solution (Element Count)

When using functions that operate on pointed-to regions, programmers must always express the integer size in terms of the element count expected by the function. For example, memcpy() expects the element count expressed in terms of void * (that is, in bytes), but wmemcpy() expects the element count expressed in terms of wchar_t *. This compliant solution calls strlen() and wcslen() instead of using the sizeof operator; these functions return the number of elements in the string, which matches the element count expected by the copy functions, and 1 is added for the terminating null character. Alternatively, when the argument is an array A of element type T, the expression sizeof(A) / sizeof(T), or equivalently sizeof(A) / sizeof(*A), computes the number of elements in the array.

#include <string.h>
#include <wchar.h>
 
static const char str[] = "Hello world";
static const wchar_t w_str[] = L"Hello world";
void func(void) {
  char buffer[32];
  wchar_t w_buffer[32];
  memcpy(buffer, str, strlen(str) + 1);
  wmemcpy(w_buffer, w_str, wcslen(w_str) + 1);
} 
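The sizeof(A) / sizeof(*A) form mentioned above can also be used for the source array, combined with a check against the destination's element count. This is a sketch under the assumption that the caller passes the destination size as a count of wchar_t elements; copy_wide() and w_src are hypothetical names.

```c
#include <wchar.h>

static const wchar_t w_src[] = L"Hello world";

/* Hypothetical helper: copies w_src into dst, which has room for dst_count
   wchar_t elements. Both counts are in wchar_t elements, as wmemcpy() expects. */
static int copy_wide(wchar_t *dst, size_t dst_count) {
  /* 12 elements: 11 characters plus the terminating L'\0' */
  const size_t src_count = sizeof(w_src) / sizeof(*w_src);
  if (src_count > dst_count) {
    return -1; /* destination too small */
  }
  wmemcpy(dst, w_src, src_count);
  return 0;
}
```

A caller with wchar_t w_buffer[32] would pass sizeof(w_buffer) / sizeof(*w_buffer), that is, 32, never sizeof(w_buffer).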
Noncompliant Code Example (Pointer + Integer)

This noncompliant code example assigns a value greater than the number of bytes of available memory to n, which is then passed to memset():

#include <stdlib.h>
#include <string.h>
 
void f1(size_t nchars) {
  char *p = (char *)malloc(nchars);
  /* ... */
  const size_t n = nchars + 1;
  /* ... */
  memset(p, 0, n);
}
Compliant Solution (Pointer + Integer)

This compliant solution ensures that the value of n is not greater than the number of bytes of the dynamic memory pointed to by the pointer p:

#include <stdlib.h>
#include <string.h>
 
void f1(size_t nchars) {
  char *p = (char *)malloc(nchars);
  if (p == NULL) {
    /* Handle error */
    return;
  }
  /* ...  */
  const size_t n = nchars;
  /* ...  */
  memset(p, 0, n);
}
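One way to keep the size used in later calls consistent with the size passed to malloc() is to store both together. The following sketch assumes a hypothetical struct byte_buffer and helper functions; it is one possible design, not a requirement of the rule.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical type: an allocation together with its size in bytes. */
struct byte_buffer {
  unsigned char *data;
  size_t size;
};

/* Allocates size bytes; returns 0 on success and -1 on failure. */
static int buffer_init(struct byte_buffer *b, size_t size) {
  b->data = malloc(size);
  b->size = (b->data != NULL) ? size : 0;
  return (b->data != NULL) ? 0 : -1;
}

/* Clears the first n bytes; refuses any n larger than the allocation. */
static int buffer_clear(struct byte_buffer *b, size_t n) {
  if (b->data == NULL || n > b->size) {
    return -1;
  }
  memset(b->data, 0, n);
  return 0;
}
```

Because buffer_clear() compares n against the recorded allocation size, an off-by-one value such as nchars + 1 is rejected instead of being passed to memset().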
Noncompliant Code Example (Pointer + Integer)

In this noncompliant code example, the element count of the array a is ARR_SIZE elements. Because memset() expects a byte count, the size of the array is scaled incorrectly by sizeof(int) instead of sizeof(long), which can form an invalid pointer on architectures where sizeof(int) != sizeof(long).

#include <string.h>
 
void f2(void) {
  const size_t ARR_SIZE = 4;
  long a[ARR_SIZE];
  const size_t n = sizeof(int) * ARR_SIZE;
  void *p = a;

  memset(p, 0, n);
}
Compliant Solution (Pointer + Integer)

In this compliant solution, the element count required by memset() is properly calculated without resorting to scaling:

#include <string.h>
 
void f2(void) {
  const size_t ARR_SIZE = 4;
  long a[ARR_SIZE];
  const size_t n = sizeof(a);
  void *p = a;

  memset(p, 0, n);
}
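The sizeof(a) approach in the compliant solution only works where a still has array type. The following sketch, with the hypothetical function clear_longs(), shows one way to preserve that in a called function: passing a pointer to the whole array keeps sizeof(*a) equal to the array size in bytes. With a plain long * parameter, sizeof would yield the size of the pointer instead (see ARR01-C).

```c
#include <string.h>

enum { ARR_SIZE = 4 };

/* Hypothetical helper. The parameter is a pointer to the whole array, so
   *a has type long[ARR_SIZE] and sizeof(*a) is ARR_SIZE * sizeof(long).
   With a long * parameter, sizeof would give the pointer's size instead. */
static void clear_longs(long (*a)[ARR_SIZE]) {
  memset(*a, 0, sizeof(*a));
}
```

A caller writes clear_longs(&a) for long a[ARR_SIZE]; the byte count never has to be scaled by hand.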
Two Pointers + One Integer

The following standard library functions take two pointer arguments and a size argument, with the constraint that both pointers must point to valid memory objects of at least the number of elements indicated by the size argument. 

memcpy() wmemcpy() memmove() wmemmove()
strncpy() wcsncpy() memcmp() wmemcmp()
strncmp() wcsncmp() strcpy_s() wcscpy_s()
strcat_s() wcscat_s()

For calls that take two pointers and an integer size, the given size should not be greater than the element count of either pointer.

Noncompliant Code Example (Two Pointers + One Integer)

In this noncompliant code example, the value of n is incorrectly computed, allowing a read past the end of the object referenced by q:

#include <string.h>

void f4(void) {
  char p[40];
  const char *q = "Too short";
  size_t n = sizeof(p);
  memcpy(p, q, n);
}
Compliant Solution (Two Pointers + One Integer)

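This compliant solution ensures that n does not exceed the element count of either pointer: n is the smaller of the size of the array p and the number of bytes in the string q, including its terminating null character: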

#include <string.h>

void f4(void) {
  char p[40];
  const char *q = "Too short";
  size_t n = sizeof(p) < strlen(q) + 1 ? sizeof(p) : strlen(q) + 1;
  memcpy(p, q, n);
}
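The minimum computation in the compliant solution generalizes to a helper that bounds the copy by both element counts. This is a sketch: copy_string_bounded() is a hypothetical name, and truncating silently (while still null-terminating the destination) is an assumed policy that a real program might replace with an error.

```c
#include <string.h>

/* Hypothetical helper: copies the string src into dst (dst_size bytes)
   without reading past src's terminator or writing past dst's end.
   Truncates if necessary; dst is always null-terminated when dst_size > 0. */
static void copy_string_bounded(char *dst, size_t dst_size, const char *src) {
  if (dst_size == 0) {
    return;
  }
  const size_t src_size = strlen(src) + 1; /* element count of src */
  const size_t n = src_size < dst_size ? src_size : dst_size;
  memcpy(dst, src, n); /* n exceeds neither element count */
  dst[n - 1] = '\0';   /* only changes dst when the copy was truncated */
}
```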
One Pointer + Two Integers

The following standard library functions take a pointer argument and two size arguments, with the constraint that the pointer must point to a valid memory object containing at least as many bytes as the product of the two size arguments.

bsearch() bsearch_s() qsort() qsort_s()
fread() fwrite()  

For calls that take a pointer and two integers, one integer represents the number of bytes required for an individual object, and a second integer represents the number of elements in the array. The resulting product of the two integers should not be greater than the element count of the pointer were it expressed as an unsigned char *.  
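The condition above, together with the wraparound check from INT30-C, can be expressed as a small predicate. product_fits() is a hypothetical helper; avail stands for the element count of the pointer expressed as an unsigned char *, that is, the number of accessible bytes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if an array of nmemb elements of size
   bytes each fits in an object with avail accessible bytes. The division
   check rejects products that would wrap around (INT30-C). */
static bool product_fits(size_t size, size_t nmemb, size_t avail) {
  if (size != 0 && nmemb > SIZE_MAX / size) {
    return false;
  }
  return size * nmemb <= avail;
}
```

For example, before calling fread(buf, size, nmemb, f), a caller could require product_fits(size, nmemb, sizeof buf).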

Noncompliant Code Example (One Pointer + Two Integers)

This noncompliant code example writes a variable number of objects of type struct obj to a file. The function checks that num_objs is small enough to prevent wrapping, in compliance with INT30-C. Ensure that unsigned integer operations do not wrap. The size of struct obj is assumed to be 16 bytes to account for the padding needed to achieve the assumed alignment of long long. However, the padding depends on the target architecture, so this object size may be incorrect, resulting in an incorrect element count.

#include <stdint.h>
#include <stdio.h>
 
struct obj {
  char c;
  long long i;
};
 
void func(FILE *f, struct obj *objs, size_t num_objs) {
  const size_t obj_size = 16;
  if (num_objs > (SIZE_MAX / obj_size) ||
      num_objs != fwrite(objs, obj_size, num_objs, f)) {
    /* Handle error */
  }
}
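To see why a hard-coded 16 is fragile, the layout the compiler actually chose can be queried with sizeof and offsetof(). The sketch below, with the hypothetical function obj_padding(), computes the number of padding bytes in struct obj; the value is implementation-defined, so no particular result is assumed.

```c
#include <stddef.h>

struct obj {
  char c;
  long long i;
};

/* Padding bytes the compiler inserted into struct obj. The amount is
   implementation-defined, which is why the object size should come from
   sizeof rather than a hard-coded constant such as 16. */
static size_t obj_padding(void) {
  return sizeof(struct obj) - sizeof(char) - sizeof(long long);
}
```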
Compliant Solution (One Pointer + Two Integers)

This compliant solution uses the sizeof operator to correctly provide the object size and num_objs to provide the element count:

#include <stdint.h>
#include <stdio.h>

struct obj {
  char c;
  long long i;
};

void func(FILE *f, struct obj *objs, size_t num_objs) {
  const size_t obj_size = sizeof *objs;
  if (num_objs > (SIZE_MAX / obj_size) ||
      num_objs != fwrite(objs, obj_size, num_objs, f)) {
    /* Handle error */
  }
}
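The same sizeof-based element size also protects the reading side. The following sketch pairs a writer equivalent to the compliant solution with a hypothetical reader, read_objs(), whose item count cap is the number of elements in the destination array; the function names and the -1 error convention are assumptions for illustration.

```c
#include <stdint.h>
#include <stdio.h>

struct obj {
  char c;
  long long i;
};

/* Writes num_objs objects; the element size is taken from sizeof. */
static int write_objs(FILE *f, const struct obj *objs, size_t num_objs) {
  const size_t obj_size = sizeof *objs;
  if (num_objs > (SIZE_MAX / obj_size) ||
      num_objs != fwrite(objs, obj_size, num_objs, f)) {
    return -1;
  }
  return 0;
}

/* Reads at most cap objects into objs (an array of at least cap elements),
   so fread() touches at most cap * sizeof *objs bytes of objs. */
static size_t read_objs(FILE *f, struct obj *objs, size_t cap) {
  return fread(objs, sizeof *objs, cap, f);
}
```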
Noncompliant Code Example (One Pointer + Two Integers)

In this noncompliant code example, the function f() calls fread() to read nitems of type wchar_t, each size bytes in size, into an array of BUFFER_SIZE elements, wbuf. However, the expression used to compute the value of nitems fails to account for the fact that, unlike the size of char, the size of wchar_t may be greater than 1. Consequently, fread() could attempt to form pointers past the end of wbuf and use them to assign values to nonexistent elements of the array. Such an attempt is undefined behavior. (See undefined behavior 109.) A likely consequence of this undefined behavior is a buffer overflow. For a discussion of this programming error in the Common Weakness Enumeration database, see CWE-121, "Stack-based Buffer Overflow," and CWE-805, "Buffer Access with Incorrect Length Value."

#include <stddef.h>
#include <stdio.h>

void f(FILE *file) {
  enum { BUFFER_SIZE = 1024 };
  wchar_t wbuf[BUFFER_SIZE];

  const size_t size = sizeof(*wbuf);
  const size_t nitems = sizeof(wbuf);

  size_t nread = fread(wbuf, size, nitems, file);
  /* ... */
}
Compliant Solution (One Pointer + Two Integers)

This compliant solution correctly computes the maximum number of items for fread() to read from the file:

#include <stddef.h>
#include <stdio.h>
 
void f(FILE *file) {
  enum { BUFFER_SIZE = 1024 };
  wchar_t wbuf[BUFFER_SIZE];

  const size_t size = sizeof(*wbuf);
  const size_t nitems = sizeof(wbuf) / size;

  size_t nread = fread(wbuf, size, nitems, file);
  /* ... */
}
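The compliant pattern can be packaged so that the element count cannot drift away from the buffer. In this sketch, the hypothetical function read_wide() takes a pointer to the whole array, so both size and nitems are derived from its type; nitems is a count of wchar_t items and is never a byte count.

```c
#include <stdio.h>
#include <wchar.h>

enum { BUFFER_SIZE = 1024 };

/* Hypothetical helper: reads up to BUFFER_SIZE wide characters into *wbuf.
   size is the number of bytes per item and nitems is a number of items,
   so fread() writes at most sizeof(*wbuf) bytes. Returns the items read. */
static size_t read_wide(FILE *file, wchar_t (*wbuf)[BUFFER_SIZE]) {
  const size_t size = sizeof((*wbuf)[0]);
  const size_t nitems = sizeof(*wbuf) / size; /* equals BUFFER_SIZE */
  return fread(*wbuf, size, nitems, file);
}
```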
Noncompliant Code Example (Heartbleed)

CERT vulnerability VU#720951 describes a vulnerability in OpenSSL versions 1.0.1 through 1.0.1f, popularly known as "Heartbleed." This vulnerability allows an attacker to steal information that under normal conditions would be protected by Secure Sockets Layer/Transport Layer Security (SSL/TLS) encryption.

Despite the seriousness of the vulnerability, Heartbleed is the result of a common programming error and an apparent lack of awareness of secure coding principles. Following is the vulnerable code:

int dtls1_process_heartbeat(SSL *s) {
  unsigned char *p = &s->s3->rrec.data[0], *pl;
  unsigned short hbtype;
  unsigned int payload;
  unsigned int padding = 16; /* Use minimum padding */

  /* Read type and payload length first */
  hbtype = *p++;
  n2s(p, payload);
  pl = p;

  /* ... More code ... */

  if (hbtype == TLS1_HB_REQUEST) {
    unsigned char *buffer, *bp;
    int r;

    /*
     * Allocate memory for the response; size is 1 byte
     * message type, plus 2 bytes payload length, plus
     * payload, plus padding.
     */
    buffer = OPENSSL_malloc(1 + 2 + payload + padding);
    bp = buffer;

    /* Enter response type, length, and copy payload */
    *bp++ = TLS1_HB_RESPONSE;
    s2n(payload, bp);
    memcpy(bp, pl, payload);

    /* ... More code ... */
  }
  /* ... More code ... */
}


This code processes a "heartbeat" packet from a client. As specified in RFC 6520, when the program receives a heartbeat packet, it must echo the packet's data back to the client. In addition to the data, the packet contains a length field that conventionally indicates the number of bytes in the packet data, but there is nothing to prevent a malicious packet from lying about its data length.

The pointer p, along with payload and pl, holds data from the packet. The code allocates a buffer large enough to hold payload bytes plus some overhead, then copies payload bytes starting at pl into this buffer and sends it to the client. Notably absent from this code are any checks that the payload integer variable extracted from the heartbeat packet corresponds to the size of the packet data. Because the client can specify an arbitrary value of payload, an attacker can cause the server to read and return the contents of memory beyond the end of the packet data, which violates INT04-C. Enforce limits on integer values originating from tainted sources.

The resulting call to memcpy() can then copy the contents of memory past the end of the packet data and the packet itself, potentially exposing sensitive data to the attacker. This call to memcpy() violates ARR38-C. Guarantee that library functions do not form invalid pointers. A version of ARR38-C also appears in ISO/IEC TS 17961:2013, "Forming invalid pointers by library functions [libptr]." This rule would require a conforming analyzer to diagnose the Heartbleed vulnerability.


Compliant Solution (Heartbleed)

OpenSSL version 1.0.1g contains the following patch, which guarantees that payload is within a valid range. The range is limited by the size of the input record.

int dtls1_process_heartbeat(SSL *s) {
  unsigned char *p = &s->s3->rrec.data[0], *pl;
  unsigned short hbtype;
  unsigned int payload;
  unsigned int padding = 16; /* Use minimum padding */

  /* ... More code ... */

  /* Read type and payload length first */
  if (1 + 2 + 16 > s->s3->rrec.length)
    return 0; /* Silently discard */
  hbtype = *p++;
  n2s(p, payload);
  if (1 + 2 + payload + 16 > s->s3->rrec.length)
    return 0; /* Silently discard per RFC 6520 */
  pl = p;

  /* ... More code ... */

  if (hbtype == TLS1_HB_REQUEST) {
    unsigned char *buffer, *bp;
    int r;

    /*
     * Allocate memory for the response; size is 1 byte
     * message type, plus 2 bytes payload length, plus
     * payload, plus padding.
     */
    buffer = OPENSSL_malloc(1 + 2 + payload + padding);
    bp = buffer;
    /* Enter response type, length, and copy payload */
    *bp++ = TLS1_HB_RESPONSE;
    s2n(payload, bp);
    memcpy(bp, pl, payload);
    /* ... More code ... */
  }
  /* ... More code ... */
}
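The essence of the patch is to compare the length claimed by the packet with the length of the record actually received before calling memcpy(). The following standalone sketch models that check on a plain byte array; it does not use OpenSSL's types or macros, and the record layout (1-byte type, 2-byte big-endian length, payload, at least 16 bytes of padding) and the function name extract_payload() are simplifying assumptions taken from the comments in the code above.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical heartbeat layout (not OpenSSL's API):
   1 byte type, 2 bytes big-endian payload length, payload, padding. */
enum { HB_HEADER = 1 + 2, HB_MIN_PADDING = 16 };

/* Returns a newly allocated copy of the payload and stores its length in
   *out_len, or returns NULL if the record is malformed or allocation fails. */
static unsigned char *extract_payload(const unsigned char *rec, size_t rec_len,
                                      size_t *out_len) {
  if (rec_len < (size_t)HB_HEADER + HB_MIN_PADDING) {
    return NULL; /* too short to hold even an empty payload */
  }
  size_t payload = ((size_t)rec[1] << 8) | rec[2];
  /* The claimed length must fit inside the record actually received. */
  if ((size_t)HB_HEADER + payload + HB_MIN_PADDING > rec_len) {
    return NULL; /* silently discard (per RFC 6520) */
  }
  unsigned char *copy = malloc(payload > 0 ? payload : 1);
  if (copy == NULL) {
    return NULL;
  }
  /* rec + HB_HEADER has at least payload accessible bytes here. */
  memcpy(copy, rec + HB_HEADER, payload);
  *out_len = payload;
  return copy;
}
```

A record that claims a longer payload than it carries is rejected before memcpy() can form a pointer past the end of rec.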
Risk Assessment

Depending on the library function called, an attacker may be able to use a heap or stack overflow vulnerability to run arbitrary code.

Rule | Severity | Likelihood | Remediation Cost | Priority | Level
ARR38-C | High | Likely | Medium | P18 | L1
Related Guidelines
Taxonomy | Taxonomy item | Relationship
C Secure Coding Standard | API00-C. Functions should validate their parameters | Prior to 2018-01-12: CERT: Unspecified Relationship
C Secure Coding Standard | ARR01-C. Do not apply the sizeof operator to a pointer when taking the size of an array | Prior to 2018-01-12: CERT: Unspecified Relationship
C Secure Coding Standard | INT30-C. Ensure that unsigned integer operations do not wrap | Prior to 2018-01-12: CERT: Unspecified Relationship
ISO/IEC TS 17961:2013 | Forming invalid pointers by library functions [libptr] | Prior to 2018-01-12: CERT: Unspecified Relationship
ISO/IEC TR 24772:2013 | Buffer Boundary Violation (Buffer Overflow) [HCB] | Prior to 2018-01-12: CERT: Unspecified Relationship
ISO/IEC TR 24772:2013 | Unchecked Array Copying [XYW] | Prior to 2018-01-12: CERT: Unspecified Relationship
CWE 2.11 | CWE-119, Improper Restriction of Operations within the Bounds of a Memory Buffer | 2017-05-18: CERT: Rule subset of CWE
CWE 2.11 | CWE-121, Stack-based Buffer Overflow | 2017-05-18: CERT: Partial overlap
CWE 2.11 | CWE-123, Write-what-where Condition | 2017-05-18: CERT: Partial overlap
CWE 2.11 | CWE-125, Out-of-bounds Read | 2017-05-18: CERT: Partial overlap
CWE 2.11 | CWE-805, Buffer Access with Incorrect Length Value | 2017-05-18: CERT: Partial overlap
CWE 3.1 | CWE-129, Improper Validation of Array Index | 2017-10-30: MITRE: Unspecified Relationship; 2018-10-18: CERT: Partial overlap

Bibliography
[Cassidy 2014] Existential Type Crisis: Diagnosis of the OpenSSL Heartbleed Bug
[IETF: RFC 6520]
[ISO/IEC TS 17961:2013]
[VU#720951]
Excerpt from SEI CERT C Coding Standard: Rules for Developing Safe, Reliable, and Secure Systems (2016 Edition) and SEI CERT C Coding Standard [https://cmu-sei.github.io/secure-coding-standards/sei-cert-c-coding-standard/rules/arrays-arr/arr38-c], Copyright (C) 1995-2026 Carnegie Mellon University. See section 9.4. "3rd-Party Licenses" in the documentation for full details.

Possible Messages

Key: size_too_large
Text: Guarantee that library function “{name}” does not form an invalid pointer of {target_size} and {descr1}{size} {via}{pos}. argument.
Severity: None
Disabled: False

Key: unknown_element_size
Text: Guarantee that library function “{name}” does not form an invalid pointer due to an unknown element size.
Severity: None
Disabled: False

Options

element_size_argument

Type: dict[bauhaus.analysis.config.QualifiedName, int]

Default:

{
   'bsearch': 3,
   'bsearch_s': 3,
   'fread': 1,
   'fwrite': 1,
   'qsort': 2,
   'qsort_s': 2
}
Maps function names to the index (starting with 0) of the argument that denotes the element size, which is multiplied by another size argument.

enable_buffer_analysis

enable_buffer_analysis : bool = True

If true, sizes and offsets are computed using the buffer analysis, which may be expensive; unreachable code is then also detected and not reported. If false, offsets are not considered, which might cause false negatives.

exclude_sizeof_syntactically

exclude_sizeof_syntactically : bool = True

If true, a call is not checked further when its size argument is given by the sizeof operator applied to the pointer argument to be checked. This option is similar to length_names, except that sizeof is a keyword rather than a standard function call.

exclude_warnings_for_unknown_arguments

exclude_warnings_for_unknown_arguments : bool = False

If true, suppress warnings for cases where nothing at all is known about the pointer arguments of an operation, for example, because they are return values of external routines.

functions

Type: set[bauhaus.analysis.config.QualifiedName]

Default: {'asctime_s', 'bsearch', 'bsearch_s', 'ctime_s', 'fgets', 'fgetws', 'fread', 'fwrite', 'getenv_s', 'gets_s', 'mblen', 'mbrlen', 'mbrtoc16', 'mbrtoc32', 'mbrtowc', 'mbsrtowcs', 'mbsrtowcs_s', 'mbstowcs', 'mbstowcs_s', 'mbtowc', 'memchr', 'memcmp', 'memcpy', 'memcpy_s', 'memmove', 'memmove_s', 'memset', 'memset_s', 'qsort', 'qsort_s', 'setvbuf', 'snprintf', 'snprintf_s', 'snwprintf_s', 'sprintf_s', 'strcat_s', 'strcpy_s', 'strerror_s', 'strftime', 'strncat', 'strncat_s', 'strncmp', 'strncpy', 'strncpy_s', 'strnlen_s', 'strxfrm', 'swprintf', 'swprintf_s', 'tmpnam_s', 'vsnprintf', 'vsnprintf_s', 'vsnwprintf_s', 'vsprintf_s', 'vswprintf', 'vswprintf_s', 'wcrtomb_s', 'wcscat_s', 'wcscpy_s', 'wcsftime', 'wcsncat', 'wcsncat_s', 'wcsncmp', 'wcsncpy', 'wcsncpy_s', 'wcsnlen_s', 'wcsrtombs', 'wcsrtombs_s', 'wcstombs', 'wcstombs_s', 'wcsxfrm', 'wctomb_s', 'wmemchr', 'wmemcmp', 'wmemcpy', 'wmemcpy_s', 'wmemmove', 'wmemmove_s', 'wmemset'}

Names of functions that are relevant call targets for this check, in addition to those given by function_lookup in header files.

ignore_arguments

Type: dict[bauhaus.analysis.config.QualifiedName, set[int]]

Default:

{
   'bsearch': {0},
   'bsearch_s': {0},
   'getenv_s': {0},
   'mbrtoc16': {0},
   'mbrtoc32': {0},
   'mbrtowc': {0},
   'mbsrtowcs_s': {0, 3},
   'mbstowcs_s': {0},
   'mbtowc': {0},
   'setvbuf': {0},
   'wcrtomb_s': {0},
   'wcsrtombs_s': {0, 3},
   'wcstombs_s': {0},
   'wctomb_s': {0}
}
The analysis typically infers the relevant arguments by itself, but in some cases an argument should be disregarded for the inference of pointer and size arguments.
 

ignore_calls_in_functions

ignore_calls_in_functions : set[bauhaus.analysis.config.QualifiedName] = set()

Qualified names of function definitions in which calls to relevant functions are ignored for this check.
 

length_names

length_names : set[bauhaus.analysis.config.QualifiedName] = {'strlen', 'strnlen', 'wcslen', 'wcsnlen'}

Names of length functions. A pointer argument is not checked further (a syntactic exclusion) if the size argument is a call to one of these functions with that pointer as its first argument.

pointer_argument

Type: dict[bauhaus.analysis.config.QualifiedName, int]

Default:

{
   'mbsrtowcs': 0,
   'strxfrm': 0,
   'wcsrtombs': 0,
   'wcsrtombs_s': 1,
   'wcsxfrm': 0
}
The analysis typically infers the relevant arguments by itself, but in some cases (e.g. functions having more than one pointer parameter) it may be necessary to provide the index of the single pointer argument (starting with index 0).
 

second_size_argument

Type: dict[bauhaus.analysis.config.QualifiedName, int]

Default:

{
   'mbsrtowcs_s': 4,
   'mbstowcs_s': 4,
   'memcpy_s': 3,
   'memmove_s': 3,
   'memset_s': 3,
   'strncat_s': 3,
   'strncpy_s': 3,
   'wcsncat_s': 3,
   'wcsncpy_s': 3,
   'wcsrtombs_s': 4,
   'wcstombs_s': 4,
   'wmemcpy_s': 3,
   'wmemmove_s': 3
}
Maps function names to the index (starting with 0) of a second size argument that also applies to the first pointer argument.

size_argument

Type: dict[bauhaus.analysis.config.QualifiedName, int]

Default:

{
   'memchr': 2,
   'memset': 2,
   'setvbuf': 3,
   'wcrtomb_s': 2,
   'wmemchr': 2,
   'wmemset': 2
}
The analysis typically infers the relevant arguments by itself, but in some cases (e.g. functions having more than one integer parameter) it may be necessary to provide the index of the single size argument (starting with index 0).
 

use_type_based_maximum

use_type_based_maximum : bool = True

If true, assume the maximum value of its type when only the type of a size value is known. This results in a size_too_large violation if the analysis has no more precise information and the code is reachable according to the buffer analysis.

use_type_based_minimum

use_type_based_minimum : bool = False

If true, assume zero if only the type of a size value is known.