CertC++-STR34

Cast characters to unsigned char before converting to larger integer sizes

Required inputs: IR

Signed character data must be converted to unsigned char before being assigned or converted to a larger signed type. This rule applies to both signed char and (plain) char characters on implementations where char is defined to have the same range, representation, and behaviors as signed char.

However, this rule is applicable only in cases where the character data may contain values that can be interpreted as negative numbers. For example, if the char type is represented by a two's complement 8-bit value, any character value greater than +127 is interpreted as a negative value.

This rule is a generalization of STR37-C (Arguments to character-handling functions must be representable as an unsigned char).

Noncompliant Code Example

This noncompliant code example is taken from a vulnerability in bash versions 1.14.6 and earlier that led to the release of CERT Advisory CA-1996-22. This vulnerability resulted from the sign extension of character data referenced by the c_str pointer in the yy_string_get() function in the parse.y module of the bash source code:

static int yy_string_get(void) {
  register char *c_str;
  register int c;

  c_str = bash_input.location.string;
  c = EOF;

  /* If the string doesn't exist or is empty, EOF found */
  if (c_str && *c_str) {
    c = *c_str++;
    bash_input.location.string = c_str;
  }
  return (c);
}

The c_str variable is used to traverse the character string containing the command line to be parsed. As characters are retrieved from this pointer, they are stored in a variable of type int. For implementations in which the char type is defined to have the same range, representation, and behavior as signed char, this value is sign-extended when assigned to the int variable. For character code 255 decimal (-1 in two's complement form), this sign extension results in the value -1 being assigned to the integer, which is indistinguishable from EOF.
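
The collision with EOF can be reproduced in isolation. The following is a minimal sketch, not the bash code: the helper names widen_plain and widen_cast are hypothetical, signed char is used explicitly so the result does not depend on whether plain char is signed on a given implementation, and an 8-bit char is assumed.

```c
#include <stdio.h> /* EOF */

/* Hypothetical helper: widens a signed character the way the
 * noncompliant code does, so a negative value is sign-extended. */
static int widen_plain(signed char ch) {
  return ch;
}

/* Hypothetical helper: converts to unsigned char first, so the
 * result is always in the range 0..UCHAR_MAX and never negative. */
static int widen_cast(signed char ch) {
  return (unsigned char)ch;
}
```

On implementations where EOF is defined as -1 (the usual value), widen_plain((signed char)-1) compares equal to EOF, whereas widen_cast((signed char)-1) yields 255 and stays distinguishable from it.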

Noncompliant Code Example

This problem can be repaired by explicitly declaring the c_str variable as unsigned char:

static int yy_string_get(void) {
  register unsigned char *c_str;
  register int c;

  c_str = bash_input.location.string;
  c = EOF;

  /* If the string doesn't exist or is empty, EOF found */
  if (c_str && *c_str) {
    c = *c_str++;
    bash_input.location.string = c_str;
  }
  return (c);
}

This example, however, violates STR04-C (Use plain char for characters in the basic character set).

Compliant Solution

In this compliant solution, the result of the expression *c_str++ is cast to unsigned char before assignment to the int variable c:

static int yy_string_get(void) {
  register char *c_str;
  register int c;

  c_str = bash_input.location.string;
  c = EOF;

  /* If the string doesn't exist or is empty, EOF found */
  if (c_str && *c_str) {
    /* Cast to unsigned type */
    c = (unsigned char)*c_str++;

    bash_input.location.string = c_str;
  }
  return (c);
}

Noncompliant Code Example

In this noncompliant code example, the cast of *s to unsigned int can result in a value in excess of UCHAR_MAX because of integer promotions. Using that value as an index into table is a violation of ARR30-C (Do not form or use out-of-bounds pointers or array subscripts):

#include <limits.h>
#include <stddef.h>
 
static const char table[UCHAR_MAX + 1] = { 'a' /* ... */ };

ptrdiff_t first_not_in_table(const char *c_str) {
  for (const char *s = c_str; *s; ++s) {
    if (table[(unsigned int)*s] != *s) {
      return s - c_str;
    }
  }
  return -1;
}
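
To see why the (unsigned int) cast does not help, consider how a negative character value is converted. The sketch below is an illustration under stated assumptions: the helper names index_via_uint and index_via_uchar are hypothetical, and signed char is used explicitly so the behavior is the same regardless of the signedness of plain char.

```c
#include <limits.h>

/* Hypothetical helper mirroring table[(unsigned int)*s]:
 * a negative value wraps modulo UINT_MAX + 1. */
static unsigned int index_via_uint(signed char ch) {
  return (unsigned int)ch;
}

/* Hypothetical helper mirroring table[(unsigned char)*s]:
 * the result never exceeds UCHAR_MAX. */
static unsigned int index_via_uchar(signed char ch) {
  return (unsigned char)ch;
}
```

For ch equal to -1, index_via_uint returns UINT_MAX, which is far outside the bounds of a table with UCHAR_MAX + 1 elements, whereas index_via_uchar returns UCHAR_MAX, the last valid index.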

Compliant Solution

This compliant solution casts the value of type char to unsigned char before the implicit promotion to a larger type:

#include <limits.h>
#include <stddef.h>
 
static const char table[UCHAR_MAX + 1] = { 'a' /* ... */ };

ptrdiff_t first_not_in_table(const char *c_str) {
  for (const char *s = c_str; *s; ++s) {
    if (table[(unsigned char)*s] != *s) {
      return s - c_str;
    }
  }
  return -1;
}

Risk Assessment

Conversion of character data resulting in a value in excess of UCHAR_MAX is an often-missed error that can result in a disturbingly broad range of potentially severe vulnerabilities.

Rule     Severity  Likelihood  Remediation Cost  Priority  Level
STR34-C  Medium    Probable    Medium            P8        L2

Bibliography

[xorl 2009] CVE-2009-0887: Linux-PAM Signedness Issue

Excerpt from SEI CERT C++ Coding Standard [https://cmu-sei.github.io/secure-coding-standards/sei-cert-c-coding-standard/rules/characters-and-strings-str/str34-c], Copyright (C) 1995-2026 Carnegie Mellon University. See section 9.4. "3rd-Party Licenses" in the documentation for full details.

Possible Messages

Key                            Text                                                                        Severity  Disabled
cast_from_char_to_larger_type  Cast characters to unsigned char before converting to larger integer sizes  None      False

Options

ignored_typedefs

ignored_typedefs : set[str] = set()

Set of typedefs referring to signed character types which should be ignored by this rule. For instance, this can be used to avoid messages for the fixed-width integer types.
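
As an illustration of when this option might be useful, the sketch below widens int8_t values that hold small signed numbers rather than character data. It assumes that int8_t is a typedef for signed char, which is common but not guaranteed, and the function name sum_offsets is hypothetical. Here sign extension is the intended behavior, so a cast to unsigned char would change the result.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: int8_t stores small signed offsets, not
 * character data, so widening to int must preserve the sign. */
static int sum_offsets(const int8_t *offsets, size_t count) {
  int total = 0;
  for (size_t i = 0; i < count; ++i) {
    total += offsets[i]; /* intended sign-preserving widening */
  }
  return total;
}
```

Whether a project lists int8_t in ignored_typedefs is a policy decision; the sketch only shows the kind of code for which a message may be unwanted.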
 

only_arguments_of

only_arguments_of : set[str] = set()

Set of function or macro names. When provided, only arguments to these functions or macros are considered.
 

show_operand_in_entity

show_operand_in_entity : bool = False

Whether the reported entity has the form "from->to" or the form "(from->to)operand", which includes the operand.
 

type_system

type_system : bauhaus.ir.common.types.type_systems.TypeSystem = <bauhaus.ir.common.types.type_systems.CompilerTypeSystem object>

Which type system to use: compiler types, underlying types, or essential types.