Checked C Extends LLVM to Bring Spatial Memory Safety to C

Checked C is an open, collaborative project led by Microsoft Research aimed to extend the C language so programmers can write more reliable programs free of errors such as buffer overruns, out-of-bounds memory accesses, and incorrect type casts. Checked C code can coexist with code written in standard C to ease porting.

In an article soon to appear in IEEE Cybersecurity Development Conference 2018, the researchers behind the language present Checked C main features and strengths.

Checked C borrows many ideas from prior safe-C efforts but ultimately differs in that its design focuses on allowing incremental conversion while balancing control, interoperability, and high performance.

The language is focused on backward-compatibility and employs the concept of checked pointers to ensure static and dynamic access verification. In particular, Checked C aims to ensure spatial safety of memory access, meaning that pointers are always dereferenced within the memory they were allocated. The three main design keys of Checked C are:

C pointer representation is preserved, so Checked C memory layout is the same as C’s. This provides easy interoperability with existing C codebases and libraries.
Explicitly specifying the boundaries of a chunk of memory allocated to a pointer minimizes compile- and run-time overhead.
Checked regions and bounds-safe interfaces enables mixing Checked and legacy C code. Fully ported code resides in a sp called checked region, where each pointer access is granted free of spatial memory violations. Checked code can access unsafe, legacy C code through bounds-safe annotations, which are annotations added to unchecked code parameters, return values, function and record types, and global variables.

Checked C adds two new pointer types to the C language, _Ptr<T> and _Array_ptr<T>. The former is exclusively used for dereference, while the latter supports pointer arithmetic. For both pointers, the compiler dynamically guarantees their validity before dereferencing them. This is how you define a function taking two _Array_ptr<T> arguments:

void append(
  _Array_ptr<char> dst : count(dst_count),
  _Array_ptr<char> src : count(src_count),
  size_t dst_count, size_t src_count)
{
  _Dynamic_check(src_count <= dst_count);
  for (size_t i = 0; i < src_count; i++) {
    if (src[i] == ’\0’) {
      break;
     }
    dst[i] = src[i];
  }
}

When creating arrays, programmers can specify their checked bounds using the _Checked keyword:

int buf _Checked[10]

Interestingly, Checked C has partial formalization that allows to prove that any violation of spatial safety comes from outside a checked region. Further work in this direction will aim to extend this proof to dynamically sized arrays as well.

According to Checked C creators, converting existing C code to use checked pointers and arrays requires a relative amount of code changes, affecting about 17.5% of lines. To improve this, the team is at work on tools to make the conversion automatic. This seems to be a key goal, when it comes to fostering Checked C adoption and more importantly the conversion of existing codebases. Indeed, Checked C is not the first project aiming to add some form of memory safety to the C language. GCC contributor William Bader mentioned GCC "fat pointers" and other approaches that unfortunately never caught on since while "technically feasible, retrofitting old code rarely happens".

As a final note, preliminary benchmarks show checks add an average runtime overhead of 8.6%. Checked C is implemented as an extension to LLVM and can be downloaded here.

InfoQ Software Architects' Newsletter

Write for InfoQ

Rate this Article

This content is in the Security Vulnerabilities topic

Related Topics:

Related Editorial

Related Sponsors

Popular across InfoQ

The InfoQ Newsletter