BT

Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Write for InfoQ

Topics

Choose your language

InfoQ Homepage News Checked C Extends LLVM to Bring Spatial Memory Safety to C

Checked C Extends LLVM to Bring Spatial Memory Safety to C

This item in japanese

Bookmarks

Checked C is an open, collaborative project led by Microsoft Research aimed to extend the C language so programmers can write more reliable programs free of errors such as buffer overruns, out-of-bounds memory accesses, and incorrect type casts. Checked C code can coexist with code written in standard C to ease porting.

In an article soon to appear in IEEE Cybersecurity Development Conference 2018, the researchers behind the language present Checked C main features and strengths.

Checked C borrows many ideas from prior safe-C efforts but ultimately differs in that its design focuses on allowing incremental conversion while balancing control, interoperability, and high performance.

The language is focused on backward-compatibility and employs the concept of checked pointers to ensure static and dynamic access verification. In particular, Checked C aims to ensure spatial safety of memory access, meaning that pointers are always dereferenced within the memory they were allocated. The three main design keys of Checked C are:

  • C pointer representation is preserved, so Checked C memory layout is the same as C’s. This provides easy interoperability with existing C codebases and libraries.
  • Explicitly specifying the boundaries of a chunk of memory allocated to a pointer minimizes compile- and run-time overhead.
  • Checked regions and bounds-safe interfaces enables mixing Checked and legacy C code. Fully ported code resides in a sp called checked region, where each pointer access is granted free of spatial memory violations. Checked code can access unsafe, legacy C code through bounds-safe annotations, which are annotations added to unchecked code parameters, return values, function and record types, and global variables.

Checked C adds two new pointer types to the C language, _Ptr<T> and _Array_ptr<T>. The former is exclusively used for dereference, while the latter supports pointer arithmetic. For both pointers, the compiler dynamically guarantees their validity before dereferencing them. This is how you define a function taking two _Array_ptr<T> arguments:

void append(
  _Array_ptr<char> dst : count(dst_count),
  _Array_ptr<char> src : count(src_count),
  size_t dst_count, size_t src_count)
{
  _Dynamic_check(src_count <= dst_count);
  for (size_t i = 0; i < src_count; i++) {
    if (src[i] == ’\0’) {
      break;
     }
    dst[i] = src[i];
  }
}

When creating arrays, programmers can specify their checked bounds using the _Checked keyword:

int buf _Checked[10]

Interestingly, Checked C has partial formalization that allows to prove that any violation of spatial safety comes from outside a checked region. Further work in this direction will aim to extend this proof to dynamically sized arrays as well.

According to Checked C creators, converting existing C code to use checked pointers and arrays requires a relative amount of code changes, affecting about 17.5% of lines. To improve this, the team is at work on tools to make the conversion automatic. This seems to be a key goal, when it comes to fostering Checked C adoption and more importantly the conversion of existing codebases. Indeed, Checked C is not the first project aiming to add some form of memory safety to the C language. GCC contributor William Bader mentioned GCC "fat pointers" and other approaches that unfortunately never caught on since while "technically feasible, retrofitting old code rarely happens".

As a final note, preliminary benchmarks show checks add an average runtime overhead of 8.6%. Checked C is implemented as an extension to LLVM and can be downloaded here.

Rate this Article

Adoption
Style

BT