Checked C is an open, collaborative project led by Microsoft Research aimed to extend the C language so programmers can write more reliable programs free of errors such as buffer overruns, out-of-bounds memory accesses, and incorrect type casts. Checked C code can coexist with code written in standard C to ease porting.
In an article soon to appear in IEEE Cybersecurity Development Conference 2018, the researchers behind the language present Checked C main features and strengths.
Checked C borrows many ideas from prior safe-C efforts but ultimately differs in that its design focuses on allowing incremental conversion while balancing control, interoperability, and high performance.
The language is focused on backward-compatibility and employs the concept of checked pointers to ensure static and dynamic access verification. In particular, Checked C aims to ensure spatial safety of memory access, meaning that pointers are always dereferenced within the memory they were allocated. The three main design keys of Checked C are:
- C pointer representation is preserved, so Checked C memory layout is the same as C’s. This provides easy interoperability with existing C codebases and libraries.
- Explicitly specifying the boundaries of a chunk of memory allocated to a pointer minimizes compile- and run-time overhead.
- Checked regions and bounds-safe interfaces enables mixing Checked and legacy C code. Fully ported code resides in a sp called checked region, where each pointer access is granted free of spatial memory violations. Checked code can access unsafe, legacy C code through bounds-safe annotations, which are annotations added to unchecked code parameters, return values, function and record types, and global variables.
Checked C adds two new pointer types to the C language, _Ptr<T>
and _Array_ptr<T>
. The former is exclusively used for dereference, while the latter supports pointer arithmetic. For both pointers, the compiler dynamically guarantees their validity before dereferencing them. This is how you define a function taking two _Array_ptr<T>
arguments:
void append(
_Array_ptr<char> dst : count(dst_count),
_Array_ptr<char> src : count(src_count),
size_t dst_count, size_t src_count)
{
_Dynamic_check(src_count <= dst_count);
for (size_t i = 0; i < src_count; i++) {
if (src[i] == ’\0’) {
break;
}
dst[i] = src[i];
}
}
When creating arrays, programmers can specify their checked bounds using the _Checked
keyword:
int buf _Checked[10]
Interestingly, Checked C has partial formalization that allows to prove that any violation of spatial safety comes from outside a checked region. Further work in this direction will aim to extend this proof to dynamically sized arrays as well.
According to Checked C creators, converting existing C code to use checked pointers and arrays requires a relative amount of code changes, affecting about 17.5% of lines. To improve this, the team is at work on tools to make the conversion automatic. This seems to be a key goal, when it comes to fostering Checked C adoption and more importantly the conversion of existing codebases. Indeed, Checked C is not the first project aiming to add some form of memory safety to the C language. GCC contributor William Bader mentioned GCC "fat pointers" and other approaches that unfortunately never caught on since while "technically feasible, retrofitting old code rarely happens".
As a final note, preliminary benchmarks show checks add an average runtime overhead of 8.6%. Checked C is implemented as an extension to LLVM and can be downloaded here.