Is C Still A Suitable Language Today?
Damien Katz of Couchbase believes that C is still a great language for back-end programming. Other developers argue that C has too many flaws and prefer C++ or Java, while some like none of them.
In a recent blog post entitled The Unreasonable Effectiveness of C, Damien Katz, the creator of CouchDB, affirms that C is a great language for the back-end and defends it against more modern languages such as C++, Java, or even Erlang or Ruby. Katz does not claim that C is simply better than every other language out there, but that when “raw performance and reliability are critical, C is very, very hard to beat,” as he put it in a subsequent post meant to clarify his position.
While initially doing much of CouchDB in Erlang, Katz became unhappy with it after spending “2+ man/months dealing with a crash in the Erlang VM”:
We wasted a ton of time tracking down something that was in the core Erlang implementation, never sure what was happening or why, thinking perhaps the flaw was something in our own plug-in C code, hoping it was something we could find and fix. It wasn't, it was a race condition bug in core Erlang. We only found the problem via code inspection of Erlang. This is a fundamental problem in any language that abstracts away too much of the computer.
For that and for performance reasons, Katz and his team have been progressively rewriting “more of the Couchbase code in C, and choosing it as the first option for more new features.” Interestingly, C has proven to be “much more predictable when we'll hit issues and how to debug and fix them. In the long run, it's more productive.”
Katz outlines several reasons why he considers C better than higher-level languages such as C++ or Java for the back-end:
- Expressiveness – “The syntax and semantics of C is amazingly powerful and expressive. It makes it easy to reason about high level algorithms and low level hardware at the same time. Its semantics are so simple and the syntax so powerful it lowers the cognitive load substantially, letting the programmer focus on what's important.”
- Simplicity – “C is a weak, statically typed language and its type system is quite simple. … What sounds like a weakness ends up being a virtue: the "surface area" of C APIs tend to be simple and small. Instead of massive frameworks, there is a strong tendency and culture to create small libraries that are lightweight abstractions over simple types.”
- Speed and Memory Footprint – “C is the fastest language out there, both in micro and in full stack benchmarks. And it isn't just the fastest in runtime, it's also consistently the most efficient for memory consumption and startup time. And when you need to make a tradeoff between space and time, C doesn't hide the details from you, it's easy to reason about both.”
- Faster Development Cycle – “Critically important to developer efficiency and productivity is the "build, run, debug" cycle. The faster the cycle is, the more interactive development is, and the more you stay in the state of flow and on task. C has the fastest development interactivity of any mainstream statically typed language.”
- Debugging – “With pure C code, you can see call stacks, variables, arguments, thread locals, globals, basically everything in memory. This is ridiculously helpful especially when you have something that went wrong days into a long running server process and isn't otherwise reproducible. If you lose this context in a higher level language, prepare for much pain.”
- Cross-platform – “[C] has a standardized application binary interface (ABI) that is supported by every OS, language and platform in existence. And it requires no runtime or other inherent overhead. This means the code you write in C isn't just valuable to callers from C code, but to every conceivable library, language and environment in existence.”
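The “small surface area” and ABI points above can be made concrete with a minimal sketch. The `counter_*` names are hypothetical and do not come from Katz's post or from Couchbase; the only claim is that a C library can expose an opaque handle plus a few functions over fixed-width integers, which is the shape that foreign-function interfaces in other languages tend to bind to most easily.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical example library: an opaque handle and three functions.
   Callers only ever handle a pointer and fixed-width integers. */
typedef struct counter counter_t;

struct counter {
    int64_t value;
};

/* Returns NULL if the allocation fails. */
counter_t *counter_create(int64_t start) {
    counter_t *c = malloc(sizeof *c);
    if (c != NULL) {
        c->value = start;
    }
    return c;
}

/* Adds delta and returns the new value. Passing NULL is a caller bug. */
int64_t counter_add(counter_t *c, int64_t delta) {
    assert(c != NULL);
    c->value += delta;
    return c->value;
}

/* Safe to call with NULL, like free(). */
void counter_destroy(counter_t *c) {
    free(c);
}
```

In a real library the `struct counter` definition would stay in the `.c` file and only the `typedef` and prototypes would appear in the public header, so the layout never becomes part of the contract.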
Katz also acknowledges that C has “many flaws”:
… no bounds checking, it's easy to corrupt anything in memory, there are dangling pointers and memory/resource leaks, bolted-on support for concurrency, no modules, no namespaces. Error handling can be painfully cumbersome and verbose. It's easy to make a whole class of errors where the call stack is smashed and hostile inputs take over your process. Closures? HA!
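Two of the flaws Katz lists, missing bounds checks and verbose error handling, show up even in a trivial string copy. The following is a minimal sketch, not code from Katz's post; `checked_copy` and the `COPY_*` codes are hypothetical names chosen for illustration. The language will not stop a copy from running past the end of `dst`, so the check has to be written by hand and every failure mode needs its own return code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical error codes for this sketch. */
enum { COPY_OK = 0, COPY_BAD_ARG = -1, COPY_TOO_LONG = -2 };

/* Copies src into dst only if it fits, including the terminating NUL.
   On any error dst is left untouched. */
int checked_copy(char *dst, size_t dst_size, const char *src) {
    if (dst == NULL || src == NULL || dst_size == 0) {
        return COPY_BAD_ARG;
    }
    size_t len = strlen(src);
    if (len >= dst_size) {
        /* Without this test, memcpy would write past the buffer. */
        return COPY_TOO_LONG;
    }
    memcpy(dst, src, len + 1);
    return COPY_OK;
}
```

Every caller then has to test the return value as well, which is where the extra length that critics complain about tends to come from.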
Katz’s love affair with C seems to originate from the need to push Couchbase’s performance limits and from the debugging problems that arose when combining C plug-ins with the Erlang VM. He does not consider C++, Go or D better replacements for C, but he thinks Rust could be “the language of my dreams” if it achieves “C-like performance but safe with Erlang concurrency and robustness built in.”
Katz’s post has sparked a wide debate on Reddit and Hacker News, with many developers disputing the virtues of C and suggesting other languages instead. One commenter, robinei, blames the hassles of string manipulation and error checking:
I always want to get back to C (from C++ among others), and when I do it's usually refreshingly simple in some ways. It feels good!
But then I need to do string manipulation, or some such awkward activity..
Where lots of allocations happen, it is a pain to have to match every single one with an explicit free. I try to fix this by creating fancy trees of arena allocators, and Go-like slice-strings, but in the end C's lack of useful syntactic tools (above namespace-prefixed functions) make everything seem much more awkward than it could be. (and allocating everything into arenas is also quite painful)
I see source files become twice as long as they need to because of explicit error checking (doesn't normally happen, but in some libraries like sqlite, anything can fail).
There are just so many things that drain my energy, and make me dissatisfied.
After a little bit of all that, I crawl back to where I came from. Usually C++.
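For readers unfamiliar with the arena allocators mentioned in the comment above, here is a minimal sketch of the idea: many allocations are carved out of one block and released with a single call, instead of being matched one by one with `free`. The `arena_*` names and the 16-byte `ARENA_ALIGN` are assumptions made for this example, not anything taken from the commenter's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed alignment for this sketch. Offsets are rounded up to it; the
   returned pointers are that aligned only if malloc's block is. */
#define ARENA_ALIGN 16

/* Hypothetical bump-pointer arena: many allocations, one free. */
typedef struct {
    unsigned char *base;
    size_t capacity;
    size_t used;
} arena_t;

/* Returns 0 on success, -1 if the backing block cannot be allocated. */
int arena_init(arena_t *a, size_t capacity) {
    a->base = malloc(capacity);
    a->capacity = capacity;
    a->used = 0;
    return a->base != NULL ? 0 : -1;
}

/* Hands out size bytes at the next aligned offset, or NULL when full.
   Calling this on an uninitialized or freed arena is a caller bug. */
void *arena_alloc(arena_t *a, size_t size) {
    assert(a != NULL && a->base != NULL);
    size_t start = (a->used + ARENA_ALIGN - 1) / ARENA_ALIGN * ARENA_ALIGN;
    if (start > a->capacity || size > a->capacity - start) {
        return NULL;
    }
    a->used = start + size;
    return a->base + start;
}

/* Releases every allocation made from the arena at once. */
void arena_free(arena_t *a) {
    free(a->base);
    a->base = NULL;
    a->capacity = 0;
    a->used = 0;
}
```

The trade-off the commenter hints at remains: individual objects cannot be freed early, and every function that allocates has to be handed the right arena.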
- C is straightforward to compile into fast machine code...on a PDP-11. …
- C's standard library is a joke. Its shortcomings, particularly around string handling, have been responsible for an appalling fraction of the security holes of the past forty years.
- C's tooling is hardly something to brag about, especially compared to its contemporaries like Smalltalk and Lisp. Most of the debuggers people use with C are command line monstrosities. Compare them to the standard debuggers of, say, Squeak or Allegro Common Lisp.
- Claiming a fast build/debug/run cycle for C is sad. It seems fast because of the failure in this area of C++. Go look at Turbo Pascal if you want to know how to make the build/debug/run cycle fast.
- Claiming that C is callable from anywhere via its standard ABI equates all the world with Unix. Sadly, that's almost true today, though, but maybe it's because of the ubiquity of C rather than the other way around.
C/C++/Java. A programmer's version of Rock/Paper/Scissors.
I started out in C, many years ago. I found myself using macros and libraries to provide useful combinations of state and functions. I was reinventing objects and found C++.
I was a very happy user of C++ for many years, starting very early on (cfront days). But I was burned by the complexity of the language, and the extremely subtle interaction of features. And I was tired of memory management. I was longing for Java, and then it appeared.
And I was happy. As I was learning the language, I was sure I missed something. Every object is in the heap? Really? There is really no way to have one object physically embedded within another? But everything else was so nice, I didn't care.
And now I'm writing a couple of systems that would like to use many gigabytes of memory, containing millions of objects, some small, some large. The per-object overhead is killing me. GC tuning is a nightmare. I'm implementing suballocation schemes. I'm writing micro-benchmarks to compare working with ordinary objects with objects serialized to byte arrays. And since C++ has become a hideous mess, far more complicated than the early version that burned me, I long for C again.
So I don't like any language right now.
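The commenter's point about objects being “physically embedded” in one another is easy to show in C. This is a minimal sketch with hypothetical types (`point_t`, `segment_t`), assuming a mainstream compiler that inserts no padding between two structs of 32-bit integers: the inner structs live inside the outer one, and an array of them is a single contiguous block with no per-object header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int32_t x;
    int32_t y;
} point_t;

/* Both points are stored inside the segment itself, not behind pointers. */
typedef struct {
    point_t from;
    point_t to;
} segment_t;

/* Sums the horizontal extent over a contiguous array of segments. */
int64_t total_dx(const segment_t *segs, size_t n) {
    assert(segs != NULL || n == 0);
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += (int64_t)segs[i].to.x - segs[i].from.x;
    }
    return sum;
}
```

On such a compiler `sizeof(segment_t)` is simply twice `sizeof(point_t)`, so a million segments cost a million times that size and nothing more.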
For some, C looks too flawed and unproductive to be useful today, but others still manage to make good use of it in spite of its peculiarities. The developer community would probably be better off avoiding a war over the best language out there and instead trying to understand the tradeoffs of each language, choosing the one best suited to the project at hand and the skills available. After all, no language is perfect.
Depends on your goals and environment...
Overall, if you're doing a large, complex, and sustained project with a significant developer staff, I think you're much better off with Java (or Scala, etc.). Save the C for the odd (and increasingly less frequent) occasions where you really need native code, and use a message-based system to integrate the two (JNI sucks). Forget about C++.
Re: Depends on your goals and environment...
I'm with Mark and - to some extent - Damien on this one.
One thing that Damien noted is that it's rare to see really large, comprehensive framework-like APIs in C. As long as you build small, singly-focused APIs with light domains (a few typedefs / structs and functions as the 'contract') then it's very easy to 'export' the library and re-use it. This, I think, is a sweet spot for C. I would never write something in C for performance, but instead because it's simply easier to do certain things in C (kernel programming, embedded programming, anything working with the hardware, anything relying on APIs that aren't both established and widespread enough as to have warranted a cross-platform 'abstraction' in a higher level language like Java, Ruby, Python etc).
I would also try to not write a full blown system in C, simply because it doesn't 'scale' for large projects. The large projects that do use C end up re-inventing a lot of things (like 'objects' and namespacing). IMHO, there are very few domains that truly need to be in C the whole way through. Some very notable exceptions are systems level components like operating systems (Linux) or UIs like GNOME, of course. But for applications, it's easier to build out in a higher level language and 'integrate' with lower level APIs where the higher level language and platform has gaps. Java has many such 'gaps,' though some've been slowly addressed in the last decade as APIs have become commonplace across many different operating systems: event driven IO, file system notifications, file permissions and metadata, etc.
Mark makes a great point about how to integrate C libraries and modules. He suggests JNI sucks, and to use messaging. I am a big fan of messaging. Fundamentally, successful use of messaging and successful use of JNI both require the same thing: you need to simplify the exported API drastically.
When using JNI, I try to never 'leak' any complex C types into my Java API and vice versa, always communicating through numeric types and char* -> jstrings. Even if the native code I'm exposing is C++, I'll still use the C-flavor of JNI (not C++) because it forces this canonicalization that's favorable to interop. If you keep the surface area of the C API very light, and try to avoid threading, it's easy to make it work as a native extension via Java JNI or CPython or MRI Ruby, etc.
Once you've gone through this process, then exposing the C API via messaging is easier because the message payloads between two systems can't, by definition, be much more complicated than the surface area of the C library. Of course, if you're using messaging, that means either writing messaging code in C, or exposing the C library to some higher level language and doing the messaging there. The nice part about messaging is that it insulates your higher-level language code from your C code which - let's be honest - might be flakier than your Java code. I still won't link code written against the ImageMagick C APIs directly to my application! There is some black magic in that library...! If the C code dies, the messaging system absorbs the requests until another node running the C code can pick up the slack. On the other hand, if you really are using C for performance, then messaging introduces at least a network hop, not to mention another component in the system, and you might lose any gains you made by writing it in C. In this case, it is possible to write stable, well-behaved JNI or native extensions, but this again requires keeping the surface area small and understanding the pre-conditions and post-conditions. No threads. Don't pass around pointers to complex objects between C and Java. Make sure you understand who's supposed to clean up memory, and when.
In all cases, definitely forget about C++ :)
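As a minimal sketch of the kind of narrow boundary described in the comment above, the function below takes only a C string and two integers and writes its result into a buffer the caller owns, so there is never a question of which side frees what. The name `format_label` and its behaviour are hypothetical; no JNI or other binding code is shown, only the C side that such a binding would wrap.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical boundary function. Inputs: a NUL-terminated string and two
   integers. Output: text written into a caller-owned buffer. Nothing is
   allocated here, so nothing has to be freed across the boundary.
   Returns 0 on success, -1 on bad arguments or if the result does not fit. */
int32_t format_label(const char *name, int32_t width, int32_t height,
                     char *out, size_t out_size) {
    if (name == NULL || out == NULL || out_size == 0) {
        return -1;
    }
    int n = snprintf(out, out_size, "%s_%dx%d", name, (int)width, (int)height);
    if (n < 0 || (size_t)n >= out_size) {
        return -1; /* encoding error or truncated output */
    }
    return 0;
}
```

Because the contract is just integers, a string and a status code, the same function could sit behind a native extension or behind a message handler without changing its signature.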
Re: Depends on your goals and environment...
mbeddr was designed to better support embedded software development (although it is not limited to that) for small as well as large systems, based on an extensible version of the C language and an IDE. Existing extensions include interfaces with pre- and postconditions, components, state machines and physical units, as well as support for requirements tracing and product line variability. Based on these abstractions, mbeddr also supports formal verification based on model checking and SMT solving.
With that approach C can be made "scalable" for larger projects and teams. In addition it allows the introduction of modern programming principles. By extending mbeddr you can even add primitives for e.g. messaging.
Of course you cannot blame C for race conditions...
So instead of blaming the Erlang runtime for race conditions, you'll end up blaming the C threading library you selected. This is hardly an argument against using Erlang.
On the contrary, the threading model provided by the various C libraries is so complex - compared to the Erlang concurrency model - that you'd probably spend months debugging race conditions in your own code.
Srini Penchikala Aug 21, 2014