Safe Systems Programming in C# and .NET

Duration: 50:00

Summary

Joe Duffy shares some of his key experiences from building an entire operating system in a C# dialect, with a focus on areas like garbage collection, low-level code quality, and dealing with errors and concurrency robustly. The examples focus specifically on using the open source C# and .NET projects as they exist today.

Bio

Joe Duffy is Director of Engineering for the Compiler and Language Group at Microsoft. In this role, he leads the teams building C++, C#, VB, and F# languages, compilers, and static analysis platforms across many architectures and platforms. His group began and is now responsible for elements of taking .NET cross-platform and open source. He has 2 books, 80 patents, and loves all things code.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Key Takeaways

  • Security is worse now than it has ever been. Buffer overruns account for approximately 24% of known zero-day vulnerabilities.
  • Safety is more important than ever, and performance bottlenecks have largely shifted elsewhere.
  • Garbage Collection times are just as expensive as IO and can also be worse as they expand to consume all available space.
  • Do not copy memory needlessly. Byte[] is almost always a sign of danger.
  • C# delivers productivity and safety, while still delivering good performance from JIT to AOT and everything in between.

Show notes

Midori

  • 0:13 Midori is an operating system that did not use Windows or Linux. It was written in C#, took 7 years to build, and included device drivers, web servers, libraries, compilers and garbage collection.
  • 0:50 Midori was a research/incubation project. There was never an intent to replace Windows, regardless of what the media said.
  • 1:08 Around 2013, half of the Midori team moved over to the developer division [at Microsoft], into the compiler and language group, which eventually took in the C# and C++ teams. They brought learnings from the Midori team into the new team in order to drive innovation.

Systems

  • 2:05 A system is anywhere you are apt to think about "bits, bytes, instructions, and cycles".
  • 2:23 Historically writing a system meant you were interacting with hardware and writing C.
  • 3:18 Systems are no longer the domain of C and unsafe code.

Why C#

  • 3:30 Productivity and ease of use are found in higher-level languages like C#, which offer features such as lambdas.
  • 4:00 Security is worse now than it has ever been. C# offers built-in safety with fewer inherent security and reliability risks. Buffer overruns account for approximately 24% of known zero-day vulnerabilities.
  • 4:30 By using languages like C# or Go, you avoid inherent memory risks. Other security exploits like cross-site scripting are still possible, but at least you have eliminated an entire class of exploits.
  • 4:52 C# also includes powerful async and concurrency models.
  • 5:10 There is a lot of momentum behind C#, and many resources and tools are available for developers.
  • 5:34 A big challenge is taking the knowledge and momentum behind C# and extending it to multiple platforms.

Why Not C#

  • 6:08 Reasons not to use C# include Garbage Collection (GC), allocation-rich APIs and patterns, an error model that makes reliability challenging, and concurrency based on unsafe multithreading.
  • 6:54 [Joe Duffy] encourages developers to try Go and Rust in order to compare and contrast their feature sets with C#. A lot of sharing of ideas takes place across communities.

This Talk's focus

  • 8:10 Microsoft is making progress on the existing issues so that they are not problems in systems programs: new C# library features, patterns that you can apply today and enforce with Roslyn Analyzers, advances in code generation technologies, and open source development on GitHub, co-developed with the community.

Performance

  • 9:30 Low-level code quality and performance. JIT compilation is bad for systems programs because the compiler can't spend enough time on optimizations. The kinds of optimization that it can do are very 'dumbed down': the speed of compilation, not the speed of the resulting code, is the most important factor.
  • 9:45 With Ahead of Time (AoT) compilation, depending upon whether it is a debug or release build, you may spend more time optimizing. If you look at C++ vs C# code you could see huge variances in code quality.
  • 10:10 There may also be security concerns with code being generated at run time, whereas if you build the code ahead of time you can ensure it is sealed and cannot be modified.
  • 10:20 AoT compilation is available for C# and .NET. The Windows App Store requires code to be compiled this way before it is allowed in the store. As a result, you don't need to run the JIT compiler at runtime, which improves start-up time for mobile apps.
  • 11:08 Don't think of JIT and AoT as two completely different models. You have dials that you can adjust as required, and you can let the system decide when the compiler will compile the code.

Code Generation

  • 12:00 Two different compilers are used for code generation for C#: RyuJIT, and LLVM, which is an open-source, highly optimized compiler.
  • 12:55 LLILC is an Intermediate Language (IL) translator for LLVM.
  • 13:37 If you install the new CLR, you are using ReadyToRun, a version-resilient image format, which allows for some version resiliency at the expense of some code quality.
  • 14:10 The CoreCLR and CoreRT runtimes are both open source and run on top of Windows, Mac OS X, Linux and others.

Optimizations

  • 14:32 Inlining is super important. Historically it is something the JIT compiler did not do a good job of.
  • 15:00 Historically these optimizations have not been in the domain of languages like C#, and as a result C# has received a bad reputation for code quality and performance. With these optimizations included, code quality has increased and is closer to being on par with C and C++.

Inlining

  • 15:51 Function calls are not cheap. Creating a new stack frame, saving the return address, saving registers that might be overwritten, making the call, and restoring afterwards adds a lot of waste to leaf-level functions (10s of cycles).
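
As a rough illustration of a leaf-level function where call overhead can dominate the actual work, the sketch below marks a tiny method with the standard AggressiveInlining hint. The class and method names are hypothetical and not from the talk, and the attribute is only a request: the compiler may still decline to inline.

```csharp
using System.Runtime.CompilerServices;

public static class InliningSketch
{
    // A leaf function this small does less work than the call that reaches it.
    // The attribute is a hint only; the JIT or AoT compiler makes the final decision.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int Square(int x) => x * x;

    // If Square is inlined, the loop body becomes a multiply and an add
    // with no per-element call overhead.
    public static int SumOfSquares(int[] values)
    {
        int total = 0;
        for (int i = 0; i < values.Length; i++)
        {
            total += Square(values[i]);
        }
        return total;
    }
}
```

Calling InliningSketch.SumOfSquares(new[] { 1, 2, 3 }) returns 14 whether or not inlining happens; only the cost per element changes.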

Range Analysis

  • 17:14 An area where C# used to struggle compared to C++ is bounds checking. We won't execute the loop body if the index is out of bounds of the array, so the loop obviously never goes out of bounds, but a naïve compiler will do an induction check (i < elems.Length) plus a bounds check (elems[i]) on every iteration.
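
The two loop shapes below sketch the difference range analysis cares about. The names are hypothetical, and whether a given compiler actually removes the check depends on the compiler and version, so treat the comments as the general idea rather than a guarantee.

```csharp
public static class RangeCheckSketch
{
    // The induction check compares i against elems.Length itself, so every
    // elems[i] is provably in range and the per-element bounds check is redundant.
    public static int Sum(int[] elems)
    {
        int sum = 0;
        for (int i = 0; i < elems.Length; i++)
        {
            sum += elems[i];
        }
        return sum;
    }

    // Here the limit is a separate value. Nothing proves count <= elems.Length,
    // so the bounds check on elems[i] generally has to stay.
    public static int SumFirst(int[] elems, int count)
    {
        int sum = 0;
        for (int i = 0; i < count; i++)
        {
            sum += elems[i];
        }
        return sum;
    }
}
```

Both methods are safe in either case: SumFirst throws IndexOutOfRangeException if count exceeds the array length, which is exactly the check the first form lets the compiler skip.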

Stack Allocation

  • 18:56 Suppose we want to see if ‘Alexander Hamilton’ is one of our customers, and we use a lambda expression to look up the IndexOf. Two objects will get allocated on the heap for a short-lived function call. Because of this, many people on the Midori project were not using lambda-based APIs, which is unfortunate because people like to write code this way in C#. Instead, the compiler can do automatic escape analysis and use this knowledge to stack allocate the objects. We still have the objects, but the compiler will put them on the stack, where allocation is very cheap.
  • 21:45 It is great that the compiler will perform this analysis, but what if you want to control it? With a new preview feature called Scoped, the compiler will tell you if there is anything that will prevent the optimization from taking place.
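
To make the allocation cost concrete, here is a minimal sketch of a lambda-based lookup next to a hand-written loop. It uses Array.FindIndex from the standard library rather than the IndexOf API shown in the talk, and the class and method names are hypothetical. In shipping C#, a lambda that captures a local typically produces a closure object and a delegate, which are the two objects escape analysis could move to the stack.

```csharp
using System;

public static class ClosureSketch
{
    // The lambda captures 'name', so the compiler emits a closure object to
    // hold it plus a delegate that points at the generated method.
    public static int FindWithLambda(string[] customers, string name)
    {
        return Array.FindIndex(customers, c => c == name);
    }

    // Hand-written equivalent: no closure and no delegate, at the cost of
    // giving up the lambda-based style.
    public static int FindWithLoop(string[] customers, string name)
    {
        for (int i = 0; i < customers.Length; i++)
        {
            if (customers[i] == name)
            {
                return i;
            }
        }
        return -1;
    }
}
```

Both methods return the same index, or -1 when the name is absent. The difference is only in what gets allocated per call.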

Memory

  • 23:30 Memory speeds have not kept pace with processor speeds. The "latency numbers every programmer should know" include main memory, which is expensive to access at about 100ns. IO dwarfs them all. Garbage Collection times are just as expensive as IO and can also be worse as they expand to consume all available space.

Garbage Collection

  • 24:45 .NET has a multi-generational, compacting garbage collector. Concurrent background scanning automatically reduces pause times. For server workloads in .NET 4.5, you can use concurrent and parallel garbage collection in harmony.

Garbage Collection Pitfalls

  • 26:23 Objects that should die in Generation 0 can live longer unnecessarily, and as a result collecting them becomes a more expensive operation. "Mid-life crisis" objects that should have died in Generation 1 live on to Generation 2, which could add 1 millisecond of latency.
  • 29:21 If your application is spending more than 10% of its time garbage collecting, that is a bad thing. Time spent garbage collecting is time your application is not spending on work. Big Data programs may spend a lot of time in Garbage Collection.
  • 29:32 Multi-threading complicates things: code may perform well in isolation, but because actions may execute in parallel, the GC will execute sooner, which may promote objects into later GC generations.
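
One rough way to see whether a piece of work is triggering collections in the older generations is to compare per-generation collection counts before and after it. The sketch below uses the standard GC.CollectionCount API; the helper name CollectionsDuring is hypothetical, and the counts are process-wide, so other threads can affect them.

```csharp
using System;

public static class GcCountSketch
{
    // Runs 'work' and returns how many gen0, gen1 and gen2 collections
    // happened in the process while it ran (index = generation).
    public static int[] CollectionsDuring(Action work)
    {
        int gen0 = GC.CollectionCount(0);
        int gen1 = GC.CollectionCount(1);
        int gen2 = GC.CollectionCount(2);

        work();

        return new[]
        {
            GC.CollectionCount(0) - gen0,
            GC.CollectionCount(1) - gen1,
            GC.CollectionCount(2) - gen2
        };
    }
}
```

A workload whose gen1 and gen2 counts climb alongside gen0 is a candidate for the premature-promotion problems described above.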

C# Values

  • 30:19 C# has two major type kinds: structs and classes.
  • 31:47 Structs can improve memory performance as a result of less GC pressure, better memory locality and less overall space usage.
  • 32:57 Beware of copying large (32-64 bytes) structs; it is essentially the equivalent of doing a memcpy.
  • 33:33 Byrefs can be used to minimize copying: "ref returns" is a new feature in C# 7.
  • 34:04 New features in the next release of C# and .NET: ValueTask, ValueTuple and others. ValueTask returns a thin wrapper that the compiler is aware of and is more optimal.
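
A minimal sketch of using byrefs to avoid struct copies, assuming a compiler with C# 7 ref returns. The Particle type and the helper names are hypothetical examples, not types from the talk.

```csharp
// Four doubles make a 32-byte struct, so each by-value copy is a small memcpy.
public struct Particle
{
    public double X;
    public double Y;
    public double Z;
    public double Mass;
}

public static class RefReturnSketch
{
    // Returns a reference to the array slot itself rather than a copy.
    public static ref Particle At(Particle[] items, int index)
    {
        return ref items[index];
    }

    // Taking the struct by ref avoids copying it into the callee and lets
    // the callee update the caller's storage in place.
    public static void Nudge(ref Particle p, double dx)
    {
        p.X += dx;
    }
}
```

With these helpers, RefReturnSketch.At(particles, 0).Mass = 3.0 writes straight into the array, and RefReturnSketch.Nudge(ref RefReturnSketch.At(particles, 1), 2.5) updates an element without an intermediate copy.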

Strings and Arrays

  • 35:23 Strings and arrays are often subject to premature graduation, especially in Big Data scenarios. String.Split allocates 1 array + O(N) strings, copying data. A LINQ query allocates O(2Q) + enumer* objects. ToArray allocates at least 1 array (dynamically grows).
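
As a small illustration of the String.Split cost, the sketch below counts comma-separated fields two ways. The method names are hypothetical; the point is only that the first version allocates an array and one string per field to answer a question the second answers by scanning in place.

```csharp
public static class FieldCountSketch
{
    // Allocation-heavy: Split creates one array plus one string per field,
    // copying every character, and all of it is discarded immediately.
    public static int CountFieldsWithSplit(string line)
    {
        return line.Split(',').Length;
    }

    // Allocation-free: walk the characters in place and count separators.
    public static int CountFieldsInPlace(string line)
    {
        int fields = 1;
        for (int i = 0; i < line.Length; i++)
        {
            if (line[i] == ',')
            {
                fields++;
            }
        }
        return fields;
    }
}
```

Both methods agree on the count, including for empty fields, since Split keeps empty entries by default.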

Span

  • 37:49 Span was used a lot in Midori. Span is a struct that represents a slice of an array, string, native buffer or another span. The key is you can never make it bigger.
  • 38:47 Access is uniform regardless of how the span was created. All accesses are safe and bounds checked.
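
A minimal sketch of slicing, written against the Span<T> and ReadOnlySpan<T> types as they later shipped in .NET, which may differ in detail from the Midori-era API described in the talk. The class and method names are hypothetical.

```csharp
using System;

public static class SpanSketch
{
    // Works the same way over a whole array, a slice of one, or stack memory.
    public static int Sum(ReadOnlySpan<int> values)
    {
        int total = 0;
        for (int i = 0; i < values.Length; i++)
        {
            total += values[i];
        }
        return total;
    }

    // Sums everything except the first and last element without copying.
    // Assumes data has at least two elements.
    public static int SumMiddle(int[] data)
    {
        // Slice narrows the view; it can never grow past the original array.
        ReadOnlySpan<int> middle = data.AsSpan().Slice(1, data.Length - 2);
        return Sum(middle);
    }
}
```

For the array { 1, 2, 3, 4, 5 }, SumMiddle returns 9 and no intermediate array is created.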

Packs

  • 39:59 Small arrays often end up on the GC heap needlessly. A Pack is a fixed-size, struct-based array that interoperates with Span APIs. It is ideal for temporary allocations.
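
The Pack type itself is not shown here, since its exact API is not given in these notes. A related technique that serves the same goal of keeping small temporary buffers off the GC heap is stackalloc into a Span<T>, sketched below under the assumption of C# 7.2 or later. The class and method names are hypothetical.

```csharp
using System;

public static class StackBufferSketch
{
    // Formats a value as decimal digits using a small stack buffer as
    // scratch space instead of a temporary heap array.
    public static string ToDecimal(uint value)
    {
        Span<char> buffer = stackalloc char[10];   // uint.MaxValue has 10 digits
        int pos = buffer.Length;

        // Fill the buffer from the right, one digit at a time.
        do
        {
            buffer[--pos] = (char)('0' + value % 10);
            value /= 10;
        } while (value != 0);

        // For a Span<char>, ToString builds a string from the characters.
        return buffer.Slice(pos).ToString();
    }
}
```

The returned string is still a heap allocation; only the scratch buffer lives on the stack and disappears when the method returns.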

Zero Copy

  • 41:10 Do not copy memory needlessly. Byte[] is almost always a sign of danger. If it's in native memory, then keep it there. Span/Primitive make it convenient to work with byte* in a safe way.
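
A minimal sketch of reading from a buffer without copying it, using ReadOnlySpan<byte> and the standard BinaryPrimitives helper. The message layout (a 4-byte little-endian length followed by the payload) is a made-up example format, and the class and method names are hypothetical.

```csharp
using System;
using System.Buffers.Binary;

public static class ZeroCopySketch
{
    // Hypothetical wire format: 4-byte little-endian payload length,
    // followed by that many payload bytes.
    public static ReadOnlySpan<byte> ReadPayload(ReadOnlySpan<byte> message)
    {
        int length = BinaryPrimitives.ReadInt32LittleEndian(message);
        return message.Slice(4, length);   // a view into 'message', not a copy
    }

    // Consumes the payload in place; no intermediate byte[] is created.
    public static int PayloadChecksum(byte[] message)
    {
        int sum = 0;
        foreach (byte b in ReadPayload(message))
        {
            sum += b;
        }
        return sum;
    }
}
```

Because the slice is bounds checked, a length field that points past the end of the buffer raises an exception instead of reading stray memory.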

Reliability

  • 42:30 Exceptions are meant for recoverable errors, but many errors are not recoverable. A bug is an error the programmer didn't expect; a recoverable error is an expected condition, resulting from programmatic data validation. Treating bugs and recoverable errors homogeneously creates reliability problems.

Fail-fast

  • 43:30 If you know you are going to fail, do it as fast as possible. Where exceptions merely delay the inevitable, they invite abuse.
  • 44:00 Fail-fast ensures bugs are caught promptly before they can do more damage.
  • 44:15 In Microsoft's experience, the ratio of recoverable errors (exceptions) to bugs (fail-fast) is 1:10.

Contracts and Asserts

  • 44:47 Microsoft provides an API for developers to call in their code: Environment.FailFast logs the current stack trace and then tears down the process.
  • 45:18 Other parts of the industry are also moving to fail-fast, including C++ programmers, and the Microsoft Edge browser uses fail-fast.
  • 45:30 Speech Server, which is now in Cortana, was ported to Midori. When they did so, they discovered that more than 70% of all test cases were failing for Taiwanese production speech requests. Once fail-fast was implemented, all of these errors became visible.
  • 46:22 Fail-fast API can be used in Debug mode only.
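
The split between recoverable errors and bugs can be sketched as follows. Environment.FailFast is the real API mentioned above; the Require helper and the other names are hypothetical, and this is one possible shape for a contract check rather than the one used in Midori.

```csharp
using System;

public static class ErrorModelSketch
{
    // Recoverable error: bad input is an expected condition, so it is
    // reported to the caller, who can ask again or pick a default.
    public static bool TryParsePort(string text, out int port)
    {
        return int.TryParse(text, out port) && port > 0 && port <= 65535;
    }

    // Bug: a violated contract is not something a caller can handle.
    // Tear the process down immediately instead of throwing.
    public static void Require(bool condition, string message)
    {
        if (!condition)
        {
            Environment.FailFast("Contract violated: " + message);
        }
    }

    public static int Average(int[] values)
    {
        Require(values != null && values.Length > 0, "values must be non-empty");

        long total = 0;
        foreach (int v in values)
        {
            total += v;
        }
        return (int)(total / values.Length);
    }
}
```

TryParsePort("abc", out port) simply returns false, while Average(new int[0]) terminates the process, so the second kind of failure cannot be swallowed by a catch block further up the stack.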

Immutability

  • 46:37 Immutability can improve concurrency-safety, reliability (no accidental mutation), and performance (enables compiler optimizations).
  • 47:46 In C#, readonly means "memory location cannot be changed".
  • 47:52 An immutable structure is one with all readonly fields.
  • 48:03 A deeply immutable structure is one with all readonly fields, where each field refers to another immutable structure (including primitives).
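
The three definitions above can be sketched with a few small types. All names here are hypothetical examples. The last class shows why readonly alone does not give deep immutability.

```csharp
// Immutable: every field is readonly.
public struct Point
{
    public readonly int X;
    public readonly int Y;

    public Point(int x, int y) { X = x; Y = y; }

    // "Changing" a value means building a new one.
    public Point WithX(int x) => new Point(x, Y);
}

// Deeply immutable: readonly fields that each hold an immutable structure.
public sealed class Segment
{
    public readonly Point Start;
    public readonly Point End;

    public Segment(Point start, Point end) { Start = start; End = end; }
}

// Only shallowly immutable: the field cannot be reassigned, but the array
// it refers to is mutable, so its elements can still be overwritten.
public sealed class Polyline
{
    public readonly Point[] Points;

    public Polyline(Point[] points) { Points = points; }
}
```

Writing line.Points[0] = new Point(9, 9) compiles and succeeds for a Polyline, which is exactly the accidental mutation that deep immutability rules out.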

Recorded at:

Aug 20, 2016
