BT

Lessons Learned from Apple's GoToFail Bug

by Sergio De Simone on Feb 28, 2014 |

The recent security weakness found in both iOS and OS X hints at flaws in coding style guidelines, unit testing, system testing, code review policies, error management strategies, and tools deployment.

Larry Seltzer on ZDNet described the bug as "a shocking and embarrassing one" and observed that the fact that Apple also provided a patch for iOS 6 provides a hint at its seriousness. "I'm sure Apple doesn't want to do anything to make it easier for iOS users to stay on iOS 6, but they patched it anyhow. That's how serious it is," Larry says.

By the time of writing, Apple has already released software updates for both iOS and OS X to fix the issue: a vulnerability in encrypted communications that allowed an attacker to intercept, read or modify encrypted data. Still, some lessons can be learned from this whole episode.

Google's Adam Langley has explained that the bug is located in the SecureTransport framework, an implementation of SSL/TLS protocols released as Open Source by Apple, and is caused by some code becoming unreachable. If you take a look at the code below:

static OSStatus
SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams,
                                 uint8_t *signature, UInt16 signatureLen)
{
	OSStatus        err;
	...

	if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
		goto fail;
	if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
		goto fail;
		goto fail;
	if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
		goto fail;
	...

fail:
	SSLFreeBuffer(&signedHashes);
	SSLFreeBuffer(&hashCtx);
	return err;
}

You will notice the two goto fail lines in a row. The second goto fail, due to missing braces around the if block, will always produce a jump to the fail label, thus skipping the following checks. This combines with the fact that, at the moment of the jump, the err variable does not contain any error and will make the method return with no error. Adam Langley goes on to clarify:

This signature verification is checking the signature in a ServerKeyExchange message. This is used in DHE and ECDHE ciphersuites to communicate the ephemeral key for the connection. The server is saying: "here's the ephemeral key and here's a signature, from my certificate, so you know that it's from me". Now, if the link between the ephemeral key and the certificate chain is broken, then everything falls apart. It is possible to send a correct certificate chain to the client, but sign the handshake with the wrong private key, or not sign it at all! There's no proof that the server possesses the private key matching the public key in its certificate.

According to Larry Seltzer we do not know how the bug was found out, due to Apple not giving out much detail, but this episode makes him wonder about code review practices at Apple. He also notes that while the error can be easily recognised when you look right at it, it could be hard to spot when you are looking at the [whole file], which is 1,970 line long.

Many commenters to Twitter's #gotofail topic identified a blatant culprit in the use of goto, famously described as "harmful" in a Dijkstra's paper. While this is undeniable, Arie van Deursen, Software Engineering Professor at Delft University of Technology, Netherlands, explains the use of goto with the attempt at implementing an exception-like mechanism in C through the use of the return-error-code idiom, thus making that idiom the real culprit.

Indeed, van Deursen writes, the return-error-code idiom is ubiquitous in the file containing the bug, and this has its own problems. One of the key findings of a 2005 paper van Deursen authored with Magiel Bruntink and Tom Tourwe is "a defect density of 2.1 deviations from the return-error-code idiom per 1000 lines of code," as resulting from inspection of a large code base. Such high defect density was related to unchecked calls, incorrectly propagated return codes, and incorrectly handled error conditions, showing "the idiom is particularly error prone."

Kevin Marks, a former Apple employee, notes in a comment to van Deursen's post that there are ways to use the error code return idiom more safely using preprocessor macros. Examples of such approaches are BailOSErr and aiming at implementing exceptions in C.

Speaking of how error codes are managed, Chris Leishman commenting van Deursen's post remarks that the use of an error code which is initialised to success is a key factor for the double goto to actually cause the weakness. The system would have shown a safer behaviour if the error was initialised as OSStatus err = OSUnknownError.

On another front, Landon Fuller, software engineer at Plausible Labs, provided a testability analysis of the code affected by the bug and demonstrated that SSLVerifySignedServerKeyExchange is unit-testable in isolation. This, according to C. Keith Ray, strikes a point for TDD: "you can’t write an if statement until you have a test that requires it. You’ll end up with a test for the if statement being true and a test for the if statement being false."

Arie van Deursen also points to more controversial aspects of the story.

He first notices that the file containing the bug "is not routinely formatted automatically: There are plenty of inconsistent spaces, tabs, and code in comments," while "correct indentation immediately shows something fishy is going on" and would have allowed to spot the bug more easily. Along these lines, he goes as far as suggesting that "code formatting is a security feature" and that "white space is a security concern".

Langley writes in his blog that he thinks that code reviews could be effective to prevent such kind of issues. Arie van Deursen points out, though, that in a previous study of how code review is applied at Microsoft, it turned out that

review does not result in identifying defects as often as project members would like and even more rarely detects deep, subtle, or “macro” level issues. Relying on code review in this way for quality assurance may be fraught.

Finally, tools did not help either in this case, since Clang -Wall option will not spot the double goto line and the ensuing unreachable code, as Langley points out. Clang offers a -Weverything flag which would have caught the issue, according to Simon Nicolussi, while GCC drops it silently. This is also confirmes by Peter Nelson, who also points at the existence of a specific -Wunreachable-code option. Van Deursen notesa that the main issue with unreachable code is that its detection is a fundamentally undecidable problem, thus leading to a tradeoff between completeness and false positives, thus possibly explaining why it is not included by default.

Hello stranger!

You need to Register an InfoQ account or or login to post comments. But there's so much more behind being registered.

Get the most out of the InfoQ experience.

Tell us what you think

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

Code formatting and lack of rigor in brace usage in C-based languages by D. Keldsen

We don't know how the code got there (bad merge?)...but fully braced statements
 if () { stmt } 

would have caused compilation to fail if that lone line crept in.

Consistent tools for white space are good; defensive code constructs are even better.

Patch 7.0.6 is even worse by Plamen Petkov

Well, if this code smells, I can assure you the 7.0.6 update for iOS actually stinks. Just check out the forums, how many people bricked their devices due to last 7.0.6, which supposed to fix this security issue.

Re: Code formatting and lack of rigor in brace usage in C-based languages by Jim Balter

" fully braced statements
if () { stmt }
would have caused compilation to fail if that lone line crept in."

No, it most certainly wouldn't have. This has nothing to do with braces, it has to do with sloppy or non-existent code review and failure to turn on proper warning levels or use lint-like tools, either of which would have issued a warning for the unreachable code ... or it has to do with ignoring warnings. Savvy shops set their options to make warnings act like errors. Note that this bug could not have occurred in Java, since it considers such unreachable code to be an error. Note also that this error could not have occurred in a functional language. Nor could it have occurred with RAII.

Re: Code formatting and lack of rigor in brace usage in C-based languages by D. Keldsen

Those are all good suggestions, however fully braced code would have made the lone line stick out. I don't think anyone would be really willing to rewrite low level OS code in Java.

theory vs. practice by Bojan Antonovic

While a problem might be undecidable in every case (perhaps needs infinite amount of resources as the halting problem), I take any help in simpler cases.

Most code can be written simpler, even if it needs big refactoring. But them it becomes simpler for verification.

But it's sufficient to say that the Java compiler would have detected unreachable code in similar cases (even if Java doesn't have goto).

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

5 Discuss

Educational Content

General Feedback
Bugs
Advertising
Editorial
InfoQ.com and all content copyright © 2006-2014 C4Media Inc. InfoQ.com hosted at Contegix, the best ISP we've ever worked with.
Privacy policy
BT