Bio Christian Legnitto leads Mobile Release Engineering at Facebook. Christian has worked in release engineering for over seven years and prior to joining Facebook Christian was a Release Engineer at both Apple and Mozilla. Facebook: https://www.facebook.com/legnitto, Twitter handle: @LegNeato
Software is Changing the World. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.
I work at Facebook, I manage the Mobile App Releases, so the Android and iOS apps. I manage how we develop those as well as how we ship them.
Yes it’s kind of interesting. We used to have an Android team and an iOS team that were responsible for kind of rebuilding the Facebook experience on these different mobile platforms. But now we have product teams that own it everywhere. So a Photos engineer now doesn’t just work on facebook.com, they also work on the Android app and on the iOS app. And that really helps us to scale and it makes it so that people who wake up at night thinking: “Here is some amazing Messages feature or here is some amazing Events feature. If only it did this, I want to get in and I want to work on that”, it makes it so that those people bring that experience not only to the web but to our mobile products too. It’s a great way to develop.
Yes. We have used Facebook groups internally to organize, we’ll post “Hey I have a question about this”. We have tools that once we say this should never happen or this should always happen, we codify that in tools so if you’re a new employee or even if you forget, the tools themselves say: “Hey you shouldn’t be doing this or maybe you should try doing this”.
Yes. Last year we switched to a date based release. That means that we ship every month on iOS and Android and we don’t let that slip. But we also don’t want to let quality slip and we don’t want the schedule to slip so we slip features. If features aren’t ready we don’t try to slam them in. We want the best experience possible and we’re willing to cut some features if that’s what we need to do to get all the performance improvements and the stability improvements out to users quickly.
We are strict. The date does not move and quality doesn’t move either. The only thing that moves are features, and they go.
Yes, we created our own system called Phabricator, it’s a really great tool for doing code reviews. You can comment on the diff, you can put image macros, memes. And we open sourced it so you can use it, it’s on GitHub if you go to Phabricator. Every piece of code that Facebook engineers write, whether it be for the mobile apps or for the websites or for the backend, goes through a code review. It’s really amazing; it helps to spread knowledge as well as giving an additional check for quality. And it ties into our continuous integration system so we can know if it builds, if the tests passed and stuff like that. And it really really helps quality and helps surface issues very very early, before they get into the product and potentially out to users.
Yes, it depends on the code. Some people go: “I know this code is hairy so I want a review on any diff in here”. So they will be added as reviewers automatically by the tools. How it works is you’re an engineer, you develop the piece of code locally, you test it, you make sure it’s good and you put a diff or a patch up to Phabricator. Phabricator goes: “Ok you want these three reviewers and this other person wants to review everything in this folder” and then Phabricator sends emails to everyone and says: “Hey there’s something to review”. They go through; they can comment on individual lines of the patch, they can comment on the overall approach. They can say: “Hey look, one of these tests failed, you need to go look into it”. And the patch author then keeps on iterating on it until everyone agrees it’s good and then they land it into the tree.
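The reviewer-assignment step described above can be pictured with a minimal sketch. This is not Phabricator's implementation or API; the rule table, the glob patterns, and the names `WATCH_RULES` and `reviewers_for` are hypothetical, chosen only to illustrate "this person wants to review everything in this folder".

```python
from fnmatch import fnmatch

# Hypothetical watch rules: glob pattern -> engineers who asked to review
# every change under that path. Paths and names are made up for illustration.
WATCH_RULES = {
    "android/photos/*": {"alice"},
    "ios/messages/*": {"bob"},
    "shared/network/*": {"alice", "carol"},
}


def reviewers_for(changed_files, requested, rules=WATCH_RULES):
    """Return the requested reviewers plus anyone watching a touched path."""
    reviewers = set(requested)
    for path in changed_files:
        for pattern, watchers in rules.items():
            # fnmatch's "*" also matches "/", so nested files are covered.
            if fnmatch(path, pattern):
                reviewers |= watchers
    return sorted(reviewers)
```

Calling `reviewers_for(["android/photos/ui/Grid.java"], ["dave"])` would add the hypothetical watcher `alice` to the explicitly requested `dave`.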
9. What’s the build infrastructure like? So after the review it gets probably added to some build system or continuous integration system. Is that some magical big cloud so to speak on your infrastructure?
It’s fairly big, we use Buildbot which is an open source build system, more like building blocks for a build system you put together. So we use that, so when a developer goes and writes a patch and puts it up for review, Buildbot will come through and run every build, Facebook for iOS, Facebook for Messenger, Instagram for iOS, anything that’s potentially relevant for the code you’ve touched, as well as all the tests. It will run all those and report back in Phabricator and say: “Yes, it’s all good, it’s all Green, this is good” or say “Red”. And that really helps the reviewers because the reviewer, an engineer, does not have to spend their time looking at something that just doesn’t build or something that caused a test to fail. Because you know that’s going to change anyway. So it’s really really powerful. We try to run builds and tests as often as possible.
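As a rough sketch of "run anything potentially relevant for the code you've touched, then report green or red", assuming a simple directory-prefix mapping: the target names, directory layout, and both function names are hypothetical and do not reflect Buildbot's configuration or API.

```python
# Hypothetical mapping from build target to the source directories it uses.
TARGET_SOURCES = {
    "facebook-ios": ["ios/facebook/", "shared/"],
    "messenger-ios": ["ios/messenger/", "shared/"],
    "facebook-android": ["android/facebook/", "shared/"],
}


def affected_targets(changed_files, target_sources=TARGET_SOURCES):
    """Return targets whose source directories contain a changed file."""
    return sorted(
        target
        for target, dirs in target_sources.items()
        if any(f.startswith(d) for f in changed_files for d in dirs)
    )


def overall_status(results):
    """Collapse per-target pass/fail booleans into one green/red signal."""
    # An empty result set is treated as red: nothing was verified.
    return "green" if results and all(results.values()) else "red"
```

A change under the shared directory selects every target, while a change under one app's directory selects only that app; a single failing target turns the whole diff red.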
10. Talk about testing. So, you supported, you support iOS but you also support Android which is fragmented. I’m going to get hate mail for that, but it is, sorry. So how do you deal with that, do you have lots of testing, do you have some magic ingredients, do you have a group of test monkeys that does that? What’s your approach?
So Facebook does not have formal QA, we don’t have a QA team, we don’t want humans signing off and being in the way of shipping, we want machines to tell us if it’s good or bad. We do test on iOS, we test on Android and we do test on mobile web. Mobile web is very big as well, it’s not just native apps. We do a combination of unit tests and integration tests across all those platforms, mainly using open source tools, and then we do have end to end UI testing, kind of clicking on buttons and scrolling and stuff like that. On Android, while you say it’s fragmented, you know that could be up for debate, but Android has this really great program in the Google Play store, it’s a Beta program and an Alpha Program. So we started to do this in June of this year and announced this Beta Program. And we quickly got a million active users running our pre-release builds, helping us test just by using Facebook. And this has been great for quality, has been great for us, we love them, our Beta testers are amazing. And there are over a hundred and fifty countries right now where people are testing our app and giving us feedback, on phones from over fifty manufacturers. While we do have a device lab, we can’t get everything, we can’t do it in every situation, we can’t do it in 2G in India while also using a phone from Africa. It’s just complex in general on mobile and the Beta and Alpha Program on Android really, really lets us test in the real environment and have users give us great feedback on whether we’re doing a good job or not.
11. So you can actually, this is an interesting point as you say, there are several services that offer exactly that: a big bank of thousands of phones with robot arms tapping on them. So you don’t have that at all?
We have a device lab, we do. It’s just again the variance, just in the nature of not even just Android, on iOS someone can take their 3GS out of the country, on a different carrier, maybe their battery is a little old and isn’t keeping the charge or maybe it’s interacting with a different app they have. The variables are just so wide that you really kind of need to test in the environment that it is actually going to ship in. We learned this on web a long time ago, which is why there is a big push in A/B testing and continuous integration on the Web. The Beta and Alpha Program gives us something very very similar for native apps and I don’t know how we shipped without Beta and Alpha before. It is great for quality and we love our Beta and Alpha users.
We test a couple of things. We run our automated UI tests on the phones, and an individual engineer can, when they’re writing a patch, go: “I’m interested in how this behaves on a Gingerbread phone or on a Froyo phone or something like that”. And they can walk up and check out a particular phone or device and use it. But really Facebook believes in dogfooding, our employees find a massive amount of bugs and then the next layer of the onion is Beta and Alpha. The amount of good feedback we get from Beta and Alpha wildly trumps anything from the particular device lab.
Werner: That’s very interesting yes.
We love our Beta and Alpha users.
Currently Apple does not support an Alpha or Beta Program yes.
Yes on mobile we do, we built our own system. We recently posted about it to the Facebook engineering blog so you can check that out. At Facebook scale we kind of want to know if people enjoy the new navigation for example or if this Photos thing we’re doing is going to help them share photos with their friends more and so we want to do more and more slow roll outs. The code is shipping, it has to be there but it will be on and off for certain people and this is again where Alpha and Beta give us great early feedback. While we’re developing the feature they can give us a signal like: “Hey I don’t like this photo thing” or “Hey this photo thing is really really great, I invited all my friends to come into the Beta”. So we get some early feedback even before it gets to production. But when it does get to production we generally want to roll it out slowly, and we built our own system for that; there’s a post on the engineering blog that you can check out.
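The interview does not describe how Facebook's rollout system decides who sees a feature. One common way to have code ship to everyone but be "on and off for certain people" is a stable hash bucket per user, sketched below under that assumption; `rollout_bucket` and `is_enabled` are hypothetical names, not Facebook's.

```python
import hashlib


def rollout_bucket(feature, user_id):
    """Map (feature, user) to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(feature, user_id, percent):
    """True if the user falls inside the first `percent` buckets."""
    return rollout_bucket(feature, user_id) < percent
```

Because the bucket depends only on the feature name and the user, a user who has the feature at 10% keeps it when the rollout widens to 50%, and including the feature name in the hash keeps different features from always landing on the same users.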
Yes, we try to open source as much of our tooling as possible, especially on mobile. The industry is so young right now and everyone is kind of duplicating work that we want to take our experiences and our code and we want to move the industry forward. Chances are if it works at Facebook scale, it’s going to work at your scale. So one of the tools we wrote is an Android build tool called Buck. It’s a build system just like Ant or Maven or Gradle. Those three are great but when we were looking to switch off Ant a couple of years ago nothing really fit the bill. And so we went and built Buck. Buck is great, it’s written in Java, it’s really easy to hack on, it’s open source, it’s on GitHub and the main thing is it’s fast. Facebook engineers want to move fast. We have this thing we say: “Move fast and break things”. We’re ignoring the “break things” on mobile but we want to move fast and at the core of it is our engineers being able to iterate fast on diffs. And if the builds are slow they’re not going to be able to be as productive. So Buck is very very fast. It has a distributed build cache. Frequently when you are working on the same product you’re kind of building the same stuff. You’re maybe editing this one file, and you shouldn’t need to rebuild the rest of the app, especially if the other guy next to you has already built it. So, the distributed build cache really speeds builds up and we tied that into Buildbot. So Buildbot itself will build things and feed them into the cache. So chances are our mobile engineers only need to build that little one thing that they are editing so you get a really fast compile and test cycle.
Buck does a bunch of tricks, not at the sacrifice of correctness but that is a value add. The Gerrit open source project recently switched from Maven to Buck and sped up the builds by, I believe, 68 to 92%. I don’t have the numbers right off the top of my head but it’s over 50%. And they are doing that without even using the build cache. So the build cache is an additional layer on top.
Yes, we have more on GitHub, we actually have docs saying why it’s fast. One of the things is that when we’re building Java, we can do some smart things with the Java ABI: we rebuild a dependent module not whenever the code changes but only when the assumptions between the modules change. So we know if you have to actually build the dependent module because we know kind of the innards of Java. And that makes it so we are always building the minimal set. Similarly we have a tool called buckd, a daemon that runs in the background. So while you are editing your code, buckd is sitting in the background compiling your code, using your machine’s spare cycles. Humans are kind of slow, you think: “Ok I saved, now I want to go run my Android app”, you go to compile it, it’s a no-op, because it’s already done. So, there’s just a bunch of really cool tricks like that and it’s been pretty amazing for us to see how great it’s been, as well as to see open source projects like Gerrit get the benefit of it too.
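The idea of a shared build cache can be sketched as a store keyed by a hash of the build inputs, so identical inputs built by a teammate or by the CI machine are reused instead of recompiled. This is a minimal in-memory illustration, not Buck's cache format or API; the `BuildCache` class and its methods are hypothetical.

```python
import hashlib


class BuildCache:
    """In-memory stand-in for a shared cache keyed by input contents."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key_for(sources, dep_keys=()):
        """Hash file names, file contents, and dependency keys into one key."""
        h = hashlib.sha256()
        for name in sorted(sources):
            h.update(name.encode("utf-8") + b"\0")
            h.update(sources[name] + b"\0")
        for dep in sorted(dep_keys):
            h.update(dep.encode("utf-8") + b"\0")
        return h.hexdigest()

    def build(self, sources, compile_fn, dep_keys=()):
        """Return a cached artifact if one exists, otherwise compile and store."""
        key = self.key_for(sources, dep_keys)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = compile_fn(sources)
        return self._store[key]
```

Here `sources` is a dict of file name to file bytes. If a CI job has already called `build` with the same contents, an engineer's later call returns the stored artifact without invoking `compile_fn`; editing one file changes only that module's key.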
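The ABI trick can be illustrated with a simplified sketch: fingerprint only a module's public interface, and rebuild dependents only when that fingerprint changes. How Buck actually extracts and compares the Java ABI is not described in the interview, so the list-of-signatures input and both function names are assumptions made for illustration.

```python
import hashlib


def abi_key(public_signatures):
    """Hash only the public interface of a module, ignoring method bodies."""
    joined = "\n".join(sorted(public_signatures))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def dependents_to_rebuild(old_signatures, new_signatures, dependents):
    """Return dependents that must rebuild after a module is recompiled."""
    if abi_key(old_signatures) == abi_key(new_signatures):
        return []  # implementation-only change: dependents stay valid
    return sorted(dependents)
```

Changing a method body leaves the signature list, and therefore the key, unchanged, so nothing downstream rebuilds; adding or changing a public method changes the key and triggers the dependents.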
xctool solves the problem that a lot of Apple’s tools, while great, are very focused on an engineer sitting at their desk and pressing “Compile”, or a small team contributing and passing code around and doing builds locally. At our scale, with lots and lots of engineers contributing to the app, that sort of falls over. Apple does have a command line tool called xcodebuild, which is Xcode from the command line, but it’s a little rough around the edges. And so what we did is build xctool, which stands for Xcode tool, which again is not a very creative name, and it wraps all of Apple’s tools. It’s written in Objective-C and so it can kind of poke at the innards and it wraps Apple’s xcodebuild, it’s a drop in replacement and makes it generally sane. It makes better output for engineers, so no longer do you have to wade through a bunch of lines of output that are unrelated to maybe a build failure. It says right at the bottom the build failed because of this. And because it’s structured output we can pipe it into Buildbot with JSON or we can pipe it into Jenkins, a lot of people in open source are using it.
It’s becoming the de facto way to build iOS and Mac apps from the command line and in continuous integration. So, it’s really really great. One of the reasons we needed it is that we have a lot of application tests, so kind of unit tests that run in the context of the simulator, so they need the simulator environment. And xcodebuild doesn’t actually support that. So, we needed a way, we want to run tests, we’re really really big on quality, we want to run tests when a diff gets put up, when we’re ready to release, we want to make sure it’s all good and we couldn’t do it from the command line with xcodebuild so we built xctool. For atosl, we get crash reports, we hope we never crash but when we do we want to get the report so we can act on it. And what iOS does is it sends us a stack with a bunch of addresses and you need to map back from addresses to a human readable “This is the function you called in the code”. And you do that with a tool called atos. If you’ve ever done this in Xcode, it’s using atos under the covers. At our scale we just weren’t able to do it on Mac Minis or on Mac OS X. We had too many reports, not because we’re crashy, just because of the nature of native software, sometimes the OS has bugs. And so we really needed a way to process these things quickly and on our existing Open Compute Linux boxes. So we went off and we built atosl. It was actually built by a kernel engineer where I just sent him a message and said: “Hey, you want a kind of cool fun project?” And he went off and he built it. And this let us get atos off the Minis and get it on the Linux boxes and that’s the first step for identifying crashes. It’s getting it in a human readable form so that we can actually act on it. So this was really really great and then we open sourced it, too.
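To show why structured output is easier to feed into a CI system than free-form build logs, here is a minimal sketch that reads one JSON object per line and collects failed tests. The event shape used here (`event`, `test`, `succeeded` keys) is a hypothetical schema for illustration; check the tool's own documentation for the real field names.

```python
import json


def failed_tests(lines):
    """Collect names of failed tests from newline-delimited JSON events."""
    failures = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the stream
        event = json.loads(line)
        # Hypothetical schema: an "end-test" event carries the outcome.
        if event.get("event") == "end-test" and not event.get("succeeded", False):
            failures.append(event.get("test", "<unknown>"))
    return failures
```

A CI step could pass the tool's output stream to `failed_tests` and mark the diff red if the returned list is non-empty, with no log scraping involved.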
Hopefully you won’t ever need to do that because your app won’t crash but if you do you can now do it on Linux if you want to or I believe it works on AWS so if you want to scale it that way.
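The core of mapping crash addresses back to function names can be sketched as a lookup in a sorted symbol table. Real symbolication also has to read debug information and account for where the binary was loaded in memory; this sketch skips all of that and assumes a hypothetical, already-extracted list of `(start_address, name)` pairs.

```python
from bisect import bisect_right


def symbolicate(address, symbols):
    """Map an address to 'function + offset' using a sorted symbol table.

    `symbols` is a list of (start_address, name) tuples sorted by address.
    """
    starts = [start for start, _ in symbols]
    # Find the last symbol whose start address is <= the crash address.
    index = bisect_right(starts, address) - 1
    if index < 0:
        return hex(address)  # below the first known symbol
    start, name = symbols[index]
    return f"{name} + {address - start}"
```

Applying `symbolicate` to every frame of a crash report turns a stack of raw addresses into function names plus offsets, which is the human readable form the answer describes.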
Werner: Ok. So our audience has a long to do list and we’ll post links to the various projects.
Werner: Lots to check out so thank you Christian.
Thank you very much.