Lyft Replaces Puppet with SaltStack
Lyft, a "ridesharing" start-up, replaced Puppet with SaltStack as its infrastructure configuration management tool. Ansible was the other contender, as Ryan Lane, a Lyft engineer, explains in his article. In the end, SaltStack came out on top when Lyft considered each tool's ease of use, maturity, performance and surrounding community.
In terms of ease of use, SaltStack had the steeper learning curve, owing to the organization and thoroughness of its documentation. According to Ryan, Ansible's documentation is simpler for a newcomer to read. However, once the code started to grow, SaltStack came out on top. Breaking up configuration files (playbooks in the case of Ansible, state definitions for SaltStack) highlighted some differences. Lyft's engineers found that SaltStack is more consistent when it comes to inputs, outputs and configuration. For instance, while Ansible uses different file formats (INI files, YAML), SaltStack uses YAML everywhere. Looping and conditionals are also handled differently: Ansible embeds this logic in its DSL, while SaltStack uses Jinja, a Python templating engine. Ryan and his colleagues preferred the SaltStack approach. Another deciding factor was SaltStack's "excellent" introspection capabilities.
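To illustrate the Jinja approach to looping, here is a minimal sketch using the `jinja2` Python library directly. The state file content, the user names and the `render_state` helper are hypothetical and are not taken from Lyft's code base; the point is only that the template is rendered to plain YAML text first, and the loop lives in the template rather than in the configuration language.

```python
from jinja2 import Template

# Hypothetical Salt-style state file. Jinja is rendered first; the result
# is ordinary YAML that a state system could then parse.
STATE_TEMPLATE = """\
{% for user in users %}
{{ user }}:
  user.present:
    - shell: /bin/bash
{% endfor %}
"""


def render_state(users):
    """Render the template into YAML text, one state block per user."""
    # trim_blocks removes the newline that follows each {% ... %} tag,
    # so the loop tags leave no blank lines in the output.
    template = Template(STATE_TEMPLATE, trim_blocks=True)
    return template.render(users=users)


rendered = render_state(["alice", "bob"])
print(rendered)
```

Because the loop is plain template logic, the same mechanism covers conditionals (`{% if ... %}`) without any loop or condition keywords in the YAML itself.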
When it comes to maturity, both Ansible and SaltStack were considered mature enough, with all the necessary features for Lyft's use cases. Ryan found SaltStack to be more feature-rich, though: it can output in different formats and to different locations; it can load pillars, in essence data structures, from different sources; and, when running as an agent, it can fire local events via the reactor system.
Regarding performance in the context of Lyft's use cases, SaltStack was faster, especially on no-change runs:
- SaltStack, full run: 12m 30s
- SaltStack, no-change run: 15s
- Ansible, full run: 16m
- Ansible, no-change run: 2m
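As a quick worked comparison of these timings, the sketch below computes how much longer one tool took than the other. It assumes the first pair of figures is SaltStack's and the second is Ansible's, which is consistent with SaltStack being reported as the faster tool; the variable and function names are illustrative only.

```python
# Timings as reported in the article, converted to seconds.
# Assumption: the first pair is SaltStack, the second pair is Ansible.
timings = {
    "SaltStack": {"full": 12 * 60 + 30, "no_change": 15},
    "Ansible": {"full": 16 * 60, "no_change": 2 * 60},
}


def ratio(run):
    """How many times longer the Ansible run took than the SaltStack run."""
    return timings["Ansible"][run] / timings["SaltStack"][run]


for run in ("full", "no_change"):
    print(f"{run}: Ansible took {ratio(run):.2f}x as long as SaltStack")
```

Under that assumption the full runs differ by roughly a quarter, while the no-change run takes eight times as long, which is why the no-change case stands out.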
Ryan opened an issue at Ansible, since SaltStack was much faster in the same scenario, but the issue was closed. Over at Hacker News, Michael DeHaan, Ansible's creator, pointed to an article on performance tuning tips for Ansible, although it did not address Ryan's main complaint about the slowness of user-related operations.
On the tools' communities, Ryan and colleagues regard SaltStack's as friendlier and larger. Ryan remarked that "Ansible is almost solely written by mpdehaan", but Michael DeHaan claims that Ansible "has 810 contributors at this point". Lyft's engineers also find the SaltStack community more helpful and more receptive to feature requests. They were able to get more changes into SaltStack than into Ansible, though SaltStack is "sometimes less rigorous than they should be when it comes to accepting code (I’d like to see more code review)". This seems to be a matter of project management philosophy, since Michael DeHaan wrote on Hacker News that "we do say no when we disagree. I think that's important. Filtering and testing makes a project what it is to a degree."
The main reason for changing tools was Lyft's complex Puppet code base, with around 10,000 lines of code. Since Lyft follows the "If you build it you run it" approach, the DevOps team felt the Puppet code base was not suitable for use by the developers. In the end, both SaltStack and Ansible were able to reproduce the Puppet infrastructure with around 1,000 lines of code. When asked about the possibility of rewriting the Puppet code base from scratch, Ryan wrote:
A complete rewrite from scratch would have likely decreased the line count considerably and maybe the execution time as well (...). That said, I think a rewrite in Puppet would have taken me considerably longer.
Lyft had a few major requirements for the new tool. It should allow for a masterless architecture, as masters add "an unnecessary point of failure and sacrifices performance". The code should be executed in sequential order, without any optimizations that break this rule. The code should be simple, with few configuration management abstractions. The tool should support a design where crosscutting configurations (e.g. monitoring) and service- or application-specific configurations can be stored in different repositories.
InfoQ ran an article series on infrastructure configuration management tools, where you can find an introduction to SaltStack and Ansible. We also held a virtual panel with real users of each of the major players in this space. It is interesting to note that the strong points Ryan identifies for each tool are also highlighted in the virtual panel.
There are some inaccuracies in the original post, and some things said are quite frankly unjust regarding community nature. Some of the suggestions provided by Ryan, such that binary files transferred should add newlines in particular, were not good ideas, and this becomes a cornerstone of his article about being treated unfairly.
You can also see some other comments on this article in the Hacker News comparison:
Ultimately, Ryan was a contributor to SaltStack for 2 years before writing this blog post, and that needs to be made clear.
Ansible's always going to make the right decision for the codebase, and it's unfortunate when someone disagrees with a decision, but it's not just to elevate "I disagreed with a technical decision" to "a community is non-receptive".
Pressing the big green "auto-merge" button in an OSS project is also not a good thing, and it's not something any project should strive to do. So if everything gets merged in a day, you should be REALLY worried about that project's level of QA and code review.
Given Ryan's usage of Ansible per GitHub can be documented down to 1 day versus his two years with Salt, it's obvious the time spent learning each tool is disproportionate.
On Hacker News Hiring Trends, Ansible is #2, and Salt didn't make the list.
At both Wikimedia and Webplatform I had considered and tried Ansible and both times I chose Salt. This is the third time I've chosen Salt over Ansible for different use-cases, but the first time I've blogged about it.
For the point of this particular discussion, this was the first serious time I used Salt's state management system so my perspective on Salt vs Ansible using the same feature sets in the same conditions I feel is just.
You say: "binary files transferred should add newlines in particular, were not good ideas", but that's not what I was suggesting in the github issue at all. The issue is still there to reference. The 'contents' argument of the copy module, which gets its input from yaml, which will almost always be text, sets the text contents of files without a valid end-of-file character, which is not POSIX compliant. I referenced the POSIX standard in the bug. It's a bit unfair to misconstrue the bug as invalid.
You also say "So if everything gets merged in a day, you should be REALLY worried about that project's level of QA and code review." To be fair to Salt here, they have a lot of tests and they do review the code. If something will break backwards compatibility or breaks tests it won't get merged. There'll be discussion on how to properly handle it, then a solution will be worked out that's mergable. In many cases the Salt folks will fix the code being submitted on your behalf. The QA done for releases also seems to be pretty thorough.