Is it Possible to Test Programmable Infrastructure? Matt Long at QCon London Made the Case for "Yes"

At QCon London, Matt Long, QA Consultant at OpenCredo, presented "Testing Programmable Infrastructure with Ruby". Key takeaways included: it is possible to test programmable infrastructure at the unit, integration, and acceptance levels; unit testing of infrastructure typically has a low return on investment; Ruby (alongside ServerSpec) provides the power of a full programming language for integration and acceptance tests, and is often understood by both testers and sysadmins; and it can be slow and expensive to provision cloud-based infrastructure as part of a test.

Long began the talk by defining programmable infrastructure as "the application of methods and tooling from software development to the management of IT infrastructure", and discussing how this topic relates closely to automated provisioning and configuration of cloud and other 'virtual' infrastructure, and 'infrastructure as code' and 'configuration as code' in general. Tooling has emerged within this space over the last five plus years, and includes the likes of Puppet, Chef, Ansible, SaltStack, AWS CloudFormation, and HashiCorp Terraform.

Based on his consulting experience, Long stated that programmable infrastructure has many benefits, including codification (and version controlling) of configuration, and repeatable and consistent creation of environments. Defining programmable infrastructure can get complex, and (as with any software development activity) testing must be used to mitigate complexity and risk. However, in Long's experience, the testing of programmable infrastructure is rare.

Long introduced a 'cloud broker' project that he had worked on over the past year. Users of a web-based user interface could specify the cloud-based compute resources and storage they wanted, alongside their required cloud platform (e.g. AWS, GCP, Azure), and send the request for management approval. Once the request was approved, the cloud broker application automatically provisioned the cloud resources and provided the users with SSH and VPN connection credentials. Although the user-facing components of this system were easy to test (using Selenium, Cucumber, and Java and JUnit), the programmable infrastructure components were more challenging to validate.

Asking "what do we need to test?" with regard to infrastructure, Long suggested that the answer can be related to the well-understood concept of the testing pyramid. For example, a unit test can assert that deployment scripts work as expected, an integration test can validate whether operating system services are running correctly, and a functional acceptance test can assert that a system user can SSH into a provisioned instance and perform an arbitrary operation.

Infrastructure Testing Pyramid

Unit testing can be conducted on Bash scripts, Ansible configuration, and Terraform code, but Long stated that this is often difficult due to a lack of tooling, and that it typically offers a poor return on investment: because much of the code is declarative, the tests simply echo back that declaration. The majority of tools offer linting or validation (e.g. Terraform's validate command), and there are Bash unit testing tools, such as Bats, for complex or high-value Bash scripts.
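The point about tests echoing a declaration can be illustrated with a minimal sketch in plain Ruby. The `INSTANCE_DEFINITION` hash and the `definition_matches_expectation?` helper are hypothetical stand-ins for a declarative resource definition and its unit test; they are not taken from Long's project or from any tool's API.

```ruby
# Hypothetical declarative definition of a compute instance (illustrative only).
INSTANCE_DEFINITION = {
  instance_type: 't2.micro',
  open_ports: [22, 443]
}.freeze

# A "unit test" for declarative code tends to restate the same values,
# so it verifies little beyond the fact that the two copies agree.
def definition_matches_expectation?(definition)
  definition[:instance_type] == 't2.micro' &&
    definition[:open_ports] == [22, 443]
end

puts definition_matches_expectation?(INSTANCE_DEFINITION)
```

A check like this fails only when one copy of the values is edited without the other, which is why it says little about whether the provisioned infrastructure actually behaves as intended.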

Integration testing can be conducted with ServerSpec, a Ruby/RSpec-based infrastructure testing framework, which can SSH into compute instances and assert that OS packages are installed, services are running, and ports are listening. Long stated that ServerSpec code is often very readable by sysadmins and testers alike, and that the community associated with the framework is large and helpful. The Golang-based infrastructure testing framework Goss was also considered for integration testing, but the required installation of a binary on the infrastructure under test, combined with the lack of support for MS Windows-based compute, meant that this was not a viable option.
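As a rough illustration of what a "port is listening" assertion involves, the following sketch uses only Ruby's standard `socket` library. It is not ServerSpec code and not how ServerSpec implements its checks; the `port_listening?` helper is a hypothetical name, and the example probes a throwaway local server rather than a remote instance over SSH.

```ruby
require 'socket'

# Returns true if something accepts TCP connections on host:port.
# This only approximates a "port is listening" assertion from the outside.
def port_listening?(host, port, timeout_seconds: 2)
  Socket.tcp(host, port, connect_timeout: timeout_seconds) { true }
rescue SystemCallError, SocketError
  false
end

# Demonstration against a temporary local server; port 0 asks the OS for a free port.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]
puts port_listening?('127.0.0.1', port)
server.close
puts port_listening?('127.0.0.1', port)
```

In an integration suite, a check of this kind would sit alongside equivalent assertions for installed packages and running services, which is the role the article describes ServerSpec playing.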

Functional acceptance testing can be defined and orchestrated by Cucumber, and higher-level assertions about installed applications and services can be made. Long explained that he had implemented the Cucumber step definitions using Ruby, ServerSpec, and the 'WinRM' gem (for testing MS Windows-based compute). The Ruby programming language was chosen because sysadmins on the team were already familiar with it, and because Ruby was already being used within the application infrastructure definition logic (thereby minimising the technology stack).

The talk concluded with a series of lessons learned in the form of 'the good, the bad, and the ugly'. The good comprised: specialised tests for each layer in the testing pyramid (allowing separation of concerns, and quick execution of unit tests); quick, expressive ServerSpec tests; the use of Ruby to provide the power of a full programming language for user tests; and the fact that this type of infrastructure testing is most definitely doable on a project of this scale.

Long presented the bad parts of his learnings as: an over-reliance on acceptance tests (which could morph into the 'ice cream cone' testing antipattern); the required context switching between two test suites - one for the UI, and one for the infrastructure; and the fact that, as an engineer who predominantly focuses on testing user interfaces, Long initially felt that he was working out of his comfort zone. The 'ugly' parts of the learnings were that starting infrastructure for a test is very slow (resulting in delayed feedback for developers), and that it can also be expensive, as any cloud infrastructure that is provisioned for a test has to be paid for (often by the hour, even if the infrastructure is only provisioned for minutes).
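The cost point can be made concrete with a small arithmetic sketch. It assumes billing is rounded up to whole hours, as the parenthetical above describes; the `billed_cost_cents` helper and the rate of 50 cents per hour are hypothetical and do not reflect any real provider's pricing.

```ruby
# Cost of one test run when usage is rounded up to whole billed hours.
# hourly_rate_cents is a hypothetical price, not a real cloud tariff.
def billed_cost_cents(minutes_used, hourly_rate_cents)
  billed_hours = (minutes_used / 60.0).ceil
  billed_hours * hourly_rate_cents
end

puts billed_cost_cents(10, 50) # a 10-minute run is billed as one full hour
puts billed_cost_cents(61, 50) # a 61-minute run is billed as two hours
```

Under this assumption, a suite that provisions fresh infrastructure for each short test run pays for a full hour every time, so the cost scales with the number of runs rather than with the minutes actually used.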

Was it difficult testing the provisioning of programmable infrastructure? Yes.

Was it worth doing? Absolutely! The ability to add new functionality and refactor without regression was awesome.

The core conclusion of the talk was that, despite all of the challenges of testing infrastructure, it was definitely worth doing. The ability to automatically and reliably assert functionality and properties of infrastructure meant that the team could continually add new functionality, refactor existing code, and utilise new versions of cloud infrastructure with reduced risk. Long stated that testing is important but often ignored, and that testers and sysadmins/operators should work together more closely. Infrastructure tooling does exist, but be prepared to write custom modifications and integrations.

Slides for an earlier version of Long's talk "Testing Programmable Infrastructure" can be found on SlideShare.
