The book Fifty Quick Ideas to Improve Your Tests by Gojko Adzic, David Evans and Tom Roden provides suggestions that cross-functional teams doing iterative delivery can use to manage and improve their testing activities.
InfoQ readers get a 50% discount on this book when using the Leanpub Discount URL. This discount is valid until July 23, 2015.
InfoQ interviewed Adzic, Evans and Roden about why they wrote this book, how quantifying quality can support testing, balancing trust levels when testing large and complex systems, why automating manual tests is almost always a bad idea, using production metrics in testing, how to reduce or prevent duplication in test code, and upcoming books in the Fifty Quick Ideas series.
InfoQ: Why did you write this book?
Adzic: We keep seeing similar problems working with many different teams. Although each team is unique and has their own set of constraints, the solutions to basic problems are often the same. For example, when automated tests involve database work, tweaking the test runner to execute tests inside database transactions immediately takes care of a large set of issues that we keep seeing over and over again with new clients. From an extrinsic perspective, we wanted to help raise awareness about such solutions, which have been around for a while but which people rarely seem to know about. From an intrinsic perspective, writing this book forced us to finally sit down, categorise and clarify those ideas so that we can move through recurring, easily solvable problems faster and deal with more interesting challenges.
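As an illustration of the database-transaction idea Adzic mentions, here is a minimal pytest sketch that runs each test inside a transaction and rolls it back afterwards; the in-memory SQLite database, schema and table names are invented for this example and are not from the book.

```python
import sqlite3

import pytest


@pytest.fixture(scope="session")
def connection():
    # Illustrative in-memory database and schema; a real suite would point at
    # the application's own database.
    conn = sqlite3.connect(":memory:", isolation_level=None)  # manage transactions manually
    conn.execute("CREATE TABLE accounts (name TEXT)")
    yield conn
    conn.close()


@pytest.fixture
def db(connection):
    connection.execute("BEGIN")   # every test starts its own transaction
    yield connection              # the test performs its inserts and updates here
    connection.rollback()         # discard all changes made by the test


def test_new_account_is_listed(db):
    db.execute("INSERT INTO accounts (name) VALUES (?)", ("alice",))
    names = [row[0] for row in db.execute("SELECT name FROM accounts")]
    assert "alice" in names
```

Because each test discards its own changes, tests cannot pollute each other's data and can run in any order.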
Evans: We have all had a close connection to testing in our careers. We’ve explored other interesting aspects of what affects software quality, from very technical issues around automated testing to more human issues like identifying genuine value. This book is almost a return to the fundamentals -- it includes some things that we’ve been saying and teaching for a long time, others that have become apparent only by looking at tests and testing in a new light. When we were writing the previous book, Fifty Quick Ideas to Improve Your User Stories, a lot of the ideas we wanted to include were about testing stories, and we felt that the subject merited its own book.
Roden: Over the past few years we’ve helped teams with many testing challenges, so we thought it would be a nice idea to pool our experiences into a form that would serve as a toolkit for practitioners. We’ve found ourselves addressing similar problems in a similar way, so many of the ideas in the book are ones that we have all used regularly in different situations, domains and technologies -- we think this will make them widely applicable to lots of teams. There are ideas to resolve specific problems, tips to improve the effectiveness and value of testing, and ideas to try out that offer alternative approaches to testing. We hope this mix will make it an interesting book for people to use and refer back to for ideas.
InfoQ: Which audience is the book aiming at?
Adzic: This is a book for cross-functional teams working in an iterative delivery process who need to collaborate effectively on testing and ship their products frequently, under tough time pressure. Given that context, it will mostly be useful to developers and testers who know the basics of testing - we do not cover any of the basics, but jump straight into improvement ideas. That formula seemed to work well for our previous “Fifty Quick Ideas” book on User Stories. There are plenty of good basic resources out there, but not that many that cover advanced topics.
With a mix of ideas that touch on exploratory testing, designing good checks, improving test automation and managing large test suites, we hope that there will be a few interesting pointers for almost any team working on software products today, regardless of their context.
Roden: As Gojko says, the primary audience is iterative delivery teams with a mix of skills. Though I hope there is something in here for anyone looking for ideas to improve their testing. Some of the ideas we describe are quick to implement and get to grips with, others are more involved.
Evans: Yes, these are “quick ideas”, but not “simple ideas”. It’s definitely not a beginners’ book. I hope that testers will use it to recognise and help articulate to others why their craft is one that requires skill and thought. I also hope that those in other roles will use this book to help understand the range of things we need to get right in order to do testing well.
InfoQ: One of the ideas in the book talks about quantifying quality. Over the years I've been doing this and have experienced how it can help to build a better, shared understanding of the required quality. Can you elaborate on how quantifying quality supports testing?
Roden: Like beauty, quality is in the eye of the beholder. This innate subjectivity can lead to wide-ranging opinions on what good quality is and what attributes display those qualities.
To ground understanding, it is essential to quantify and visualise quality. This works on several levels: at a story or feature level we quantify a quality target in the form of acceptance criteria, and we can also set a holistic picture of quality at a product level. Many teams use acceptance criteria for stories these days, but the criteria are often still ambiguous, like ‘must be fast’ or ‘must be reliable’, which leaves vast potential for error in the suitability of the solution.
We’ve found it useful to quantify quality at both feature and product level. Then there is a clear target for discussing feature acceptance, and also a higher-level vision of quality that the feature falls within and that directs testing.
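To make the contrast with ‘must be fast’ concrete, here is a small sketch of what a quantified criterion could look like once it has been pinned down to a measurable target; the search_products stand-in, sample size and 300 ms budget are illustrative assumptions rather than figures from the book.

```python
import time

# Illustrative target: "95% of searches return within 300 ms".
RESPONSE_BUDGET_SECONDS = 0.3


def search_products(term):
    # Stand-in for the real call to the system under test.
    time.sleep(0.05)
    return [term]


def test_search_meets_the_quantified_target():
    timings = []
    for _ in range(20):
        start = time.perf_counter()
        search_products("guitar")
        timings.append(time.perf_counter() - start)
    timings.sort()
    p95 = timings[int(len(timings) * 0.95) - 1]   # simple 95th percentile of the sample
    assert p95 < RESPONSE_BUDGET_SECONDS
```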
Adzic: It helps to paint a target on the wall that everyone can see. Many teams suffer from a misalignment problem where some team members get accused of nit-picking and slowing down delivery, and others get blamed for rushing too quickly and not caring about quality. Unless those people have a common definition of what the word quality means for them, any discussions on improving it will be subjective and often turn into personal arguments. When there is a globally agreed target, whatever it is, it’s much easier to get on with real work and decide if something needs more work or if it is good enough.
Evans: Tom Gilb is well known for arguing that any aspect of quality can be quantified if you try hard enough. He says “Quantification, even without subsequent measurement, is a useful aid to clear thinking and good communication”. That’s the idea I like, especially as it separates quantification (being able to express something in terms of quantities) from measurement (finding an actual value of a quantity at a point in time).
Too many times we hear well-meaning stakeholders talk about the importance of good quality, but as Gojko says, the problems start when those responsible for delivering and assessing good quality don’t have a common basis for what it means in practice, and everyone ends up feeling disappointed.
InfoQ: Another idea is about documenting trust boundaries. Can you explain why it matters to balance trust levels when testing large and complex systems?
Adzic: When multiple teams work on the same piece of software, there is a lot of potential for information falling through the cracks at team boundaries. Although it might be a pessimistic thing to say, for many organisations today that’s the reality. People within a team have a higher level of trust and higher-bandwidth communication than with people from other teams. When two teams depend on each other to deliver, the responsibility for identifying cross-team problems or influences is often left undefined, and people start making huge assumptions about the work of the other group. That can lead to a lot of duplicated work across teams, for example one team testing what the other teams have already sufficiently de-risked. It can also lead to a lot of unjustified trust, where a problem with one piece of the puzzle has a cascading effect and causes issues for many teams. Without a clear agreement, different people on a single team will probably take different approaches towards inter-dependent work.
The idea about documenting trust boundaries helps people within a team agree on how much they trust the others, so they can at least coordinate work internally. One person won’t rush ahead by blindly believing a dependent piece of work to be great if other people on the team know about potential problems. On the other hand, if the team agrees that they can trust something, people won’t waste time testing edge cases around that component.
Documented trust boundaries serve another purpose: they help to divide problems into expected ones (which can be checked by automated tests) and unexpected ones (which are inconsistent with the current trust boundaries). Teams can focus most of their automated testing investment on de-risking expected integration problems, and run exploratory tests that try to violate trust boundaries and look for unexpected problems. If a documented trust boundary gets violated, then a whole range of assumptions that people made around that boundary needs to be revisited.
Evans: I think that covers it.
Roden: Hahahaha, never let Gojko answer a question first :-)
InfoQ: The idea "contrast examples with counter examples" explains how testers can come up with scenarios where the new functionality does not apply, which they can consider for testing. Can you give examples using this tip from teams that you have worked with?
Adzic: Counter-examples are valid cases that can and will often happen, they just aren’t the completely happy path that everyone focuses on. I’m sure David will pick up on this more, but counter-examples make the primary examples clearer because they show contrast with important stuff. A few examples that all talk about happy cases is like using white text on a light background - difficult to grasp quickly. Adding good counter-examples is like changing the background colour to black, so that the white text stands out clearer.
Evans: That’s exactly it, the contrast of examples against counter-examples, of foreground against background, is what gives us a sharper picture of what we are trying to test or understand. Counter-examples are not about scenarios that should not happen, but valid scenarios that lie at the ‘edges’ of the behaviour under test.
In a team I was working with recently we needed to add a handling fee when certain types of payment cards and transaction types were used. We had lots of examples showing correct calculation and display of the fees under different situations. But it was equally important to show examples of what happens when no fee applies. Report layouts, payment summary messages, email and SMS receipts would all be different depending on whether a fee was or was not applied. The ‘no fee’ cases were counter-examples that contrasted against the ‘fee’ examples.
Roden: I’ve always liked the light-and-shade imagery that David uses for examples and counter-examples. Part of the clarity of this idea for me is that you don’t need a massive set of examples to illustrate the behaviour under test; in fact, quite the contrary. A small set of each provides better contrast than a mass of tests that jumble the key behaviours. Supplementary tests are fine, but think of keeping that small set as the key examples for acceptance.
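David’s handling-fee scenario could be captured as a small set of contrasting checks, for instance as a parameterised test; the fee rule, card types and threshold below are invented purely to show the pairing of examples with counter-examples.

```python
import pytest


def handling_fee(card_type, amount):
    # Hypothetical fee rule, invented only to illustrate the technique:
    # a small handling fee applies to credit card payments under 100.
    if card_type == "credit" and amount < 100:
        return 1.50
    return 0.00


@pytest.mark.parametrize("card_type, amount, expected_fee", [
    ("credit", 50,  1.50),   # example: the fee applies
    ("credit", 150, 0.00),   # counter-example: amount above the fee threshold
    ("debit",  50,  0.00),   # counter-example: card type that never attracts a fee
])
def test_handling_fee(card_type, amount, expected_fee):
    assert handling_fee(card_type, amount) == expected_fee
```

The two ‘no fee’ rows are the counter-examples: valid, expected cases that sharpen what the single ‘fee’ row actually means.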
InfoQ: You mentioned in the book that automating manual tests is almost always a bad idea. Can you explain why?
Evans: It is a fairly common syndrome that an organisation or team launches into test automation as a way of saving test execution time, especially if they have a large manual regression suite, only to find that they are worse off than they were before. The naive approach is to do something like record-and-replay automation of the existing tests. This just creates all sorts of new problems, and any saving in execution time is quickly eroded by a growing maintenance burden.
Adzic: Manual tests are captured from the perspective of optimising human work. People can deal with small inconsistencies without much trouble, and they can use their better judgement when the terrain differs from the map. But they aren’t good at repeating laborious and slow tasks. That’s why good manual tests are often written as guidelines, not prescriptions. Even the horrible mechanistic manual tests that we often see in the wild are written from the perspective of optimising human work, so that a single laborious set-up gets reused for many cases.
Machines, on the other hand, are really good at repeating difficult, boring tasks. A computer won’t have a problem setting up a customer credit account 500 times exactly the same way. But machines are horribly bad at dealing with small inconsistencies. So a single set-up that might affect 500 dependent cases is a pretty bad idea. One thing going wrong somewhere in the middle will break hundreds of tests, which will be difficult to understand, troubleshoot and maintain.
Because of those two different contexts, it rarely makes sense to just automate the execution of something that was designed for a human. Teams will get a lot more value out of rewriting such tests from scratch and optimising them for the constraints of unattended repeatable execution.
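A rough sketch of the difference Adzic describes: instead of one shared set-up that hundreds of checks depend on, each automated check builds exactly the data it needs. The create_credit_account and charge helpers are hypothetical stand-ins for whatever the real system-under-test API would be.

```python
# Hypothetical stand-ins for the system under test.
def create_credit_account(limit):
    return {"limit": limit, "balance": 0}


def charge(account, amount):
    """Reject charges that would take the account over its limit."""
    if account["balance"] + amount > account["limit"]:
        return False
    account["balance"] += amount
    return True


def test_purchase_within_limit_is_accepted():
    account = create_credit_account(limit=500)   # set-up owned by this test alone
    assert charge(account, 200) is True


def test_purchase_over_limit_is_rejected():
    account = create_credit_account(limit=500)   # a broken shared set-up cannot cascade from here
    assert charge(account, 700) is False
```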
Roden: Automating tests by exactly following every step in a manual test script leads to brittle, long-winded and very hard to read tests. Manual test scripts are often written with tens of test steps so that anyone can execute them without assumed product or domain knowledge. But this length often means that the purpose of the test is lost, buried somewhere within all those steps. Automating them that same way very quickly distorts the purpose of the tests (or loses the purpose in the automation and tests the wrong thing). People adding new tests later on will not know whether something is already covered, leading to duplication and waste.
Also, when these tests break, if the purpose of a test is unclear then it is also unclear whether the test or the system is at fault. Tests of this nature very often get switched off, or at least suffer a maintenance cost rivalling that of just running them manually.
When UI automation tools were de rigueur, directly automating manual scripts was the common way of getting automated checks that demonstrated user acceptance. The set-ups Gojko refers to, which break masses of tests at once, were done to keep tests independent of each other and quicker to write, but they did indeed provide poor feedback. With modern automation frameworks for acceptance tests there is no need to write tests this way, or to use the UI very much at all in test automation.
InfoQ: Can you give examples of using production metrics in testing? What can be the benefit?
Evans: Testing is an act of gathering evidence that helps us hypothesise about the future. Past performance is not necessarily an indicator of future returns, as they say, but real metrics from production are as valid a source of that evidence as any test data. This strategy brings most benefit where the cost of accurately replicating a production environment in terms of hardware, network connections, data volumes and so on might be prohibitive. The down-side is that you can only look in the rear-view mirror, so it is less useful for predicting behaviour in extremes, or understanding where your system limits lie, if it hasn’t already been pushed to those limits.
Also, monitoring production metrics gives you a chance to get a deeper understanding of the relationships between different metrics. I recall a case where the performance of a site temporarily dropped below a stated threshold, which prompted calls for performance improvement changes. But when all the stats from the same period were analysed, there was not enough evidence that the slower performance had a negative impact on user traffic and sales, so the expensive performance improvement was abandoned. If the decision had been based on performance test data alone, they might have incurred unnecessary cost and over-optimisation.
Roden: Several teams I have worked with recently have been using monitoring of production metrics to inform which further testing to invest in, rather than trying to stand up an environment that has the full complement of components and systems at a production-like scale. On one occasion, a day or two after a release, a memory threshold was breached and it could be seen that memory use was much higher than it had been before. The speed of the monitoring response allowed the team to run a bunch of new tests to analyse the specific component and put in a fix, without having to back out the whole release or let production go pop. Using a suite of key monitoring measures can allow a speed of response that makes it more cost-effective than the kit, people and time needed to perform heavy-duty testing before each release.
Another example is A-B testing which, although not new, has grown increasingly popular in the ‘lean start-up’ era. Segmenting different parts of your user base and product provides the platform to do small tests and gauge user behaviour in controlled circumstances. This is far more rapid and cost effective than organising and executing alpha or beta testing on different product versions. It also removes the doubt about whether the behaviour in the lab is a true simulation of behaviour in the wild.
Adzic: Many cross-cutting concerns aren’t tested for an exact outcome - aspects such as how quickly the home page of a web site loads, or the time it takes to process a credit card transaction. They are tested for being good enough. Even if they fall slightly outside the expected interval for a short period of time, that’s not a huge issue. With short iterative delivery, teams have a chance to significantly reduce the cost of de-risking such aspects.
With big-batch releases, testing homepage load would typically involve deploying the application on production-like hardware and getting performance testing experts to hammer it. Both the production-like hardware and performance testing experts tend to be rare resources, shared across many teams, so they create a huge bottleneck in delivery.
With frequent iterative delivery, the risk of those things breaking badly is not so high. They tend to deteriorate over time. So instead of waiting on a scarce resource and introducing a bottleneck in delivery, teams have an option of pushing small changes to a real environment and monitoring the differences, planning to adjust in the next iteration if needed.
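As a sketch of this monitoring-over-heavy-pre-release-testing approach, the check below samples a production metric after a release and compares it against an agreed ‘good enough’ threshold; the metrics URL, JSON shape and 2-second budget are all hypothetical.

```python
import json
import statistics
from urllib.request import urlopen

# Hypothetical location and shape of the production metrics feed.
METRICS_URL = "https://example.com/metrics/homepage.json"
THRESHOLD_MS = 2000   # illustrative target: 95th percentile home-page load time


def fetch_load_times(url=METRICS_URL):
    """Pull recent real-user load time samples (milliseconds) from production."""
    with urlopen(url) as response:
        return json.load(response)["load_times_ms"]


def p95(samples):
    """95th percentile of a list of samples."""
    return statistics.quantiles(samples, n=20)[18]


def check_release(samples):
    value = p95(samples)
    if value > THRESHOLD_MS:
        print(f"Home page p95 load time {value:.0f} ms exceeds {THRESHOLD_MS} ms - investigate")
    else:
        print(f"Home page p95 load time {value:.0f} ms is within the agreed target")


if __name__ == "__main__":
    check_release(fetch_load_times())
```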
InfoQ: Duplication in test code can cause maintenance problems with test suites. Do you have suggestions on how to reduce duplication, or even prevent it?
Evans: If you start with the premise that all test code is worthy of the same standards as production code, and that the team regards collective test ownership as important as collective code ownership, then the smells of duplication in test code should be quickly apparent and dealt with by the team. Designing your test infrastructure with the same care that you design your system will encourage things like a good separation of concerns, code re-use, single responsibility and so on. Re-usable utilities will evolve from one-off items as the need for them increases with more tests. Data builders for common test data set-up are a common example. Early tests might create new accounts, for example, in their set-up. As more tests need accounts set up, that functionality gets extracted into an account builder utility.
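A minimal sketch of the data-builder idea Evans describes: once several tests need accounts, the set-up gets extracted into a builder with sensible defaults that individual tests override. The field names and defaults are illustrative.

```python
# Illustrative account builder; field names and defaults are invented.
class AccountBuilder:
    def __init__(self):
        self._name = "default customer"
        self._credit_limit = 1000
        self._currency = "GBP"

    def named(self, name):
        self._name = name
        return self

    def with_credit_limit(self, limit):
        self._credit_limit = limit
        return self

    def build(self):
        return {
            "name": self._name,
            "credit_limit": self._credit_limit,
            "currency": self._currency,
        }


def test_builder_overrides_only_the_relevant_detail():
    # The test states only the detail it cares about; the builder fills in the rest.
    account = AccountBuilder().with_credit_limit(0).build()
    assert account["credit_limit"] == 0
    assert account["currency"] == "GBP"
```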
Adzic: Many teams neglect test automation code, treating it as a second-class citizen in their code base. The rationale is that customers never see that code, so it’s OK for it to be composed of quick hacks. But that’s a false premise.
For teams that practice iterative delivery supported by test automation, test code represents a huge part of their code base. For example, on MindMup, we have almost twice as much test code as we do client-facing code. It would be silly to ignore good coding and design practices there just because it’s not client facing.
Techniques for managing large code bases and reducing duplication are well known, and people just need to apply them to test automation code as well as to client-facing code. Test automation shouldn’t be delegated to people who barely know how to write code; it has to be done by people who know how to design good, complex software.
Roden: Test automation code needs the same maintenance as production code. It dictates the ease with which you can create and fix tests.
I remember one specific example at a place where Gojko and I worked. From an initial one or two automated methods for entering a transaction, 15-20 different ways soon emerged across the five teams working on the product. People writing new tests didn't know what the differences were between these entry methods and ended up selecting inappropriate ones or just writing new ones. This left tests that were either brittle or not covering the risks they were intended to cover. Debugging tests quickly became hard work and the maintenance cost rose equally quickly.
We improved it by rationalising to a small set of template transactions, creating a library of usage examples, driving wider ownership and communicating new and changed automation patterns. We also had to refactor a bunch of tests to fit, but it is better to pay that cost early than to keep paying more and more over time... death by a thousand cuts...
Prevent duplication through regular communication, collective ownership, sample usage patterns and re-usable, well-thought-out test code -- pretty much as you’d solve it for production code...
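The ‘small set of template transactions’ Roden mentions could look something like the sketch below: one shared, well-known helper with named templates instead of many near-duplicate entry methods. The template contents are invented for illustration.

```python
# Invented template transactions for illustration.
TEMPLATES = {
    "cash_sale":   {"type": "sale",   "payment": "cash", "amount": 10.0},
    "card_refund": {"type": "refund", "payment": "card", "amount": 25.0},
}


def enter_transaction(template, **overrides):
    """Start from a shared, well-known template and change only what the test cares about."""
    transaction = dict(TEMPLATES[template])
    transaction.update(overrides)
    return transaction


def test_template_override_changes_only_the_amount():
    refund = enter_transaction("card_refund", amount=500.0)
    assert refund["amount"] == 500.0
    assert refund["payment"] == "card"   # the rest of the template stays intact
```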
InfoQ: This is the second book that has been published in the Fifty Quick Ideas series. Can you give us a sneak preview of any upcoming books in the series?
Adzic: The next book in the pipeline is Tom and Ben’s book, on improving retrospectives.
Roden: Ben Williams and I are currently working on the third book, on Retrospectives. We cover ideas that range a bit wider, into continuous improvement in general, but the concept of a retrospective is still a nice embodiment of the commitment to improve.
Evans: We all like the “50 Quick Ideas” format as authors, and our readers seem to like it too, so we’d like to continue the series. We don’t have firm plans for the next in the series (after Retrospectives) but we are always gathering ideas on lots of different things. Some ideas that didn’t quite make it into the Tests book were around Living Documentation, so maybe we will do one on that topic sometime in the future.
About the Book Authors
Gojko Adzic is a strategic software delivery consultant who works with ambitious teams to improve the quality of their software products and processes. Gojko won the 2012 Jolt Award for the best book, was voted by peers as the most influential agile testing professional in 2011, and his blog won the UK Agile Award for the best online publication in 2010. To get in touch, write to gojko@neuri.com or visit gojko.net
David Evans is a consultant, coach and trainer specialising in the field of Agile Quality. David helps organisations with strategic process improvement and coaches teams on effective agile practice. He is regularly in demand as a conference speaker and has had several articles published in international journals. Contact David at david.evans@neuri.com or follow him on Twitter @DavidEvans66
Tom Roden is a delivery coach, consultant and quality enthusiast, helping teams and people make the improvements needed to thrive and adapt to the ever changing demands of their environment. Tom specialises in agile coaching, testing and transformation. Contact Tom at tom.roden@neuri.com or follow him on Twitter @tommroden