Key Takeaways
- We need to assess the value of testing, and that assessment is the process of observing people, talking to people, and essentially testing the test process.
- A test case is not a unit of measurement.
- The graph doesn’t tell us what the graph means: we have to figure out what the graph means. And you can’t know that from outside the project.
- This is a problem with automation, because automation looks at very specific things in very specific ways. Humans have the ability to look at broader things in broader ways. Humans can even look at things we are not even conscious that we’re looking for.
- Defocusing is the solution to the pesticide paradox. Defocusing means to continuously change your tests, your techniques, and even your test strategy.
In this interview, James Bach explores how to make software testing legible and how to assess the value of your testing work and the risk in a software product. He talks about how to overcome the pesticide paradox in test automation, and how we should leverage AI and ML in our testing. Drawing on more than 30 years of software testing experience, James gives three pieces of advice to software testing beginners.
Xiaoqian Geng: Thank you for accepting my interview invitation. You travel around the world offering software testing services to software companies and giving talks at conferences. Could you please share with us why you quit your job in 1999 to become an independent consultant? I know many experts in this area want to pursue a similar career. What would you suggest to them?
Bach: I wanted to pursue my craft as a software tester without compromise. That’s why I quit. I was very angry at that time, because I had been asked to create fake and fraudulent test documentation for a big client of the consulting company I worked for. When I refused to do that, the client was upset, and my boss was upset that I wouldn’t try to please this paying customer of ours. But you know, there are people who give you whatever you pay for: they are called drug dealers. I want to be more like a doctor, not a drug dealer. A doctor provides honorable service; a doctor does not just give you any drug you ask for or tell you any lie you wish to hear.
So I decided to start my own consulting company. This can be difficult, because it turns out there are not a lot of people out there who want to pay for work that you think is good for them. That’s why many consulting companies are, in my opinion, faking it. They have to pursue the money to stay in business. That’s not the same as pursuing excellence.
Another way I can explain it is that I am really a difficult man. In the long term I can only work for someone who completely understands me, and that’s my wife, Lenore. She knows how to maintain the environment I need to be creative. She would never expect me to work with a client that is not a good fit. So, when I started Satisfice, Inc., I almost immediately gave it to her so that she would feel as invested in it as I feel. It’s worked very well for us for the last 18 years. She does all the back-office work and I do all the consulting work.
Geng: If there are experts who want to pursue the same kind of career as you, what would you suggest to them?
Bach: Well, there are artists, philosophers, and technologists all over the world who are independent and are willing to make sacrifices in order to pursue their craft without big compromises. But it is a very hard life. It’s hard to make a living if that’s the way you are. So it’s not exactly something I would recommend to people. In the corporate world, you must get along with people who have the money. And the people with the money and power often acquire it by making sacrifices and compromises to make their bosses feel better. They tend to expect that the people they hire will do the same thing for them.
This is especially true of testing services, because when people with money think of testers, they usually don’t want people who make trouble. They want testers who are quiet or who simply tell them that the product is good enough. They want good news. It’s like this: if they don’t have any testing, they get in trouble for being irresponsible. If they have sincere and hard-hitting testing experts, they get in trouble because the testers keep finding problems. But if these executives hire fake testers, who are ISTQB-certified and make a lot of documentation but don’t actually find many problems in the product, then they can blame those testers when the product is bad, and no one will blame the executives who hired them. I think this is why so many testing consulting companies exist in the world. There are lots of people willing to be paid to shuffle papers and look busy, as long as they are quiet and polite.
I am saying my kind of work is hard because I look for clients who want to be told the truth, even when the truth is harsh and inconvenient. My clients are people who do not want to be fooled or flattered. The hardest thing is finding these clients.
Geng: But you are very famous and you have a very good reputation.
Bach: Well, Google me. You’ll see I have a reputation for arguing with people, and for being angry, because I called people out for what I thought of as bad behavior. Some people call me a bully. I don’t think standing up for what is right is bullying, and I think that good people have a duty to stand up. But then again, perhaps none of the people who think I’m a bully believe that what I stand for is right. I suppose they think the things I consider vitally important to good engineering are just my own arbitrary obsessions.
How to make testing work legible to people who are non-testers?
Geng: You say that different people have different perspectives on testing, and different expectations in the short and long term. In the video published by PNSQC in October 2017, you made a key point that bothers many test engineers: “if we want to save our job and kind, we need to make our work legible to people who are non-testers”. Could you please summarize what you think is the best way to measure the value of testing?
Bach: I don’t like the word “measure.” It’s not a helpful word. Any time you want to say “measure,” I suggest you use the word “assess.” Assessment is a broader concept than measurement. Assessment includes measurement, but also other things people don’t normally think of as measurement.
We need to assess the value of testing, and that assessment is the process of observing people, talking to people, and essentially testing the test process. We need to help our clients understand our own testing and why it is valuable. That’s where the word “legibility” comes in. Legibility means the ability for something to be read. Handwriting is an obvious example of something that we speak of as being legible or illegible. But you can apply the concept of legibility to more than just handwriting. You can apply it to any process or system. A system is legible if you can look at it and tell what is going on with it. After 27 years of marriage, my wife’s moods are highly legible to me. I can tell in a few seconds how she is feeling.
Unfortunately, testing is often not as easy to read as handwriting or people. That’s why testers must work to make their testing legible. They do this by using whiteboards or spreadsheets to make helpful displays. They also do this by learning to tell a testing story. When testers don’t know how to tell a testing story, the typical method to make testing seem legible is to count test cases or to point to story cards and say “I tested that.” But I think that is a horrible idea. Test cases are nothing but files. Counting them tells me nothing. Unless you know what is inside those containers, you don’t know that five hundred test cases are good. Maybe it is just one test case copied 499 times.
Instead of test cases, I break my testing out into different activities and I give a name to each activity. I might speak of service testing, sanity testing, performance testing, functional testing or variations of those things. The name of each activity is descriptive, so someone who is a non-tester has a chance to understand what I am talking about. If I need to explain what happened in any given activity, I speak of coverage (what kinds of things did I look at, including what data I used in my testing), oracles (how did I recognize bugs when I saw them), and procedures (what specific experiments did I perform on the software during the activity). I am also ready to talk about what specifically motivated that activity: the suspected product risk.
Geng: How do you know what the risk is in a certain software application?
Bach: Once again, it’s a process of assessment rather than measurement. We get an idea of what the risk is through a combination of different kinds of analysis.
We can do black box analysis: that means we look at what the product does and the role it plays in the user’s world. We ask ourselves, if the product behaves badly, how might that impact the users? You know, when Twitter first came out, it was just a silly little platform where people sent useless little messages to each other. If Twitter went down, there was no risk. No one cared very much. Then, people started using Twitter for time-sensitive notifications of important events. Twitter became a system for disseminating news about disasters. These days, if Twitter goes down, it actually hurts people.
We can also open the black box and look inside the system: that means we look at how the system is coded and linked together. We imagine parts failing and trace what would happen next.
We can also look at history: we can find out what has gone wrong before with that product or similar products. Anything that has happened before may happen again.
When I say “figure out what the risk might be”, what I really mean is discuss it. We discuss the system and its potential failure as a team. We may be alone, but generally, there are people around you with different expertise and different experiences. So, the process of analyzing risk is not the process of doing mathematics. It’s not the process of calculating what the risk is, or measuring it directly on some sort of “risk meter” that we wave around like a Geiger counter. That’s not how it works. Instead, it is a social process of learning and deciding what we are going to worry about.
The discussion tells us what risks we suspect are in the product. The testing tells us what risks are actually in the product. Then, we release the product to see if we were right. And maybe we were right, or maybe there is another risk that testing did not discover. Over time, by looking at any problems that escape us and get into the field, we will be better able to understand what the real risks are, to anticipate them, and to test for those things in a focused way.
How to make test automation more effective?
Geng: Test automation is a popular activity in most software companies today, but there is a very famous debate about the “pesticide paradox”. Could you please share your opinions on how we could make our automated tests more effective?
Bach: The simple idea of the “pesticide paradox” is that if you have a method of finding a particular kind of bug, then you will find those kinds of bugs and not other kinds. If there are no more bugs of that kind left, you will not find anything. Meanwhile, any other kind of bug will escape. You will think you have tested well because you aren’t finding bugs any more, but you would be wrong. This is a problem with automation, because automation looks at very specific things in very specific ways. Humans have the ability to look at broader things in broader ways. Humans can even look at things we are not even conscious that we’re looking for. So, the pesticide paradox is a bigger problem with automation than it is with human testers.
But it also affects human testers, because human testers have biases, too. A very common example is that human testers tend to focus on boundary tests. In my teaching of testers, boundary testing is the most popular technique. The problem is: not a lot of bugs are boundary bugs. If you use only boundary testing, and you don’t find a problem, you will think that the product is good.
Therefore, a big part of teaching testers is to teach them how to defocus. Defocusing is the solution to the pesticide paradox. Defocusing means to continuously change your tests, your techniques, and even your test strategy. You can do this with tools as well.
(You know, I don’t use the term “test automation”, because I don’t believe testing can be automated. When people say test automation, what they usually mean is automating fact checking. They are looking for specific things, and they’re looking for pass or fail on those specific things. That is automated checking, but it is definitely not automating what any reasonable human does to test. When I test as a human, I don’t look at specific things only. I’m also looking for clues. I am looking for things that are strange. And once I find something strange, I investigate it. Then, of course, with that investigation I might find a new kind of bug that I wasn’t even looking for when I started. The machines never do that. Only humans do.)
If I am working with automation, I usually randomize the test data. In that way, I make my automation refresh itself. I also use data-driven automation, so all I have to do is change the data files. I don’t necessarily have to change the code that my tool is using. I don’t have to rewrite the automation. The automation could run the same code that it always runs but with different databases, or different environments.
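To make the idea of randomized, data-driven checking concrete, here is a minimal Python sketch. It is an editorial illustration under stated assumptions, not Bach’s own tooling: `normalize_username` is a hypothetical stand-in for the product under test, and the CSV columns `raw` and `expected` are an invented data-file layout. The checking code stays the same while the data changes, and the random seed is logged so that a failing run can be replayed.

```python
import csv
import io
import random
import string


def normalize_username(raw: str) -> str:
    """Hypothetical function under test; stands in for a call into the real product."""
    return raw.strip().lower()


# Assumption: in a real project this would live in an external data file that
# can be swapped or extended without touching the checking code below.
FIXED_CASES_CSV = """raw,expected
Alice,alice
  Bob  ,bob
CAROL,carol
"""


def load_cases(text):
    """Read (raw, expected) pairs from CSV text."""
    return [(row["raw"], row["expected"]) for row in csv.DictReader(io.StringIO(text))]


def random_cases(seed, count=20):
    """Generate fresh inputs for each run; the same seed reproduces the same inputs."""
    rng = random.Random(seed)
    cases = []
    for _ in range(count):
        length = rng.randint(1, 12)
        core = "".join(rng.choice(string.ascii_letters) for _ in range(length))
        padded = " " * rng.randint(0, 3) + core + " " * rng.randint(0, 3)
        cases.append((padded, core.lower()))
    return cases


def run_checks(cases):
    """Return the failing (raw, expected, actual) triples."""
    failures = []
    for raw, expected in cases:
        actual = normalize_username(raw)
        if actual != expected:
            failures.append((raw, expected, actual))
    return failures


if __name__ == "__main__":
    seed = random.randrange(2**32)
    print(f"seed={seed}")  # log the seed so a failing run can be replayed
    cases = load_cases(FIXED_CASES_CSV) + random_cases(seed)
    for failure in run_checks(cases):
        print("FAIL", failure)
```

Note that a sketch like this only automates checking against a known expectation. Deciding what data to vary, and investigating anything strange that turns up, remains the tester’s job.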
Geng: Is that a similar concept to context-driven testing?
Bach: No. That’s another thing entirely. Context-driven testing, in a nutshell, means that you throw out “best practices” of testing and instead you look at the testing problem in context and simply solve it. Testers are skilled people, and skilled people will use different practices, different techniques, different methods, and different tools depending on the particular situation that they are in. Instead of memorizing one right way to do testing, you gain your testing skills and you do whatever is needed. You could say context-driven testing is a philosophy of testing based on human skill. In other words, it’s engineering. Because that is what engineering is. A skilled engineer, a trained engineer, reads a situation and solves the problem that needs to be solved in that situation.
That contrasts with the Factory school of testing. The Factory school of testing says that skill doesn’t matter very much. It’s not about skill: it’s about following the procedures. In the Factory school, they want you to write down test cases, write down test scripts, and then automate them. They want to take people out of the equation, just as factories did during the industrial revolution. We in the Context-Driven field think that is a very bad idea. First, it doesn’t even work! It leads to the Pesticide Paradox. It leads to poor testing that is expensive and also dehumanizing.
The Context-Driven people are humanists.
Geng: You just mentioned Factory school testing. But most companies are still using this method, because it makes it easy to visualize the quantity of testing.
Bach: What makes it easy to visualize the quantity of testing?
Geng: Like, how many test cases, how many test scenarios you covered.
Bach: But that is a lie! They say that is easy to visualize, but that is not visualizing the quantity of testing at all. They have no idea what the quantity of testing is. If you say you have 500 test cases, you just have this number: 500. You don’t know what these 500 test cases are. Let me put it this way. If you say you have to travel 500 miles, you could visualize that, because the mile is a standard unit of measurement. But if you say you are going to travel 500 “segments,” nobody can visualize that. Imagine a cookie. You can take one cookie, and put it in your hand. That’s one cookie. And then you squeeze your hand and it explodes. And now you open your hand. Now there are just pieces of cookie in your hand, and you have ten thousand crumbs. It’s the same cookie as before but it’s ten thousand crumbs. It’s just packaged differently. You change the packaging of the cookie but it’s still that same cookie. So it is with testing: in some packaging you may call it one case, but the value of this testing could be the same as 500 test cases over here. A test case is not a unit of measurement.
Let me put it this way: I am 52 years old, and one thing that happens once you get older is that companies are run by people who are younger than you. And suddenly it makes sense why the companies are run badly: it’s because children are running companies. They are little kids, and they are running the companies now. When I was 20, I thought maybe those older people knew something I didn’t know. Maybe there were some secrets they possessed which made the behavior of management sensible. Now, I have been through 30 years in this community. I know they don’t have any secret.
The only reason people still count test cases is that they either refuse to learn how to manage a testing process or they trust people who refuse to learn. They just never grew up. It’s true that when I was very young I also counted test cases. But then I grew up.
Geng: The quantity of test cases doesn’t show the effectiveness of the tests; it is one way to measure how hard people were working. Whether the work has value is another perspective, and a totally different one. Maybe we could use the quantity of bugs the tests found to assess the value they created.
Bach: Do you use the quantity of bugs to measure the value of testing?
Geng: It’s hard to say. For example, in a system test you might find a single bug, but that bug could be very critical, very complex, and very valuable.
Bach: Exactly. I used to use bug metrics. There are interesting things you can do with bug metrics, especially if lots of bugs get reported in your project. You can make graphs. You can ask some interesting questions based on the graphs you keep. But having said that, I would say the important thing was talking about the metrics; looking at these graphs and wondering what does this mean, and what should we do? That was the important part of the process. The graph didn’t tell us what the graph meant: we had to figure out what the graph meant. And you can’t know that from outside the project. So we used bug count as a way to generate questions and then we would go and investigate and get those questions answered by talking to people.
Sometimes, people will give you graphs, and those graphs are suggesting a false story. People can fake the data or shade that data when they are embarrassed about the truth. People can hide behind numbers. You need to be aware of that, and anyone who works in management and uses metrics should know this. What management should do is lower the importance given to numbers, so that people will have less incentive to lie about numbers or to hide the numbers.
Measurement of human processes is never objective. Bug reporting is a human social process; it’s not a physical system. It’s not like measuring a volcano. Imagine if a volcano were emotionally self-conscious about its ground temperature, and artificially withheld its magma wherever the geologist put a thermometer. The measurements would be wrong. But that’s silly. Volcanoes don’t care. It’s humans who care about being measured.
How will new technologies impact the testing approach?
Geng: How do you think new technologies will impact the approach to testing? For example, how could AI/ML help to increase test efficiency and sufficiency?
Bach: The problem with using any tool is you have to understand what the tool does in order to properly use it as a tester. If you don’t understand what the tool does, then you cannot rely upon it. You could still use it, but you can’t rely upon it. Here is an example of a tool that I might use but can’t rely on: a child. I can have a ten-year-old child look for bugs in my product. I can have a full classroom of children look at the product and they may actually find some bugs. If they did, you wouldn’t throw those bugs away; you wouldn’t say, “I can’t accept this bug report because you are a child.” You would accept the bugs, but you would not yet say that the product was properly tested.
Another thing you can’t do is ask the ten-year-old, “How did you find this bug? Explain your methods and how you might do better.” This example of a child tester is exactly equivalent to an artificial intelligence tool. Suppose you have some sort of deep learning tool and it does testing for you. You say, “OK, super tool, test this product.” And it goes beep beep beep... and it says, “three bugs found.” That’s just like a child finding three bugs for you.
When I talk to you as a tester, I can ask you about your techniques. Your answers begin to give me trust in you. You say “I am using this technique, or that technique.” I ask if you can demo your technique, and you say: “OK, I can demonstrate it.” And then I ask you questions about your oracles or your coverage, and you can answer my questions. The quality of your responses makes me more and more comfortable that you are handling it.
I think that with artificial intelligence, people want a magical tool that they will trust like some imaginary friend in a kids’ book. I call AI “Automated Irresponsibility” and that is really what it is.
I don’t see any AI tools that are transparent. I don’t see that they have earned trust. But if you want to have such a tool, and you want to trust it, you have to test it. That testing process would be pretty extensive. And also, the scope of the tool would be pretty narrow, because machine learning is, in any case, based on training data that is very specific to a specific context. If something is trained in one way for one thing, it may not be good at testing a slightly different kind of thing. We should ask how this system was trained, and how we would know that the training data was good and unbiased. Because any bias in the data would be bias in the machine. And we have to ask ourselves, do we rely on this?
Three pieces of advice to people taking up a career in testing
Geng: Thank you. If you were to give three pieces of advice to new testers, what would they be?
Bach:
- Security testing is very hot now. I suggest going into cybersecurity as a testing specialty if you have the mind for all the trivia that security testers need to know.
- I would say that if you want to learn to program you should absolutely learn to program. If you don’t want to learn to program, you can still do very well as a tester if you work for a good manager. But it’s hard to get a job these days as a tester if you are not comfortable with automation, because too many managers think automation is good testing. Personally, I think that the testing field is way worse off when we make everybody be coders. I am a coder. I am not against coders. The problem is that as a coder I tend to write a tool whenever I am testing. I want to write software to help me test. This often distracts me from testing. I need people who are not as excited about code to keep me focused on the actual testing process.
- My final piece of advice is: “find a mentor.” Find someone, who might not necessarily work with you in the company, whom you can ask for advice and whom you can vent to. Perhaps join a testing forum.
Geng: This sounds great! Thanks for sharing this with me. We’ve finished all the questions I have today. Thank you very much for your time, and I really appreciate you sharing your stories and experience on testing. I believe all testers could learn from this. Thank you!
About the Interviewee
James Bach has been a tester, test manager, or test consultant for 31 years. He wrote Lessons Learned in Software Testing: A Context-Driven Approach, and Secrets of a Buccaneer-Scholar. He is the creator of the Rapid Software Testing methodology, and a fierce defender of agency over algorithm.
About the Interviewer
Xiaoqian Geng has been working in software testing for more than 12 years. She has given conference talks on “Six Evolutions of Software Test Automation” and “The Challenges of Big Data Testing”. She holds the US patent “Automated testing of gesture-based applications”. She is now QA/test director at Splunk in San Francisco.