
Software Engineering – Then, Now, and Next


Summary

Mary Poppendieck discusses how software engineering has been changed by the scale and speed required of digital companies in the past, now, and in the future.

Bio

Mary Poppendieck’s first job was programming the #2 Electronic Switching System at Bell Labs in 1967. She programmed minicomputers to control high-energy physics experiments at the University of Wisconsin during the 1970s. Moving to 3M, Mary developed digital systems to control roll-goods processes. Mary is a popular writer and speaker.

About the conference

QCon Plus is a virtual conference for senior software engineers and architects that covers the trends, best practices, and solutions leveraged by the world's most innovative software organizations.

Transcript

Poppendieck M.: I'd like to talk about bridges, because Tom and I have loved to hike all of our lives. Here are a few of the bridges that we encountered as we were hiking over the years with our family. We like bridges because they take a difficult problem area and help us get from one side to the other. If you think of that metaphor, then think about hiking bridges as a way to get across a chasm that's stopping us from doing whatever it is we want to do. Something that we build to cross from one side to the other. A simple hiking bridge is generally context specific, meant for that area of hiking, and usually built with local materials. It's minimalist: whatever you can get away with safely. And it's perishable with age. If there's a big flood or heavy wind, a lot of hiking bridges end up like this one on the bottom right, and they're closed. There are very sturdy bridges, usually built of stone or iron, that have stood the test of time. There are bridges that are icons of cities. You recognize all of these cities, and you've seen these icons. The most interesting bridges, however, are the ones that simply delight us, because they're very surprising. They're not what you would expect, but they do solve a problem.

Take a Time Capsule back to 2001

Let's take a time capsule and go back to 2001 to look at some of the chasms that had to be crossed at that time. Google was struggling to search through 1.3 billion web pages. That's not all the web pages there were, but that was the limit; that's as far as they could get. Amazon was struggling with a monolithic architecture. In 2015, Rob Brigham gave an interesting talk at Amazon re:Invent about what Amazon did to get past its monolithic architecture. In 2001, mobile phones looked like this, and there was no cloud. It was a primitive technology environment compared to today. We software engineers were building software for other people, maybe for IT departments, maybe to automate business processes, maybe to automate equipment, but we were not building software to help ourselves out. Software processes were largely manual, and they were really slow. Acceptance testing: manual. Source code control: largely manual. Release management: definitely manual. Hardware provisioning: absolutely. Production monitoring: all manual processes, all very time consuming and error prone.

The Unlikely Rise of Free and Open Source Software

We needed to cross this manual chasm and start getting things automated. The first thing we did was build some simple hiking bridges, and the way we did it was with free and open source software. In 1985, Richard Stallman said in the GNU Manifesto that software should be free so that we can modify it and make it do what we need it to do. A few years later, a Helsinki student, Linus Torvalds, decided to write an operating system just for fun. A couple of years later, a group of eight developers started enhancing their web server so that it worked better; this was at the National Center for Supercomputing Applications at the University of Illinois, Urbana. By 1999, IBM had invested a billion dollars in Linux. The Apache Software Foundation was founded by the same eight people who had started working on the NCSA web server in '93. Today, Apache is used by over 40% of all websites. You wouldn't have thought that open source was going to be successful. There was no good reason for it. In fact, we really wanted to solve our own problems, so we dove in and we did it.

Horizontal Scaling

The next thing we tried to do was build sturdier software that would really tackle the biggest problems we were facing. I talked about a couple of those problems in 2001. In 1988, a remarkable paper was published at the ACM SIGMOD conference: a case for redundant arrays of inexpensive disks (RAID). What it said was, if you have a lot of stuff that fails frequently, you can actually make it more reliable than one great, big thing that rarely if ever fails, because when you start really pushing something that rarely fails, it's going to fail eventually. No matter how rare that is, it's going to happen if you put billions of transactions through it. You have to use replication and fault detection and fault recovery software. If you do, you can have something that's more reliable than that great, big, reliable server that you've been counting on.
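
To make the arithmetic behind that argument concrete, here is a minimal sketch in Java. The failure probabilities are invented for illustration; the point is only that the chance of losing all three cheap replicas at once can be far smaller than the chance of losing one expensive server.

```java
// Back-of-the-envelope arithmetic behind the replication argument.
// The failure probabilities are made up for illustration only.
public class ReplicationMath {
    public static void main(String[] args) {
        double reliableServer = 0.001;  // assumed annual failure chance of one "big, reliable" server
        double cheapDisk = 0.05;        // assumed annual failure chance of one inexpensive disk

        // With 3-way replication, data is lost only if all three copies fail.
        // Repair and re-replication are ignored here to keep it simple, which
        // makes this a pessimistic bound; real systems do even better.
        double threeCopiesLost = Math.pow(cheapDisk, 3);

        System.out.printf("Single reliable server: %.6f%n", reliableServer);
        System.out.printf("Three cheap replicas:   %.6f%n", threeCopiesLost);
        // 0.05^3 = 0.000125, already an order of magnitude better than 0.001.
    }
}
```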

Horizontal Scaling: File System

With a file system: in spring 2000, Google's indexing software stopped working, and there was a big war room because nobody knew why. A couple of software engineers who were new to Google, Jeff Dean and Sanjay Ghemawat, had been working at a research lab and wanted to do something more interesting, something that might give them more of a challenge. They came into the war room and put their heads together. They eventually discovered that it was a memory problem. We had very reliable memory chips at that point in time, but when you did this massive amount of searching continually, the hardware would occasionally fail. Their solution was the Google File System, where you take all of your data, break it up into small chunks, store those chunks in three different places, and keep track of them with chunk servers and a master directory. That's how you keep your file system on lots of little independent computers that are going to fail all the time, and lots of cheap disks, and that's how you keep it much more reliable than if it were on one big server.
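
Here is a toy sketch of that chunk-placement idea, not Google's actual protocol. The chunk size, server names, and round-robin placement are invented for the demo; the real system used much larger chunks (64 MB in the paper), chunkservers, and a master that tracks replica locations.

```java
import java.util.*;

// Toy sketch of chunk placement: split data into fixed-size chunks and record
// three replica locations per chunk in a master table.
public class ChunkPlacementSketch {
    static final int CHUNK_SIZE = 4;          // bytes per chunk (tiny, for the demo)
    static final int REPLICAS = 3;
    static final List<String> SERVERS =
            List.of("cs-1", "cs-2", "cs-3", "cs-4", "cs-5");

    public static void main(String[] args) {
        byte[] file = "the quick brown fox".getBytes();
        Map<Integer, List<String>> master = new LinkedHashMap<>();  // chunk index -> replica servers

        int chunkCount = (file.length + CHUNK_SIZE - 1) / CHUNK_SIZE;
        for (int i = 0; i < chunkCount; i++) {
            // Pick three distinct servers for this chunk (round-robin here; a
            // real system also considers load and rack placement).
            List<String> replicas = new ArrayList<>();
            for (int r = 0; r < REPLICAS; r++) {
                replicas.add(SERVERS.get((i + r) % SERVERS.size()));
            }
            master.put(i, replicas);
        }
        master.forEach((chunk, where) ->
                System.out.println("chunk " + chunk + " -> " + where));
    }
}
```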

They published a paper about how this works in 2003. It was another paper that shocked everybody; people said, that's amazing. In fact, over at Apache, they had been trying to do web crawling across multiple servers and couldn't figure out how to do it until the paper came out. Then they copied what was described in the paper and came out with Hadoop, which became the basis for a large amount of big data. This was a very influential paper.

Horizontal Scaling: Transaction Processing

At the same time that Google was having trouble, Amazon was having trouble. It had the biggest database server available. Everybody knew that in order to manage transactions, you had to have all of your transactions on a single server that held the master copy of the truth, and that server had to manage the transactions. But it couldn't keep up with the transaction demand, and as far as Bezos was concerned, the demand was only going to get bigger as his dreams were implemented. Something had to be done. One option was figuring out how to make that server handle more transactions; they had been trying to do that, and they hadn't figured it out. There was another option: break the transactions apart into services, and have each service managed by a separate team. This is Conway's Law at work. Conway's Law says your system architecture and your organizational architecture are going to match, so bite the bullet and make it happen.

What they did was organize into what they called two-pizza teams. That's a team the size that can be fed with two pizzas for lunch, maybe eight people or so. Each team was responsible for a service. The theory was, don't constrain these teams; they need to be independent agents that can make their own decisions without depending on other people. That's the magic of this. In fact, it took them six years to actually get this going. It led to Amazon Web Services, introduced in 2006, because they decided to sell the results of all that hard work. Today, Amazon Web Services is a $45 billion a year business.

Extreme Programming: Test-First Development

Continuing on, the next thing we decided to do was make our lives easier. Extreme programming is a pretty good description of the kinds of things we figured we needed to do to make our lives easier. They're well described in the book "Extreme Programming Explained," published by Kent Beck in 1999. First was test-first development. Any program feature without an automated test just doesn't exist. Programmers write unit tests so that their confidence in the operation of the program can become part of the program itself. The unit tests are part of the program and always run with the program. Customers write the functional tests, or acceptance tests, so that they can be confident that the program does what they need. Customers are the people who are going to use the system when it's in production. One of the most interesting pieces of open source software in the year 2000 was JUnit, and Kent Beck was behind that. What it did was create a unit testing framework for developers; many unit testing frameworks were developed after that, and they were very successful. On the other hand, frameworks for acceptance or functional testing were much harder to come by, and they did not work as well for quite a while, almost a decade.
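
As a minimal illustration of what writing the test first looks like, here is a sketch in the JUnit 4 style (it assumes JUnit 4 on the classpath; ShoppingCart is a made-up class). The tests are written first and express the confidence the programmer wants; the simplest code that makes them pass comes afterward.

```java
import static org.junit.Assert.assertEquals;
import org.junit.Test;

// Test-first sketch: these tests are written before the class exists.
public class ShoppingCartTest {

    @Test
    public void newCartIsEmpty() {
        ShoppingCart cart = new ShoppingCart();
        assertEquals(0, cart.itemCount());
    }

    @Test
    public void addingAnItemIncreasesTheCount() {
        ShoppingCart cart = new ShoppingCart();
        cart.add("book");
        assertEquals(1, cart.itemCount());
    }
}

// The simplest implementation that makes the tests pass.
class ShoppingCart {
    private final java.util.List<String> items = new java.util.ArrayList<>();
    void add(String item) { items.add(item); }
    int itemCount() { return items.size(); }
}
```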

Continuous Integration

Then there was continuous integration. Code, he said, should be integrated and tested after a few hours, a day of development at most. You integrate one set of changes at a time, because then if the tests don't pass 100%, you back that change out, throw it away, and the team that wrote it has to fix it. The test suite should always be left running at 100%. One of the first pieces of open source software to do this was CruiseControl, from ThoughtWorks. It was an amazing piece of code at the time, and we all thought it was quite remarkable.
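
Here is a sketch of that gated-integration discipline, assuming hypothetical VersionControl and TestRunner interfaces as stand-ins for whatever tooling a team actually uses: one change set goes in at a time, the whole suite runs, and anything that isn't 100% green gets backed out.

```java
// Sketch of gated integration: integrate one change set at a time, run the
// full test suite, and back the change out if the suite is not 100% green.
public class IntegrationGate {

    interface VersionControl {
        void merge(String changeSet);
        void revert(String changeSet);
    }

    interface TestRunner {
        boolean allTestsPass();   // true only at 100%
    }

    private final VersionControl vcs;
    private final TestRunner tests;

    IntegrationGate(VersionControl vcs, TestRunner tests) {
        this.vcs = vcs;
        this.tests = tests;
    }

    /** Returns true if the change stayed in; false if it was backed out. */
    boolean integrate(String changeSet) {
        vcs.merge(changeSet);
        if (tests.allTestsPass()) {
            return true;                 // main line stays at 100%
        }
        vcs.revert(changeSet);           // back it out; the authors fix and retry
        return false;
    }
}
```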

Test-first development and continuous integration are really easy to talk about, but they're really hard to do. It took a decade to figure out how to do really good acceptance testing. The key was that you had to have well factored code. You test a module through its input stream and its output stream, and if you have very well factored code, that approach works to test your entire system. Getting massive acceptance tests to work against monoliths has always been a very tricky problem; I don't think it's been solved. Which is one of the reasons why you need to factor your code.

Refactoring

Refactoring, that is, cleaning your software up and making it simpler, was introduced by Kent Beck in 1999 as the idea that after you've added a feature, you ask: can you see how to make the program simpler, while still running all of the tests? If you can, you do it, and then you keep running all of the tests. Refactoring is essential, because your code will only get messier if you don't. Eclipse is an open source IDE that people use to apply automated refactorings once they've decided how they want the code to be restructured.
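
A tiny, hypothetical before-and-after might look like this: the behavior is unchanged, so the existing tests keep passing, and only the structure gets simpler.

```java
// Before: a duplicated summing loop and a tax rate buried in two formulas.
class InvoiceBefore {
    double total(double[] prices) {
        double sum = 0;
        for (double p : prices) sum += p;
        return sum + sum * 0.08;
    }
    double totalWithDiscount(double[] prices, double discount) {
        double sum = 0;
        for (double p : prices) sum += p;         // duplicated loop
        return (sum - discount) + (sum - discount) * 0.08;
    }
}

// After: duplication extracted, the tax rate named, behavior identical.
class InvoiceAfter {
    private static final double TAX_RATE = 0.08;

    double total(double[] prices) {
        return withTax(subtotal(prices));
    }
    double totalWithDiscount(double[] prices, double discount) {
        return withTax(subtotal(prices) - discount);
    }
    private double subtotal(double[] prices) {
        double sum = 0;
        for (double p : prices) sum += p;
        return sum;
    }
    private double withTax(double amount) {
        return amount + amount * TAX_RATE;
    }
}
```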

Infrastructure as Code

The next thing we did was build bridges to support infrastructure. Infrastructure as code is perhaps one of the most important improvements in how we develop software that came about in the 2000s. From 2000 to 2010, we went from physical racks of servers, which were really high friction (I can remember working with racks that looked something like this), to virtualized servers. The big companies in Silicon Valley, like Yahoo and Google, figured out how to use virtualized servers so that they could manage their data centers with fewer people, using software. Much less friction, but not enough less. Then came containers, going from virtualized servers to containers, where you have a layer, probably Docker, which allows you to put several apps on a single server and make much more effective use of the servers that you own. Much lower friction, but you can go further: there are serverless functions. These are self-provisioning, event-driven functions, where you don't think about hardware at all. They're faster, they're simpler, they're lower cost, very low friction. Then there's edge computing, where the internet of things devices out there have their own smarts, their own ability to manage whatever they need to do, for example artificial intelligence or virtualization. They have local computation, which reduces latency, increases reliability, and again, lowers friction.
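
As a rough sketch of what a serverless, event-driven function looks like, here is a handler in the AWS Lambda Java style (it assumes the aws-lambda-java-core library; the event shape and greeting logic are invented). There is no server to provision or patch: the platform runs the handler whenever an event arrives.

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import java.util.Map;

// Sketch of a self-provisioning, event-driven function. The platform invokes
// handleRequest per event; there is no hardware or server to think about.
public class GreetingFunction implements RequestHandler<Map<String, String>, String> {

    @Override
    public String handleRequest(Map<String, String> event, Context context) {
        String name = event.getOrDefault("name", "world");   // hypothetical event field
        return "Hello, " + name;
    }
}
```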

Continuous Delivery

Because we recognized that we wanted to have flow, and flow in this context meant continuous delivery. In 2010, the book "Continuous Delivery" was published by Jez Humble and Dave Farley. Starting around 2010, by which time we had figured out how to do automated acceptance testing, we started developing continuous integration and continuous delivery pipelines. Kubernetes is a toolbox that was developed to manage that pipeline, build it, and make it work. Continuous integration leading to continuous delivery really requires that all of those testing, integration, and refactoring mechanisms are well automated first, and that in turn requires an architecture that lets us do really good automated acceptance testing. When you get to the point where you have an architecture that supports this kind of environment, then it really pays to invest in automating the integration, the delivery, and the testing.

Low Dependency Architectures

We found that we couldn't do that unless we had low dependency architectures. That's what happened. I want to point out these bridges on the left. The first one is near Edinburgh. It's a bridge over the major body of water there, which had only one bridge, and that bridge was beginning to fail. They realized that if it failed, the country would be pretty much shut down, or economically hampered, for years. So they built a second bridge while the first bridge was still operating fine, so that when it did need repairs, they would have an alternative. Nearby is a canal bridge. The canal is up high, and they bring the canal boats, in a tub of water, down to the next level, and the boats continue on down the canal. An interesting bridge for boats. Those are a couple of really awesome bridges. When you look at how we built low dependency architectures, what we built was pretty awesome too.

The Problem with Databases

The reason we had to do this is databases. This is a picture of the way enterprise architecture used to be thought of, and the most important piece was a single system of record. There's a problem with that, and it is that all of the applications tended to use a single database. What you have there is a database that is a dependency generator, because if one application needs a single change in the database, you have to check all the other applications to make sure that change hasn't impacted any of them. Your single system of record is a massive dependency generator, and we needed to get rid of that. We adopted pretty much a federated architecture. I'll call it a smartphone architecture, where you have independent apps that really don't interact with each other except possibly through the underlying platform or through the cloud. A microservices architecture is quite similar to that. Each microservice has its own application, its own data, and its own business logic, and the services communicate through the underlying platform or through the cloud.

What you have here, and what was heresy when Amazon first introduced it in 2006, is the concept of not a single database, but distributed datastores. When I first heard that, I thought they were crazy. But they knew something I didn't know: the single shared database was a thing that had to go, because it was a dependency generator. What this architecture does is enable all of those other things, including continuous integration and automated acceptance testing. It enables you to have business agility.
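
Here is a toy sketch of that federated idea, with invented service names: each service owns its own datastore, and other services reach it only through its published interface, never through its tables.

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of "APIs as integrators": each service owns its own datastore and
// exposes a narrow interface; nothing else depends on its schema.
public class FederatedServicesSketch {

    /** The only thing other services are allowed to depend on. */
    interface InventoryApi {
        int unitsInStock(String sku);
    }

    /** Owns its own datastore; free to change its internal schema at any time. */
    static class InventoryService implements InventoryApi {
        private final Map<String, Integer> ownDatastore = new HashMap<>();
        void receiveStock(String sku, int units) {
            ownDatastore.merge(sku, units, Integer::sum);
        }
        public int unitsInStock(String sku) {
            return ownDatastore.getOrDefault(sku, 0);
        }
    }

    /** Depends on the API, not on the inventory database. */
    static class OrderService {
        private final InventoryApi inventory;
        OrderService(InventoryApi inventory) { this.inventory = inventory; }
        boolean placeOrder(String sku, int units) {
            return inventory.unitsInStock(sku) >= units;
        }
    }

    public static void main(String[] args) {
        InventoryService inventory = new InventoryService();
        inventory.receiveStock("book-123", 5);
        OrderService orders = new OrderService(inventory);
        System.out.println(orders.placeOrder("book-123", 2));   // true
    }
}
```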

Putting It All Together

Let's see how that works. Let's go back to Rob Brigham and his talk at AWS re:Invent in 2015. He called it DevOps at Amazon. He defined DevOps in terms of the delivery pipeline: you have developers, they build something, they test it, they release it, it goes to customers, you monitor their use, and you plan for improvements. That's the feedback loop. DevOps, as Rob defined it, is any efficiency that speeds up this life cycle. If that's what DevOps is, think about this: the speed of completing that loop determines your business agility. If you want to be agile from a business perspective, you need to take that loop and make it as fast as you can.

In 2001, this is what they had: one great, big monolith. They had lots of developers. They had teams that did the build, the test, and the release. As he said, they were embarrassingly slow. You don't just have build, test, release; you also have your feedback cycle, your monitoring, and your planning for the next release. As you can imagine, their releases were pretty far apart. What they moved to, and it took them five or six years to even begin to get there, was a small development team owning a small service, with its own deployment and its own loop: build, test, release, monitor, plan, and keep that loop going. They could release whenever they wanted, and the complete loop was really fast. In fact, by 2014, they counted and they were doing 50 million deployments a year. Very low friction.

How Did Amazon Do It?

If you think about that movement, and it took a long time, you've got to wonder, how did they do it? Especially, how does Amazon Web Services, even to this day, release multiple enterprise-scale services every single year, as it has since 2012? They manage to do it on an amazingly tight schedule, and what they ship works really well, with not only a very high level of reliability but also what customers really want. The best way to understand how they think about it is to take a look at the book "Working Backwards," by senior executives who were with Amazon during the time when all of this was done. They discovered that two-pizza teams were not quite enough. There is a chapter in there called Beyond Two-Pizza Teams: Single-Threaded Leaders. I encourage anybody who's interested in how to do really rapid, agile software to read this chapter.

What they did was run a lot of experiments and gather data, and they found that the biggest predictor of a team's success is a leader with the appropriate skills, authority, and experience to staff and manage a team whose sole focus, both the leader's and the team's, is to get the job done, whatever the job is. This is a model in which you don't have coaches. You don't worry as much about process as you think about the responsibility of a leader who assembles a team, and that team and leader have nothing else to do but get the job done. They have the responsibility to make it happen. In fact, Dave Limp, one of their vice presidents, says the best way to fail at inventing something is to make it somebody's part-time job. They're very big on focus, and on team leaders who bear the responsibility for both assembling and leading the team toward the single thing that they need to get done.

Technologies that Enable Business Agility

If you look at technologies that enable business agility, we're talking about moving from proprietary software to open source. From vertical scaling to horizontal scaling. From hardware provisioning to on-demand infrastructure. From testing at the end to test as specification. From periodic releases to continuous delivery. From monolithic architectures to federated architectures. From the database as integrator to APIs as integrators. From product owners who are not responsible for any of the above, to single-threaded leaders who are responsible for all of the above.

Lessons Learned from Bridges

Let's do a quick summary of what we've learned from bridges. If you have a chasm to cross, the first thing you want to do is start with a hiking bridge: context specific, minimalist, experimental. Try things, see what works. You don't have to build something that's permanent and going to be there forever. You do not want to copy somebody else's bridge, because their bridge crosses their chasm, and their chasm is not your chasm. If their bridge happens to cross your chasm, you may as well just use their bridge. If you have a different chasm, then build your own bridge from your own local materials to solve your own problems. Don't expect a silver bullet, something that happens overnight and is really simple, because if your chasm were easy to cross, it wouldn't be a chasm and you wouldn't have to figure out how to solve a lot of problems. The other thing is, think about the future. Not about tomorrow, but about some years out. If you look at the really amazing bridges that were built in our software engineering world in the last 20 years, they were built by companies that were thinking far out into the future, not about how to get over today's problems, but how to get over the long-term problems. The best way to participate in the future is to invent it.

What Should We Learn Next?

Then, what are we supposed to learn next? What's coming? Let me talk about this bridge in Iceland. It goes across a big valley. The interior of Iceland has a lot of ice and volcanoes, and every so often a volcano will go off and melt the ice in a whole caldera. That ice will spill down into this valley and wipe out everything in it. Icelanders have built bridges across this valley for decades, and each one gets wiped out after a decade or so. Then they build a bigger, sturdier one. This was the last big, sturdy one they had built, and this flood of ice wiped it out. There you have it.

Did they build another big, sturdy one? They said no. Maybe a better approach is to build one just like this again, recognize that it's going to get wiped out, and stockpile the replacements so that we can build it again and it'll last another 10 years. They quit trying to outsmart the ice and decided they were going to have to learn how to live with it. You need to know the limits of your bridge and honor them. We should be thinking about the limits of our tools, our software, the stuff that we make. What are the limits of this thing we call artificial intelligence? It's not actually as intelligent as you would think. Do we really have intelligence, or do we have software that has been taught the same biases as the data it's trained on? What are we going to do about that? What are the limits of so-called social media? It's not quite as social as we thought, if it's spawning all kinds of difficulties in how people deal with each other socially. We should think about the limits of our technology, the bridges we have built, and figure out how to make sure those limits are not something we come to regret.

Questions and Answers

Reisz: I loved your concluding point, know the limit of the software you produce. As you were talking about that, with the stuff going on with AI, social, power consumption, it's almost like that's another phase, another stage right now that we're undergoing. We thought about, could we do this, but we never really thought about, should we do this? I think we're finally starting to think about that. Is this another stage, or is this just something that should have been present all along?

Poppendieck M.: I've been doing this for something like five decades, and whenever you look back 5, or 10, or 15, or 20 years, you always say, maybe we should have thought differently back then, but hindsight doesn't do you any good. It doesn't matter whether we should have thought about something earlier; what matters is that it's time to start thinking about where we are now. What are we going to do? What responsibility are we going to take for our work and our engineering? The big thing I'm thinking about these days is that the people who are software engineers, building bridges across chasms, should not be building whatever somebody else has drawn up on a piece of paper for them. They should be figuring out how to solve the problem of people getting across that chasm. They're going to be successful when the people who see the bridge say, that's really awesome, and find it easy to use, and safe, and fascinating. We need to be chartering our software engineers to figure out how to solve the problems of the people who need to get across the bridge, and to be responsible for designing the best bridge for that era, for that problem. Make it the responsibility of the people on the team to figure out what to do. You see that in the concept of a single-threaded leader. I've seen way too much of: we're not responsible, somebody else tells us what to do, they're responsible, we just do what we're told. That's not engineering.

Reisz: I do think that's the next great stage, the next chasm that we need to be addressing.

Poppendieck M.: The next chasm we need to cross.

Reisz: Any of the technology you've seen over the last 20 years that you're like, yes, this will never take off, and did?

Poppendieck M.: When I first saw that Amazon was splitting up big databases into little tiny ones, I couldn't believe it. I thought they were crazy. That's one I really didn't think would take off, and I was pretty amazed that it did. I continue to be amazed that open source actually works. It doesn't respond to any of the market pressures that we have; it responds more to what we're about as engineers, what makes us proud, and what challenges us. It's successful because it supplies us with what we want as engineers. We want to be able to do good work. We want to be able to solve our problems. We want to be proud of what we do. We would have a much easier time finding people to do software engineering if that was the job. We should think about why open source is always so attractive.

I was a leader at 3M. My teams were mostly volunteers. They were awesome. If you are in the leadership position, one of the things to think about is, how can you treat people as if they're volunteers? Because as Drucker says, knowledge workers actually are volunteers. Every day they come in and volunteer their knowledge. We need to treat people that do software engineering as intelligent, competent people, and make them responsible for solving problems and building bridges.

Reisz: Daniel Pink's book "Drive" talks about autonomy, purpose, and mastery. I think those three things are summed up pretty well when you're on an open source project.

You made a really good point about databases being dependency generators. Today with our architectures, what is the equivalent of the dependency generator from the database?

Poppendieck M.: If you're not careful, the pipeline that carries all those transactions that used to be in one place can be too tightly coupled. I think, in the end, tight coupling is a massive problem that isn't totally solved with APIs. You can end up with too many of them, all over the place, and then how do you coalesce them into something that makes sense? If you take a look at SpaceX and the rockets they've put up, they just changed the whole field of space engineering. They don't have lots of tiny little teams with no coordinating function. They have little teams with coordinating functions, and they're coordinated by the subcomponents of the architecture of the spaceship. We need not just APIs and lots of stuff lying around; we need enough structure so that we can get big things done and still have independent teams. That's an interesting architecture. It's been solved in hardware engineering for a long time. If you look at how aircraft or spacecraft, great big things, get done by independent teams, you can learn a lot about how we could do the same thing.

Reisz: You've seen the evolution of programming languages over your career. What do you think about them? Where do you think we're going? What do you think the direction is? What do you think of the ones we're using now? What are your thoughts on our language evolution today?

Poppendieck T.: A language gives you a vocabulary, and vocabulary is how you think. Our thinking has changed, so our languages have to change. Contemporary languages let you think things that you could not think before. My personal experience was the transition from sequential to object-oriented programming, and several new languages came in with that. That morphed into the microservices world over the next decade, and again you need new languages, because you have new problems to solve and new things to think about. Where it goes next, I'm not sure. What we're going to have to think about next is probably not obvious yet. The biggest thing I think we have to think about is not a language that focuses on process, but a language that focuses on customer outcomes, on their targets and what they're trying to accomplish. I'm not sure we need a new computer language for that, but we need a new mentality that doesn't focus on process, and instead focuses on understanding, providing, and improving the experience that people have when they interact with what we do, no matter what language we're using.

Poppendieck M.: One way people have tried to solve that problem of focusing on outcomes is with what they call languages that can program themselves, that don't require programmers. I think that's a pipe dream. I've seen languages evolve to be more complex and more capable, but they always reach a limit where they're too complex for simple thinking, and then they make a mess. If you look at the evolution of databases, we went from first, to second, to third, to fourth generation, and the fourth generation was supposed to mean no need for programmers. Then we stepped back to third generation. Why? Because those languages that theoretically didn't need any programming actually did need a lot of capability to make sure they were well organized, well tested, and well factored. Languages that automatically do things with no guidance from people are probably not the way to think about it. The way to think about it is, as Tom said, that languages are ways for us to express our problems and think differently about them.

Reisz: At higher levels of abstraction. Grady Booch talks about this.

Poppendieck M.: At any level of abstraction, you will always end up with tough problems. They're just going to be higher levels of abstraction with tough problems.

 


 

Recorded at:

Apr 14, 2022
