Microservices in the Real World
Alexander Heusingfeld gave a talk titled "When microservices meet real-world projects" at the GOTO Berlin 2015 conference. InfoQ interviewed Heusingfeld and his colleague Tammo van Lessen about getting people from operations involved in architecture and dealing with “us vs. them” behavior when applying DevOps, what the Self-Contained Systems approach is and how it can be used to modernize software systems, the similarities and differences between the Self-Contained Systems approach and microservices, improving deployment pipelines and using measurements in deployment, and Heusingfeld's experiences with a "getting out of your comfort zone" program.
InfoQ: How do you sell architecture to employees who are involved in keeping systems operational? What's in it for them?
Heusingfeld: In the end, a system architecture will only come to life if the people who are running the daily business buy into it. That's why one of my first steps in a new project is to talk to people from operations and business departments and listen to them carefully. They usually have a lot of experience to share about their company, its processes and the impact on daily business. Such insights are invaluable when discussing new features, process changes or optimization of the system. When people feel understood and realize I did in fact take their concerns into account, I don't really have to sell my ideas.
van Lessen: Basically it is a matter of trust. I often had to deal with people from operations who were very reluctant to adopt new ideas or technologies. I assume this is because they were afraid of having to maintain and support the new system, even though they might not fully understand either the implications from an operational perspective or why these technologies are important for the project. A key to success is to include them early on in a project’s architecture phase. This leads to a joint team with shared ideas, and it creates trust and understanding for both development and operational aspects within the whole team.
InfoQ: Deploying DevOps can be challenging, as people from different departments will become team members. Do you have suggestions on how you can deal with "us vs. them" behavior when you want people to work together closely?
van Lessen: DevOps should actually be the first step to overcome "us vs. them", as it means staffing a cross-functional team. I strongly believe that establishing DevOps teams is far less challenging than establishing a trench with cumbersome processes to regulate how one team can throw things over the fence to the other team. What is challenging, however, is to transform a company that runs (and likes) the latter into the former. In many cases, it appears difficult for people to give up old habits and to go with the new approach.
Heusingfeld: Quite often we noticed that when cross-functional teams were put together from different departments, people would bring along their trench wars, especially if the former department structure was kept. That's why, from my point of view, the highest priority of all team members must be to reach the team's goals. It must not happen that a team member gets higher-priority or conflicting tasks from their department line manager. In this situation he or she will be forced to work overtime or let the other team members down. If this happens too often, it will lead to a loss of trust among the team members. As we know, if we don't trust somebody, we won't rely on them in critical situations. In concrete terms, this means team members who don't trust each other won't be able to reach full efficiency and are likely to struggle when they are under pressure. That's why we recommend investing in team building when the team is assembled, to establish trust among all team members.
InfoQ: You talked about the Self-Contained Systems approach at GOTO Berlin. Can you briefly explain what it is?
van Lessen: Self-Contained Systems (SCS) describe an architectural approach to building software systems, e.g. to split monoliths into multiple functionally separated, yet largely autonomous web applications. The key point is that an SCS should be responsible for its own UI as well as its own data store. The system’s boundaries exhibit a vertical split along what in Domain Driven Design (DDD) is called “bounded contexts”. The integration of each SCS into the overall application happens in most cases within the browser via links and transclusion. These systems share neither common UI code nor common business logic. Each system may be maintained by a separate team using their very own preferred technologies. When done right, end users will move fluently between systems, crossing application borders simply by clicking links or hitting buttons, ideally without even noticing that they left one system and entered another. Using progressive enhancement, this approach supports all varieties of browsers, from old to modern, from screen readers to mobile phones. Some just show a link, others replace the link to show a nice, visually enriched view of the contents provided by a different system.
Heusingfeld: I noticed that people involved with operations in particular regard this Self-Contained Systems (SCS) approach as quite a reasonable way to build or modernize systems. In my opinion SCS help to avoid overwhelming people by containing complexity. This is an essential aspect to me because, as previously mentioned, the people who run the daily business need to understand how and why you want to change things so they can support you. We’ve collected a lot more information on the SCS approach so everyone interested should take a look at http://scs-architecture.org/ and get into the discussion.
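The link-and-transclusion integration van Lessen describes can be sketched in a few lines. This is only an illustration under stated assumptions: it is written in Python for brevity rather than as browser JavaScript, and the `data-transclude` marker, the `fetch_fragment` callable and all URLs are hypothetical names, not part of the SCS approach itself. It shows the progressive-enhancement idea: a marked link is replaced by content from another system when that content is available, and stays a plain link otherwise.

```python
import re
from typing import Callable, Optional

# Hypothetical convention: links carrying data-transclude may be replaced by
# the fragment they point to. Unmarked links are never touched.
LINK = re.compile(r'<a href="(?P<url>[^"]+)" data-transclude>(?P<label>[^<]*)</a>')


def transclude(page: str, fetch_fragment: Callable[[str], Optional[str]]) -> str:
    """Replace marked links with the fragment served by the other system.

    If the fragment cannot be fetched, the original link is kept, so the
    page still works as a plain set of links (progressive enhancement).
    """
    def replace(match: re.Match) -> str:
        try:
            fragment = fetch_fragment(match.group("url"))
        except Exception:
            fragment = None  # treat any fetch failure as "no enhancement"
        return fragment if fragment is not None else match.group(0)

    return LINK.sub(replace, page)


# Stand-in for an HTTP call to another system; the content is made up.
FRAGMENTS = {"/cart/summary": "<div>3 items in your cart</div>"}

page = (
    '<a href="/cart/summary" data-transclude>Your cart</a>'
    '<a href="/recommendations" data-transclude>Recommendations</a>'
)
result = transclude(page, FRAGMENTS.get)
```

In this sketch the cart link is replaced by the fragment, while the recommendations link remains an ordinary link because no fragment is available for it. Neither system needs to know anything about the other beyond a URL.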
InfoQ: Do you have examples showing how the Self-Contained Systems approach has been used to modernize software systems?
Heusingfeld: Two concrete public examples where the SCS approach has been applied are otto.de (see Otto TechBlog in German) and Galeria Kaufhof. More references and articles are listed at scs-architecture.org/. Besides case stories, an interesting aspect is how to split a monolith into an SCS architecture in-flight. As mentioned in the talk, we have collected and published some typical improvement approaches which we have successfully used to modernize systems under the name architecture improvement method - aim42. Especially the “Strangulate bad parts” and “Change by extraction” strategies seem like a natural fit to be implemented with SCS. In both scenarios you identify functionality within your monolithic application which should be moved into a new system. SCSs answer the question of how to split your old system: Identify the business domain of this functionality and put it into an SCS which only contains functional code serving the same domain, i.e. similar use-cases. If you cannot clearly separate the business domains within your monolith yet, the SCS approach provides the freedom to take coarse-grained steps, e.g. from one monolith to two SCSs in the first place, and refine them in small iterations. This is even a dedicated improvement approach in aim42 called “change via split”.
InfoQ: Can you elaborate on the similarities and differences between the Self-Contained Systems approach and microservices?
van Lessen: SCS and microservices share a lot of common concepts, e.g. flexibility and diversity in technology choices, the alignment with organisational or architectural boundaries and the isolation of functionality through independently deployable units. However, SCS are decidedly more coarse-grained. In fact, SCS applications might be composed of microservices internally. SCS are intended to be integrated at the UI layer whereas microservices are more likely to be composed at the logic layer of a larger application. So while both share the same ideas, SCS push the original ideas of microservices to a macro level.
InfoQ: What if people say that deployment cannot be faster because there is an established process? Any suggestions to deal with this?
Heusingfeld: Sometimes that's what you hear when there are compliance and security audits or huge amounts of manual testing involved in the delivery process. In my experience, people typically refuse to change the process at first, as they either don’t believe the proclaimed benefits will come or don't trust that it's worth the effort. So instead of attempting to automate everything at once, I recommend taking smaller steps and starting to automate what's closest to you first. There usually won't be any compliance issues with setting up a server for the development team in order to automate software deployments, e.g. via a CI system like Jenkins. This installation can be used for automated acceptance testing, which will provide fast feedback to the team on whether they are building the right thing or breaking something. Once the team can demonstrate increases in speed and quality, their manager has a “success message” to share on management level. This way other teams will recognize what the team did and quite likely adopt it if they find benefits in it for themselves. So even if you have manual actions in the delivery process, you could continuously improve and automate everything step by step before and maybe even after these actions - just like an assembly line in a factory.
van Lessen: But still, sometimes it is just impossible to change such rules. I have experienced this in large companies where IT is not a first-class citizen but rather just a means to an end. The rigid processes are established for their core business and are simply extended to IT as well. It is unlikely that IT has the power to positively disrupt those company-wide rules. Still, that leaves opportunities to automate as much as possible up to the border imposed by processes.
InfoQ: Can you explain how transparent deployment pipelines can be used to explore and improve software delivery?
Heusingfeld: We had a customer who wanted to "do automation later" due to “regulations and compliance”, just as in the example I mentioned before. They also measured the success of software features by KPIs (key performance indicators). So the team never knew whether they had built the right thing until the software was deployed to production and the metrics for the KPIs were taken. We decided to check how these metrics were collected and found that we should be able to derive those metrics even before production. We ended up collecting the metrics not only from the integration testing system and during manual test phases: by modifying and adding to the automated acceptance tests, we were also able to get some first results during early acceptance testing on the development servers. That way developers already had an idea of how they did approximately 5-15 minutes after they pushed their code to the remote Git repository. This insight enabled a new level of transparency across the whole delivery pipeline and built trust in the delivery and the work of the team - especially on the product owner’s side.
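To make the idea of deriving KPI metrics from automated acceptance tests more tangible, here is a minimal sketch. Everything in it is an assumption for illustration: the `KpiRecorder` class, the metric name `checkout_duration_seconds`, the budget value and the fake clock are hypothetical and do not describe the customer's actual setup. It shows an acceptance scenario that records a KPI sample as a side effect, so that a first value is available shortly after a push.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class KpiRecorder:
    """Collects KPI samples emitted while acceptance scenarios run."""
    samples: Dict[str, List[float]] = field(default_factory=dict)

    def record(self, name: str, value: float) -> None:
        self.samples.setdefault(name, []).append(value)

    def average(self, name: str) -> float:
        return mean(self.samples[name])


def timed_scenario(name: str, scenario: Callable[[], None],
                   recorder: KpiRecorder, clock: Callable[[], float]) -> None:
    """Run one acceptance scenario and record how long it took."""
    start = clock()
    scenario()
    recorder.record(name, clock() - start)


def within_budget(recorder: KpiRecorder, name: str, budget: float) -> bool:
    """Compare the averaged KPI with a budget agreed with the product owner."""
    return recorder.average(name) <= budget


# A fake clock keeps the example deterministic; a real pipeline would pass
# time.monotonic and a scenario that drives the deployed application.
ticks = iter([0.0, 1.5, 10.0, 12.5])
recorder = KpiRecorder()
for _ in range(2):
    timed_scenario("checkout_duration_seconds", lambda: None,
                   recorder, lambda: next(ticks))

average = recorder.average("checkout_duration_seconds")
ok = within_budget(recorder, "checkout_duration_seconds", budget=3.0)
```

Values measured on a development server will differ from production, so a sketch like this is better suited to spotting trends between builds than to predicting absolute production numbers.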
InfoQ: Do you have examples of things that can be measured in the deployment pipeline? What would be the value of these measurements for developers, and how can they use them?
van Lessen: In my opinion, gathering runtime metrics from applications and systems of systems is not being taken seriously enough. Although it needs some additional effort, it is such a powerful enabler for more focused development. We basically distinguish three kinds of metrics: business metrics that answer questions typically asked by C-level executives, application metrics that capture information about the health and performance of the application itself, and system metrics that gather information about hardware and the operating system. Being able to overlay all this information allows for gaining a deep understanding of how the application behaves at any point in time. This is most desirable in production environments from an operations perspective - but, as Alex already said, it also provides insights to developers in order to assess the quality of their implementations over time. Let me give a concrete example: Users complain that report creation takes too long. Usually you don't get any information on what "too long" means. Production metrics will give an immediate answer to that, and the same metrics from a test run on the CI system help developers assess their fixes. This may include a variety of metrics. One business metric might be “how many reports do we create per day”, while application metrics could answer the question of whether it is the database access or the PDF creation that takes more time. When the CPU is idle (system metric) but the report creation is still slow, it might be a concurrency issue, e.g. with empty and blocking connection pools. Having access to the same metrics in the deployment pipeline answers questions like "does the new code work better?" much earlier, ideally right after a developer pushed their code, and makes it easier to find the right way to approach a certain problem.
Heusingfeld: We're making good use of the well-known benefit of fast feedback that a continuous delivery pipeline brings along, and we increase the variety of answers accessible to developers. If we measure heap usage, CPU utilization or network and disk I/O, we can state whether a code change had a negative impact on the application performance. But we could even take it a step further. Usually the product owner measures some KPIs in the production environment to check how well the product is doing. This might be things like "Duration of a checkout process" or "Number of newly registered customers in the last 24h". We could ask our product owner which metrics are used to calculate their KPIs and try to measure these in the environments we run automated tests on. This will most often not work for all production KPIs, but for more than one might expect.
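The report-creation example shows how overlaying the three kinds of metrics narrows down a problem. The following sketch expresses that reasoning as code. It is a simplification under stated assumptions: the field names, the thresholds `slow_after` and `idle_below`, and the rule that a large share of unaccounted time points to waiting are all hypothetical choices for illustration, not a method described in the interview.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportSnapshot:
    reports_per_day: int     # business metric
    db_seconds: float        # application metric: time spent in database access
    pdf_seconds: float       # application metric: time spent creating the PDF
    total_seconds: float     # application metric: end-to-end report creation
    cpu_utilization: float   # system metric, from 0.0 (idle) to 1.0 (saturated)


def diagnose(s: ReportSnapshot, slow_after: float = 10.0,
             idle_below: float = 0.2) -> str:
    """Return a rough hint where to look first. Thresholds are assumptions."""
    if s.total_seconds <= slow_after:
        return "ok"
    measured = s.db_seconds + s.pdf_seconds
    # Slow, CPU idle, and most of the time not spent in any measured phase:
    # the request is probably waiting, e.g. on an exhausted connection pool.
    if s.cpu_utilization < idle_below and measured < s.total_seconds / 2:
        return "possible concurrency issue"
    return "database access" if s.db_seconds >= s.pdf_seconds else "PDF creation"


waiting = diagnose(ReportSnapshot(1200, 1.0, 2.0, 30.0, 0.05))
db_bound = diagnose(ReportSnapshot(1200, 20.0, 5.0, 26.0, 0.6))
healthy = diagnose(ReportSnapshot(1200, 1.0, 2.0, 4.0, 0.3))
```

Feeding the same snapshot structure from a CI test run and from production would let developers compare a fix against the situation users actually complained about.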
InfoQ: You talked about a "getting out of your comfort zone" program at a company that you worked with. Can you share your experience and what you learned?
Heusingfeld: The company had an employee program where each employee was asked to spend a day working in a department other than their own. This should be done at least once every three months. An employee working in controlling could find themselves in the data center, a marketing person in the warehouse or a developer working with the HR team. I myself was working in the IT department developing a Java application for the point of sale and had the pleasure of accompanying a sales manager on his trip to several of our shops. Not only did I get to see the software I was working on in action, I was actually asked to use it to sell products. It afforded me a whole new perspective. I now understand that just imagining being a user while testing your software is totally different from actually using it in real-life situations: It’s not only stress and rush that change usage behaviour, but also that, in contrast to my office desk, a point of sale is not a protected environment. This experience was invaluable to me! Another day we had a logistics person coming into a development department, and you could tell that people had their prejudices. Nobody knew what to do with him as “he couldn’t write code”. So they decided to show him around, show him how the software was made and the nice little graphics explaining our delivery process. Suddenly the guy said “sorry, but if I understand correctly, you’re losing time if you really do it like this”. Everyone was stunned and asked him to explain. Within about 2 hours he described his perspective on things and managed to highlight an inefficiency in our process that we ourselves hadn’t spotted for months - we’d simply taken it as a given. If there are two things I learned that day, they are "Never let your prejudice succeed" and “Always consider a different perspective!”
About the Interviewees
Alexander Heusingfeld is a senior consultant for software architecture and engineering at innoQ in Germany. As a consultant, software architect and developer he supports customers with his long-standing knowledge of Java and JVM-based systems. Most often he is involved in the design, evaluation and implementation of architectures for enterprise application integration (EAI), modern web applications and microservices. He loves to contribute to open source projects, speaks at IT conferences and Java User Groups and occasionally blogs at http://goldstift.de.
Tammo van Lessen is a Principal Consultant for software architecture and engineering with innoQ in Germany. He is an elected member of the Apache Software Foundation and PMC chair of Apache ODE. He co-authored a German book on WS-BPEL and was a member of OMG’s BPMN 2.0 Finalization Task Force. He has a weakness for rightsized software architectures, DevOps and modern monitoring tooling and has published several academic and non-academic articles on Web services & business process execution. He is a regular speaker at national and international conferences.