Agile Sailors - A Journey from a Monolithic Approach to Microservices

Key takeaways

  • Explore what evolutionary change over a period of more than four years looks like
  • Discover why it pays off to obey Conway´s law when you change your software and your organisational design
  • See how to apply leadership across different teams, areas and hierarchy levels
  • Get an example of how change management relies on mindset and a consistent long-term vision
  • Get an impression of how much effort it takes to change from functional teams to cross-functional teams, and how much value you get in return

Abstract

Over the last couple of years eSailors IT solutions has implemented big technological and organisational changes: from functional silos to cross-functional teams, from a work flow that looked like an assembly line to dynamic loops, from a monolithic platform to microservices, from hierarchical command-and-control to leadership as a team sport. This article provides a summary of our journey. It leads you from where we started about four years ago, through three main stages of change, to where we are right now. For each stage we provide an overview of

  • our organisational set-up at the time,
  • the dominant technology stack,
  • the most important challenges we were confronted with,
  • the results we expected from our changes and what we actually achieved,
  • the lessons we learned and how this encouraged further improvement.

A main theme that meanders through this text is Conway´s hypothesis that “organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations.” [1] We will use this thesis as a special kind of looking glass for our own history. What sense does it make to mirror organizational and software development? What can we learn from exploring our company´s communication patterns? Why does Conway´s law still hurt us?

Where we started from

Once upon a time in Hamburg, life was simple for a small software company with 75 engineers and more than 120 employees in total. Ideas for our roughly 3 million customers came from clients, marketing or the legal department. On average, the platform handled about 350,000 active customers and 250 million billings a year. The work flow was organized by project managers. Knowledge was strictly separated into functional silos: a software department was responsible for writing features, a quality assurance department for ensuring quality and an operations department for keeping the software stable. If we needed to change any infrastructural components (e.g. adding new servers or different operating system packages), a team located in another department (and country) prepared the necessary setup. Due to a broad variety of budget, alignment and communication issues, this approach often led to severe delays.

Image 1: eSailors´ assembly line

What did our organizational set-up look like at that time?

Image 1 shows that our work flow was organized pretty much like an assembly line in an old-fashioned factory. Most employees worked in one of the five departments (project management/marketing, design, development, QA, operations). Each department had its own management structure with team leads and a head. The software artefact was handed over from one department to the next one along with specific sprint cycles and steps:

  • 1st step: Features were defined by marketing, legal requirements or project managers without significant pre-validation or assessment
  • 2nd step: the design department created designs based on its experience
  • 3rd step: the development department implemented functionality within one software artefact. The teams used Scrum, but for UI changes a dedicated frontend team was consulted (3 weeks)
  • 4th step: QA department tested (2 weeks)
  • 5th step: in case of infrastructural changes, an infrastructure team had to prepare the system
  • 6th step: an ops team deployed into production and maintained the stack
  • 7th step: bugs were fixed by a dedicated team

In short, our value stream was defined by water-Scrum-fall [2]. That is, we implemented Scrum within a traditional structure and our agile mindset was focused on bits and parts rather than the whole system. Besides, we were still pretty much organized in silos that were managed in a traditional way. Decision-making was centralized, communication often a one-way-street, dominated by top-down strategies versus bottom-up reporting. In Image 2 you can see the organisational setup.

Image 2: eSailors´ Org setup at the beginning

Each product development team did its best to become more agile. Image 3 shows the setup of the teams. DEV stands for Developer, QA for Quality Assurance and PM for Product Manager in Image 3 and in the following images. Even though every team had a quality engineer, the final quality check and the approval of the software were done by a dedicated quality department. The quality engineer in the team had to prepare the tests in a way that allowed the quality department to execute them for the approval.

Image 3: eSailors´ Dev Team setup at beginning

What about our technology?

It doesn´t come as a surprise that this paradigm was mirrored in our software architecture. Technology was designed as one monolithic platform, building on a tightly coupled codebase, always deployed as one piece of software. About 80% of our code was bundled in only one artefact. Engineers at eSailors used to call the platform a “big ball of mud”, because it lacked a perceivable architecture and was difficult to extend and maintain. Deployment was done with a complex tool and also bound to the logic of the version control system. Sometimes it took several hours to see unit tests failing after a commit. Due to the complexity of both build and deploy processes, the tech stack was very difficult to change.

How we tried to change our course

“Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure” Conway argues. What did we learn from reviewing our current state by then? How did we set both structures in motion? What did we plan for and what did we actually achieve?

One of the first lessons we drew from Conway´s Law was the following: moving to innovative products and a short time-to-market cannot be done by relying on technological changes only. Instead, organizational changes and technology have to go hand in hand. Both dimensions need to be inspected and adapted accordingly. Although this sounds simple in concept, transforming our set-up as well as our mindset took us a long time and is still at the top of our agenda.

Image 4: New team setup, step 1

Reviewing our assembly line problem, we decided to go for cross-functional teams and make them fully responsible for the complete system development life cycle. In order to achieve this goal, we started with our organizational set-up. We dissolved the bug fixing team to make the feature teams responsible for their own mistakes. This also helped them to better understand operational requirements. Since we were convinced that this should include a full-stack responsibility, the members of the shared frontend team were distributed to the feature teams. At the same time, we changed our sprint cycles to two weeks for developing and one week for testing.

What did we expect from these organizational changes? Basically, we wanted to improve both software quality and delivery times. What did we actually accomplish? Well, our results felt a bit ambivalent. On the one hand, empowering teams by enhancing authority as well as responsibility made people more aware of the problems. Rather than patching issues, all teams began to look for sustainable solutions. This was encouraged by the new set-up as shown in image 4.

Making the development teams responsible for bugs improved the interaction between the operations and development departments in many ways. DevOps exchange programs were initiated (a developer spent one sprint in the Ops department and vice versa). Developers voluntarily requested on-call participation and access to monitoring tools. A cultural change towards more end-to-end responsibility became more and more evident. Developers, operators and infrastructure engineers started to discuss system and management issues on a regular basis. This was a crucial catalyst for further improvement. The leaner assembly line is visualized in image 5.

Image 5: Pipeline, step 1

As expected, leadership was enhanced on many levels. Engineers who used to build on their I-shaped skillset as functional experts started to develop toward T-shaped leaders of cross-functional teams. Rather than relying on a central architect, teams took over responsibility for their technological decisions.

On the other hand, our shared code base was left untouched and continued to cause high development costs for small improvements.

What´s more, we started to realize that our Scrum approach failed in many ways. Rather than providing the right kind of guidance, it primarily increased our overhead. Often, we ignored our sprint commitment completely because of an upcoming bug we needed to fix. Cutting big features into smaller stories and prioritizing them led to long and exhausting planning days. Perhaps the biggest impediment was the unchallenged waterfall mindset of both project management and the marketing department as our primary client. We used the Scrum framework, but the scope of features was no longer adapted during implementation. Instead of reducing upfront documentation and using inspect and adapt on every small development step, the Product Owners in the teams acted as managers cutting waterfall project plans into smaller immutable parts. Besides, the work flow of our frontend developers was hard to manage. There were times when the rest of the team was blocked waiting for the UI changes. In other sprints there was nothing for them to do. This frustrating situation made some frontend developers leave the company.

How we set sails differently

What did we learn from our changes? Where did we fail to build on Conway´s insights? How did this encourage further improvement? Here are some of the answers we found:

  • Scrum didn’t work for all teams and in every situation. That´s why we had to let people inspect and adapt their work process to their needs while encouraging them to see the big picture of our business.
  • Consequently, we needed agile teams focused on end-to-end value streams rather than Scrum teams that were still silo-oriented. And we needed agile coaches with an entrepreneurial, cross-department perspective rather than ScrumMasters focusing on one team only.
  • We also realised the need for a more T-shaped skillset not just for leaders but for all our engineers. Having I-experts with deep knowledge in specific areas is fine as long as they are able and willing to cover topics beyond their respective craftsmanship. Ideally, our future engineers would be both open to cross-functional collaboration and keen to improve software quality without waiting for explicit business requirements to do so.
  • Taking over the responsibility for bug-fixing increased our awareness for systemic problems. On the other hand, committing to business areas and related software was still pretty hard with a big shared codebase. In order to increase speed as well as agility, we felt a growing sense of urgency to reduce dependencies on the business side as well as on the technical side.
  • Having product managers in the teams who are not fully empowered to drive the product vision does not add value. We´d only make a difference if each team is fully empowered to create, inspect and adapt their features autonomously throughout the development process instead of merely executing requirements from the outside.
  • The top-down driven decision-making process did not lead to the expected results. In order to foster more innovation we had to support bottom-up processes too.
  • Although team empowerment was the right thing to do, communication and teamwork across different teams and departments remained a big issue.
  • Changing our organizational setup and creating new roles was not enough. We had to solve the conflicts between all management roles in order to establish leadership as a hierarchy-bridging as well as cross-functional team sport.

Long story short, we realized that we had to devote more attention to building the right things in the right way instead of just keeping ourselves busy. Moreover, we had to accelerate our overall processes and define clear responsibilities for this. In order to do so we set new improvement measures.

First of all, we increased the importance of UX and product management. Driven by our top management and supported by external trainers, we implemented a UX process including user tests, a user lab and UX engineers within the teams. In parallel, the product managers took over full responsibility for developing valuable software for our customers. In order to mitigate well-known communication barriers, the test department was made part of the engineering department. This resulted in a new organizational set-up and work flow, shown in image 6, and a new team setup, shown in image 7.

Image 6: New organizational set-up and work flow

Image 7: new team setup step 2

Not sticking to “pure” Scrum any longer, the teams started to find new ways to agility. The teams changed how they organized themselves. This was supported by a series of team building workshops to review the current situation, define basic strengths to build on and agree on new rules and values. At the same time, we fostered our culture of open communication and mutual help by applying different methods of peer feedback.

The management team focused on team building too. They ran a meta-retrospective on the whole business unit, defined strengths to build on and dived deeper into problems and solutions. A key improvement started with co-creatively reviewing the conflicting roles and distilling a first concept for lean leadership. How could we effectively simplify what was seen as kind of a jungle of overlapping tasks, unclear boundaries and confusing decision-making policies? What kind of guidance was actually needed? What was the value-add of each leadership role? How many were needed? And how should responsibilities be distributed between those roles?

In order to foster more bottom-up processes, we also changed the way we managed our change initiatives. As part of this we started to involve more functional experts in those initiatives. Rather than just executing strategies defined by senior managers, we built on so-called change teams. They consisted of delegates from the teams that were affected the most by the change, and were given autonomy within a clear set of boundaries to explore and implement their own solutions. This resonated with other strategies for enhancing ownership such as various communities of practice, explicit agreements on slack time or joint hackathons.

Instead of trying to drive people by external motivators such as money and bonuses defined by senior management, we focused on enhancing people´s intrinsic motivation to come up with more ideas and solutions. One change was that some dynamic salary parts were moved to a fixed value in order to remove this external motivation aspect. As part of this, targets were no longer defined by upper management. Instead, teams started to set their own targets.

At the same time we continued to split our software into microservices. After consulting with external software architects, we started a big refactoring initiative in order to achieve that. From this initiative we expected to get completely cross-functional teams and to accelerate our time-to-market through independent deployments. It should become easier to drive innovation and customer-focused solutions. We wanted to create teams with full responsibility for a specific domain from a technological and business point of view.

After setting up teams for the refactoring and extracting the first microservices, the project was cancelled. The overall commitment in upper management was still too weak, and the refactoring as a whole too big and risky. Since the refactoring initiative was stopped, cycle times were still long and innovation was the exception rather than the rule. Our technical debt was still increasing. This was mirrored by an organizational structure that still blocked renewal. We lost a lot of time discussing permissions rather than actually improving software quality or implementing new technology. In order to effectively implement microservices, we needed to give the product teams more authority to select their own tools and take over full responsibility for the complete software lifecycle including deployment. Even after the big refactoring was cancelled, the engineers were convinced that a migration of the shared legacy codebase into smaller services was unavoidable. The new approach was now to cut out parts of the old business logic as soon as new requirements were added to the corresponding code areas.
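This incremental approach resembles what is often called the strangler pattern. The following is a minimal Python sketch of the idea, not our actual implementation: it assumes a hypothetical router in front of the legacy platform that keeps serving a feature from the old code until a team has extracted it into its own service. All names (`MigrationRouter`, `legacy_platform`, `billing_service`) are illustrative assumptions.

```python
from typing import Callable, Dict

# A handler takes a request and returns a response string (simplified).
Handler = Callable[[dict], str]


def legacy_platform(request: dict) -> str:
    """Hypothetical stand-in for the shared legacy codebase."""
    return f"legacy:{request['feature']}"


def billing_service(request: dict) -> str:
    """Hypothetical stand-in for a newly extracted microservice."""
    return f"billing-service:{request['feature']}"


class MigrationRouter:
    """Routes extracted features to new services, everything else to legacy."""

    def __init__(self, fallback: Handler) -> None:
        self._fallback = fallback
        self._extracted: Dict[str, Handler] = {}

    def extract(self, feature: str, handler: Handler) -> None:
        # Called when a new requirement touches this code area and the
        # team decides to cut the logic out of the legacy platform.
        self._extracted[feature] = handler

    def handle(self, request: dict) -> str:
        # Unknown or not-yet-migrated features fall through to legacy.
        handler = self._extracted.get(request["feature"], self._fallback)
        return handler(request)


router = MigrationRouter(fallback=legacy_platform)
router.extract("billing", billing_service)
```

With this setup, `router.handle({"feature": "billing"})` is answered by the new service, while a request for any other feature still reaches the legacy platform. The migration can then proceed one code area at a time instead of as one big, risky refactoring.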

As part of our improvement efforts, the team that used to maintain the in-house deployment and test framework changed its course. They stopped developing their own solutions and focused on keeping the existing tools stable. We agreed on a new policy that allowed us to deploy every new service with open source deployment and test tools instead of our old in-house tools. This way the product teams became stronger, gained more confidence to effectively improve things and started to drive innovation.

Discovering new continents

Where did all these measures lead us? What did we actually improve, what remained the same and what got even worse? Here is a summary of what kept us busy in 2015 and in the first two quarters of 2016.

On the technological side, we continued to move toward independent microservices. At the same time we improved collaboration between the infrastructure and engineering department. Instead of creating and maintaining their services more or less independently, ignoring dependencies, the infrastructure guys started to align themselves with what was actually needed by the engineers.

Step by step some product teams took over responsibility for creating the needed deployment infrastructure for new services as well as for operations. Consequently, teams were now able to add their own technology from the operating system level on. This was one of the preconditions for moving more and more towards a DevOps culture.

We also modernized our engineering tools. Engineers were now free to select their own workstation or laptop instead of a central virtual machine. On the one hand they got more freedom to create their own environment and tools, on the other hand they had to enhance their sense of ownership. Additionally, using GitHub opened new paths of developing our software.

On the organizational side, we continued to move towards lean leadership. As image 8 shows, we removed the role of the ScrumMaster. We felt that its function as a catalyst for becoming agile was fulfilled and that the teams were mature enough to organize themselves autonomously. At the same time, we founded a pool of Lean-Agile coaches (LAC) to offer new ways of supporting both teams and managers.

Image 8: Dev Team Setup now

We also intensified our efforts to make important information as transparent as possible. Visual management systems became more important. Product managers introduced the so-called “Wall” to provide information about current business options and upcoming changes. Image 9 shows one of the latest versions of this board, which has been placed in the open space in the middle of our office. This way we increased transparency about our business flow, from creating business options through assessing and selecting them to development, validation and finishing, and enhanced both curiosity and conversation around the board.

Image 9: “The Wall”

Speaking about transparency, the whole company began to meet on a quarterly basis to share what the teams had done in the latest quarter and what they wanted to achieve in the next one. Building on our new approach, change initiatives were more transparent too. The change team´s proposal for implementation was openly discussed with various stakeholders, e.g. by using the format of a fishbowl discussion. This approach made it much easier to involve those who were affected, keep others informed and enhance overall commitment. It also helped to communicate the what and why of change directly and supported a common understanding of how to do it. Rather than relying on official announcements, formal kick-offs or expert master plans, change energy was channeled by eye-to-eye conversations and agreed on actions.

We provided a new series of leadership training to people from various areas, e.g. on how to lead self-organising teams [3]. For the very first time, leaders from different areas such as product managers, team leads, directors and HR focused on finding a common understanding of how to manage the company in the future. What was needed to effectively support self-organisation? What did this mean in terms of capabilities and practices? Jointly exploring the do´s and don’ts of lean and agile leadership, they also got to know each other better and built more trust and mutual understanding.

All these initiatives helped to decrease our silo-logic and fostered cross-team collaboration focusing on the whole value stream. This went hand in hand with a new culture of post mortems and retrospectives throughout the engineering department. Rather than finger-pointing and talking behind each other´s backs, positive as well as critical issues were now communicated directly. Honest feedback became a valuable good to be provided and received on both systemic and personal levels.

Where we are now

How did all these changes affect our business? What does our journey mean for the way we organize ourselves today? Using Conway´s mirror once again, images 10 and 11 show what our structures look like these days.

Image 10: How eSailors is organized in 2016

Image 11: eSailors´ org structure in 2016

Having overcome our assembly line process, we now put the cross-functional product teams at the center of our attention. They are the engines that keep our productivity high, make all important decisions and drive any change needed. Instead of focusing on software development according to given specs and handing it over to more or less disconnected departments, the product teams decide both what to implement and how to do it. The other departments deliver supporting services such as data, infrastructure or consultancy. Each team can deploy and monitor its software independently, though the teams often make good use of the tools offered by the operations department. Similarly, the teams can do user research on their own, but there is a user lab in place to support them.

Image 12 points out how the product teams are covering the full development cycle today. This circle starts with identifying customer needs and pains. Based on what we learn throughout this discovery phase, we derive business opportunities and assess them using a special canvas. The cycle continues with design prototypes and user tests to learn everything we need to implement the right features. After a certain number of loops, these features are released into production where they get monitored and tweaked by A/B-tests.

Image 12: eSailors´ cycle of development

Our technology changed massively too. Coming from a very small set of technologies dominated by a big ball of mud, we managed to shrink the latter while increasing the variety of the former. We stopped using only one database solution for everything (Oracle) and put several different databases in production (Oracle, Mongo, Redis, Elasticsearch, Cassandra). Overcoming our exclusive focus on one programming language (Java), we now use four different languages (Java, Scala, Go, Swift). Other new tools and libraries are, for example, Docker, Ansible, Vagrant, AngularJS, Consul and OpenStack. Every team owns specific functional domains. The teams are now empowered to select the tech stack that best fits their individual needs and to build decoupled microservices for their features. Shipping new features into production is getting faster and leaner because teams select the right tools instead of implementing workarounds for a given and shared technology. This is possible because the engineers of a team can now focus only on their parts and take over the responsibility for the whole lifecycle of their services. Without that full-stack responsibility, and with the need to do intensive handovers to other departments, this heterogeneity of technology would not be possible.

We have not completed our micro-service architecture yet, but we implement all new features in microservices. We continue to transform our architecture in an evolutionary way. The lead times for releasing new services also improved impressively. New services can now be built, tagged, tested and deployed into production within minutes. Sometimes the lead time from the creation of an idea to its complete implementation is only one day instead of weeks.

Although we achieved a lot, we are still confronted with tricky challenges. On the one hand there are technological challenges such as the complexity of our architecture, our ongoing journey towards microservices or the difficulty of ensuring an on-call structure. At the same time we face organisational and business challenges. On our road to a consistently lean and agile enterprise, we still struggle with the fact that teams do not have real profit and loss responsibility. Even if a product team successfully involves all important stakeholders of their value stream, the complete costs and revenues of their work are not transparent to them.

We are struggling with full entrepreneurial responsibility for Product Management. The product manager focuses on gains (earnings, registrations, ...), while the costs themselves (staffing, salaries, hardware, external services, training, …) are part of the engineering cost center. The costs of our internal infrastructure and the marketing costs for our products are not mapped to the product teams at all.

We made a lot of progress in visualizing data and KPIs in order to track the costs and benefits of actions. But given the missing capability to map implementation costs, infrastructure costs and efforts to these actions, it is often difficult to evaluate whether an implementation added any value at all.
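To make the missing mapping more concrete, here is a minimal Python sketch of what a per-team ledger could look like. It is an illustration under stated assumptions, not tooling we have in place: the class `TeamLedger`, the helper `allocate_shared_cost`, the usage-based split and all figures are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TeamLedger:
    """Hypothetical per-team view of gains and mapped costs."""
    name: str
    gains: int = 0
    costs: Dict[str, int] = field(default_factory=dict)

    def add_cost(self, category: str, amount: int) -> None:
        self.costs[category] = self.costs.get(category, 0) + amount

    def total_costs(self) -> int:
        return sum(self.costs.values())

    def net_contribution(self) -> int:
        return self.gains - self.total_costs()


def allocate_shared_cost(total: int, usage: Dict[str, int]) -> Dict[str, int]:
    """Split a shared cost (e.g. infrastructure) in proportion to usage units."""
    total_usage = sum(usage.values())
    if total_usage <= 0:
        raise ValueError("usage must contain at least one positive value")
    shares = {team: total * units // total_usage for team, units in usage.items()}
    # Any rounding remainder goes to the heaviest user so shares sum to total.
    heaviest = max(usage, key=usage.get)
    shares[heaviest] += total - sum(shares.values())
    return shares


# Made-up figures, purely for illustration.
infra = allocate_shared_cost(9000, {"mobile": 2, "web": 1})

mobile = TeamLedger(name="mobile", gains=50000)
mobile.add_cost("staffing", 30000)
mobile.add_cost("infrastructure", infra["mobile"])
```

In this example the mobile team carries two thirds of the shared infrastructure cost, so its net contribution is its gains minus staffing and that share. Whether usage is the right allocation key is itself a design decision; headcount or revenue share would be alternatives.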

We are also still struggling with cross-team interaction and a common view of our whole system. It is still not entirely clear how to improve our end-to-end work flow. Currently, we are exploring how our “Wall” (see image 9) could be used as a means to better manage our value streams, i.e. as a so-called Kanban flight level 3 board [4]. By enlarging our business point of view we could gain fresh perspectives on our step-by-step value creation process. Likewise, we could think of including delegates from all areas in order to consistently monitor and actively control what is going on throughout the company. We strongly believe that this would enhance our capability to better identify options, improve lead times and deliver more value with less effort.

References

[1] “How Do Committees Invent?” Melvin E. Conway

[2] “Analyst Watch: Water-Scrum-fall is the reality of agile” Dave West

[3] “Leading Self-Organising Teams” Siegfried Kaltenecker (free download)

[4] “Flight levels of Kanban” Klaus Leopold

About the Authors

Michael Gruczel is team lead of the mobile apps team at eSailors, working hands-on as a servant leader and sometimes as a disruptive one. He previously worked as a consultant and wrote some tech articles for the German Java Magazin and JAXenter. Reach him by linkedin, twitter @mgruc or take a look at his github profile.

 

Sigi Kaltenecker is the joint managing director of Loop Consultancy, specialising in lean and agile change. He has been involved with multiple international companies such as Alcatel, bwin.party, eSailors, Kaba, ImmoScout24, Magna, RWE, Swiss Federal Railways and Thales Group. Sigi authored “Leading Self-Organising Teams”, “Kanban Change Leadership. Creating a culture of continuous improvement” (Wiley 2015) and a series of articles on “Peer Feedback loops”. Reach him at siegfried.kaltenecker@loop-beratung.at

Hans Gruber is Head of Engineering and Managing Director @eSailors in Hamburg. He has a strong engineering background and built up his expertise in international companies like Tele2 and bwin.party. He works on making teams better by adapting lean and agile practices, as he strongly believes that tech and product teams can achieve excellence and deliver great customer value by doing so. Reach him at linkedin.
