HaMIS: One 24/7 Product and Four Scrum Teams, Four Years Later
This is a story about four cross-functional scrum/DevOps/feature teams delivering and managing a business-critical 24/7 system used by vessel-traffic services operators and many other users. The main messages are:
- Achieving truly self-organising teams is difficult and takes time, but is definitely worth doing.
- Feedback-driven development works. Almost everything teams accomplished was either to get feedback as soon as possible or as a result of feedback.
- You can achieve more value with the same number of people as they change habits, improve skills, and gain knowledge. This evolution, driven by continuous improvement, generates more value than adding more people would.
- Inexperienced developers can become craftsmen under proper learning conditions.
This paper grew from an initiative by two team members to share our experience with others. As we do for pretty much anything substantial in our team, we organised an open space to discuss this subject and, even more importantly, involved everyone. We asked team members from all teams to recommend subjects that the outside world might find interesting. In a second round, we asked everyone to write his or her most important message to the reader. The result is this compendium of topics that derive from our more than four years of agile and scrum practices at the Port of Rotterdam, one of the world's busiest ports.
In the midst of Internet rants and discussions about the decline and fall of agile and scaled agile frameworks, we would like to offer our experience with achieving results with scrum as it is, without changes.
The Port of Rotterdam Authority has a turnover of approximately €600 million and a staff of 1,100 employees with widely varying commercial, nautical, and infrastructure-related responsibilities. The foremost customer for the teams is the Harbour Master. This division ensures the smooth, clean, and secure handling of shipping traffic (annually, approximately 33,000 ocean-going and 110,000 inland vessels).
The users are diverse. Some work for the Harbour Master division, some for other divisions, and some for many partners of the Port of Rotterdam.
At the center of these primary business processes sits an IT system called HaMIS (Harbour Management Information System). The idea for this system was born many years ago with a need to replace an existing one. The previous system had served the Port of Rotterdam well for 20 years, but the old technology and architecture had become a major obstacle to any significant improvement in business processes. The Port of Rotterdam was growing and changing and the old system could no longer meet requirements.
The first goal was simple: prevent the negative value of an outdated system. The secondary goal, on the other hand, was a bit less clear. The Port of Rotterdam is growing, especially with the Maasvlakte 2 expansion. The system must support ever-growing traffic in the harbour with the same number of people. This implies intensified information exchange, better coordination between involved parties, and better support of main Harbour Master processes.
How it all began
The decision to replace the old system with HaMIS was followed by a complex process of making plans, budgeting, involving vendors and software integrators, and a dreadful procurement process. It produced a lot of documents but no real code except for a proof of concept.
Eventually, the port made a courageous decision to bring the process to a full stop. The developing programme was too complex to ever be successful. The requirements were complex and the risk high. Too much remained unknown, and it would be big news in the Netherlands if the project failed. The consequences could be even worse if a faulty system went into production.
But the existing system was approaching the end of its life, so the Port of Rotterdam did not have any choice but to restart the project with one main goal: replace the current system.
The new project’s chosen methodology was a combination of RUP and scrum. Many practices were introduced, but the agile mindset wasn’t really embedded. There was still too much focus on strictly following prescribed practices and especially on creating a large number of documents. Months passed and only one use case was almost delivered. A welcome quality of Rotterdammers is a no-nonsense attitude for which real results are all that matters, and the customer was still not satisfied.
Obviously, this RUP/scrum combination was not really working. Some of the managers and involved teams were already becoming enthusiastic about scrum. Another courageous decision was made: to go agile all the way. Internal project management made this first big push. Since a proper scrum introduction is disruptive, this first step was not easy, so the port hired a number of experienced agile people to help. In about three months, training sessions were held and big changes were made. Three scrum teams formed and, after two sprints, the first version of a new system with new features was released in production. Things started to look much better. Software was released and used, although only a few key users were actually using these new features next to the existing old system.
During this time, the port introduced a separate QA team that also dealt with architectural questions or “architectural runway”. After a few months, all of these were dropped. They were considered a waste and conflicted with our agile values, and eventually nobody cared about them. There was a realisation that architecture should be driven by the current business need and not by something that might possibly be needed in the future; in other words, just-in-time architecture. Also, any architectural decision is an intrinsic and inseparable part of a software development cycle. The knowledge, experience, new insights, and constant changes in business requirements all happen within the teams. Having a separate team with a runway did not make sense, and having architectural epics or features made even less.
The first features on the product backlog were not replacements but completely new. The effect of this was that the technical implications were fairly simple and the team could deliver quickly. Also, users were eager to start playing with something of real value, something they didn’t have in the existing system.
During the RUP days, attention focused on delivering all kinds of documents and making 800+ architectural decisions. In those first sprints after the removal of RUP and the initiation of scrum training, the focus moved towards embedding the agile mindset. Everyone was talking about this thing called scrum and how it works. Some were skeptical, but most were eager to try and learn. Also, every aspect of the software life cycle had started to improve. Continuous integration, TDD and other XP practices became embedded. Despite the effort spent on learning, the teams were already delivering new features into production after every sprint.
What about the product itself?
The first version of the product was already partially implemented before the full scrum introduction. The fundamental elements were the use of Java as the programming language, a standalone Java client written in Swing in combination with JIDE, an IBM WebSphere platform on the server side, and SOAP over HTTP as the protocol between client and server. There was even a SOA with an enterprise service bus. Design and layering of the backend was based on standard JEE patterns (service, business, data). A separate server-based solution with Erdas Apollo software delivered geospatial data to clients. Altogether, this was definitely not the simplest possible solution, but with a few adjustments it was workable in the first sprints.
Almost all of these original choices have changed significantly or have been removed in the past years, replaced with simpler solutions as the number of features and intrinsic complexity grew. The statement “We need to choose complex technology in order to anticipate complex requirements later on!” proved to have the opposite outcome. The chosen technologies were not needed, so other technologies replaced them along the way.
This was also the case for design. The original overall design was put aside after the introduction of agile practices. Focus moved towards actual code and an already implemented design.
From here, architecture and design still had the scrum teams’ full attention. The main driving forces were functional epics and stories. This translates into the following most prominent rules:
- If it is not needed by business in this or the next sprint, then it is not decided upon, designed, or built. It might be discussed, but only briefly.
- We must replace the old system as soon as possible.
The biggest challenge here was not the technology and knowledge of design techniques but getting clear information from domain experts and users. The complex subject matter meant that not many people could explain how things really work outside in the real world.
Thinking in simple solutions was gradually embedded in the minds of everyone involved. This thinking manifested in continuously questioning and replacing already implemented choices and always choosing the simplest possible option while clarifying and estimating epics and user stories. An example was replacing the SOAP over HTTP interface between the client and server with the Hessian binary protocol. This primarily meant removing a lot of code, which felt really good.
Nevertheless, before any of these technical discussions, teams demanded not only requirements behind user stories for the following sprint but also a proper explanation of context, the business process behind user stories, and any requirement that might impact the choices at hand.
The effect was most noticeable in UX design and asynchronous messaging. Instead of spending a lot of time on choosing a grand new technology to serve for the next 20 years, teams refocused on understanding which design or technology choices would fulfill the requirements in the present. When requirements were currently lacking or lay much further in the future, teams would still take them into consideration by asking: is the current choice going to prevent us from meeting those requirements in the future, and will it be costly to replace this choice? If the answer was no, it was a waste of time to analyse the choice any further.
In other words, we spent huge amounts of time on understanding short-term and long-term business requirements but very little or no time on the design and architecture of things that would not be needed in the current or the following sprint.
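The two questions above amount to a small decision rule. The following Python sketch is only an illustration of that rule, not something taken from HaMIS: the `Choice` type, its field names, and the example choices are hypothetical, and it assumes the "if no" reading in which further analysis is only worthwhile when a choice both blocks a known future requirement and would be costly to replace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Choice:
    """A design or technology choice under discussion (hypothetical model)."""
    name: str
    blocks_future_requirements: bool  # would it prevent meeting known future needs?
    costly_to_replace: bool           # would swapping it out later be expensive?


def needs_further_analysis(choice: Choice) -> bool:
    """Return True only when the choice is both blocking and costly to replace.

    Assumption: if either answer is "no", the choice can simply be made now
    and revisited later, so more analysis would be wasted effort.
    """
    return choice.blocks_future_requirements and choice.costly_to_replace


# Illustrative inputs only; these are not real HaMIS assessments.
cheap_to_swap = Choice("wire protocol", blocks_future_requirements=True, costly_to_replace=False)
locked_in = Choice("core domain model", blocks_future_requirements=True, costly_to_replace=True)
```

With these inputs, `needs_further_analysis(cheap_to_swap)` is false and `needs_further_analysis(locked_in)` is true, so only the second choice would earn more discussion time.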
All architectural decisions were made by teams. There was no real architect in the traditional sense, only someone who communicated the most significant decisions to stakeholders and facilitated sessions. In the beginning, the most experienced team members made design and architectural decisions. This caused problems in team dynamics. Design discussions resulted in decisions and sketches. Since these drawings largely defined the tasks belonging to a story, the other team members felt disconnected from what was happening.
Eventually, all design and architecture discussions became a team effort. They started during the sprint planning meetings, but the real work was done just before a team started to work on a specific user story. A sprint may have one discussion for each story if needed. The rule of thumb was that a discussion ended when all team members understood the design and could participate in its implementation. The more experienced developers were still the most active during these sessions. Other team members usually asked questions, which the experienced developers answered. All teams were invited to participate in any decisions with big impact.
Every single aspect of architecture emerged gradually or changed. Everything was introduced only when needed, except for the planned, gradual replacement of many obsolete technologies. In the beginning, only one server instance provided services to clients, and the database and the domain model contained only those classes needed for the stories built at that moment. We had a walking skeleton with only one leg. It could jump and that was good enough at that moment. Once we realized it would probably fall because of additional weight, we introduced another leg, and a cluster was born.
In our experience, a complex architecture like this can definitely emerge as long as teams constantly spend a considerable amount of time on design and architecture.
Since teams, together with product owners, were able to decide how to spend their time, they would often choose to experiment and innovate. We often held hackathons and ShipIt Days, during which we would try to deliver in one day something that was not yet on the product backlog, more or less free from any constraints.
Focus on users
We used many of the well-known practices for understanding business needs: user stories, epics, themes, and releases. Although these were useful, teams spent most effort on simply inviting users to visit or visiting users on the job, talking to them, and, most importantly, observing their work. HaMIS team members would take the initiative, without PO involvement, to arrange visits. As a result of this effort, epics and user stories were often rewritten or replaced. Teams would usually involve the product owner afterwards. While the product owner was generally absent during these sessions, he or she still decided whether or not items should be placed on the product backlog.
Users had become involved in the process, and were visiting teams on a weekly basis. During sprint reviews, a meeting room would be completely filled with users and business people. In the beginning, everyone was excited by such fast delivery. Unfortunately, after years of continuously delivered new features, this became business as usual, and it gradually became more difficult to keep users coming after every sprint.
A big lesson learned was that comments from product owners or someone similar could never replace talking to users. In complex challenges, collaboration with users became even more important. Occasionally, teams misunderstood the need, which resulted in rewritten functionality in the following sprints. An interesting observation is that talking to users seemed occasionally difficult. Users really liked to talk when asked specific questions. They would expound on all kinds of detail at that moment, while we came for specific answers to specific questions. This was due to different perspectives between their world and our software world. Nevertheless, team members found that talking to real users was definitely the most effective and accurate way to discover requirements. Despite difficulties, it was always worth the effort and beneficial for all involved.
On the other hand, requirements were often unclear or improperly argued, frequently due to a difference between how users were working with the old system and how business people wished to improve the existing process. The usual solution to this problem was to simply choose the most probable approach and show it to everyone. Any changes as a result of this shortcut were still more cost-effective than further discussion or pushing the problem back to business. Probably the most important reason why this works is the trust higher management has in the product owner. Imagine the cost of four teams during one or more sprints, and then duplicate costs in following sprints as they must partially or completely replace functions.
Our product owner was actually two guys. This would occasionally cause problems when one contradicted the other. Luckily, they would solve this fast. Both of them came directly from business, and have always been dedicated full-time to HaMIS.
The product owners were supported by a number of domain experts who would gather and provide information to product owners and teams. A pitfall of this was premature functional analysis. In other words, it was okay to gather factual information, but translating this into anything (e.g. possible features) prevented teams from really understanding the problem and asking “why” questions. Not only does real understanding get lost in translation, but one of the first feedback loops becomes broken. Therefore, analysis and exploration of user stories shifted from domain experts and analysts to the teams, with support from the domain experts. The presumption that nerds were incapable of asking the right questions and should not talk to business proved to be completely wrong.
More work, so more teams?
Our teams constantly improved, with great results. The rest of the organisation noticed. This had two effects: other IT departments and teams started introducing scrum, and business people with budgets made more and more requests, even from outside the Port of Rotterdam. About two years ago, the Port of Amsterdam wanted to replace their system with HaMIS. This did not mean they would receive a DVD with our software; our teams and the PO suddenly had a whole new group of stakeholders and users for the same product. The principles of incremental delivery and close contact with users and customers still applied. The difference was that teams needed to spend some time in Amsterdam, too.
The product backlog was huge. It still is and seems to be growing. All these stakeholders wanted to have their value preferably yesterday. This automatically triggered the question about scaling towards more teams. More resources means more work can be delivered, right? Every time the question would arise, the teams’ answer was “No!”
We realised that the request for more teams was by itself an incorrect request. The correlation between more teams and more production was, at best, weak. At worst, it could have exactly the opposite effect. The requests were followed by a number of questions from the teams:
- What is the exact need? Is it clear enough?
- Should this be part of HaMIS as a product?
The most important conclusion was that teams would rather keep improving effectiveness through better ways of working together, and especially with users and stakeholders, instead of introducing new teams or team members. Eventually, both teams and management agreed on this way of thinking. We all had and have a strong feeling that a lot can be improved in the process, even after four years. We came back to the observation that one of the most important areas of improvement was the product backlog. This was also the reason for the questions above.
In the first three years, only three teams built HaMIS. Eventually, the teams themselves decided they could hire additional experienced craftsmen and craftswomen, create another team, and still be effective. A good thing about new team members is the experience they bring from other projects. After four years, team members were getting along pretty well. Lack of conflict meant a lack of challenges and possibly overly similar ways of thinking.
HaMIS has two project managers. One of them is the official HaMIS project manager while the other is mainly concerned with external communication and coordination with partially dependent projects in other companies or departments. The HaMIS PM role is not an easy one. The outside world expects a project manager who takes the blame if things do not go according to expectations, but we are fully self-organising and take most of the organisational decisions ourselves.
The concept of self-organising teams is not difficult to grasp, but to truly understand its meaning and to behave accordingly seems to be very difficult for an average project manager. Because of his responsibility to the management, a PM finds it very difficult not to steer the self-organising teams. It is in his nature. This was also a problem at HaMIS, until the moment the project manager truly understood the concept, stopped steering, and started communicating and working within the team.
For a self-organising team to really succeed, it needs boundaries. Teams will take on much more work than they are responsible for. Without boundaries, we probably would have tried to “agilise” the entire Port of Rotterdam. Therefore, management had to decide to what extent we could decide on matters. Could we decide to change work processes? Yes. Could we decide to split a team in two? Of course! Could we decide to hire a new team member? Yes, but what about the financial consequences? Could we decide to replace the data warehouse? No, that is the responsibility of another team. Management decided that all decisions that would affect the teams and the teams only had to be made by the teams and by no one else. Any decision that would affect people outside the teams had to be made together with those people.
Initially, we had a natural distrust of the HaMIS project manager. We were probably aware of the average project manager’s need to steer. This attitude was obvious when the PM would suggest a course of action. We would think, without really saying, “Are you demanding this as PM or merely trying to help?” At the moment the project manager proved that he truly understood the concept of self-organising teams and acted accordingly, trust was established. This trust is essential to effectively work together, do the right things, and keep the outside and the inside world in sync.
Essentially, the project manager is fulfilling a more facilitative role. This includes protecting the teams, reporting to higher management, and executing teams’ requests.
No big bangs
The basic challenge we had was replacement of the existing system. This had to be done soon to minimize operational costs and ratchet down risks. At first, we would spend a lot of time in discussions about which approach had the least technical implications, every step of the migration path, and how the order of delivered functions would affect stakeholders and their priorities. The main risk was that we could not afford to have the current or the new systems down for any significant length of time. This is a reason to avoid any big bang, but, ironically, the possible risks of downtime and other issues were why we decided to postpone releases and so created big-bang releases.
The first lesson was the tipping point between proper analysis and analysis paralysis. Probably the most analysis paralysis came out of considering the migration path. We had a thick document written before scrum introduction, which we never really used. Thus, we spent many really creative sessions to discuss different possibilities. One week, we would have one conclusion and another week we would decide something different. During the third week, we might even revert to the first option. The main problems were complexity and too many assumptions.
At some point, we simply stopped analysing the whole migration path and took a leap of faith in our own capability to deal with whatever migration-path problem would arise. From that moment on, we started to focus more on business value and in which order features should be delivered. We kept solving many migration issues separately, but never again looked at migration as a whole process and we often did not know what steps would follow. This way of thinking greatly simplified the challenge.
We also gradually realised the power of short feedback cycles. Every time we had a large piece of functionality that went unused for several months, we paid dearly. As soon as users worked with features, feedback poured in. Things were not working as we’d assumed (different functionality was needed), features were missing, production messages from other systems deviated from interface contracts, and so on.
The days right after such a release were stressful. At the same time, the challenges were all technological. If something was really urgent, teams were able to find the problem, make a change, test properly, and release a completely new version into production within one or two hours.
We learned that we really needed to remove these big bangs. We also learned that there was a huge difference between potentially shippable, shipped and reviewed by key users, and actually used in production. Although we released features in production after almost every sprint, the software was not used until the minimal completion of a process. A process in this case was some chain of interdependent tasks a user performed and had to complete in minimal form in order to be useful.
We realised that with a lot of creative thinking, we could gradually release anything. The users would get to use software piece by piece instead of only after a few big bangs. The advantage of our situation is that users belong to the same organisation or are at least tightly involved in the process. Alignment about when and how they would get each feature was fairly easy.
After four years, it still proves difficult to prevent big bangs. This might be due to an ever-larger number of users, but most challenges seem to be in defining a minimum viable product, feedback sessions with users, and setting release dates. A lot can be improved in all of them. In other words, the challenges are much more organisational than technical.
An interesting principle we all embraced was that if something was really difficult, like changing interfaces between the old and the new systems, we should do it more often instead of instinctively postponing the challenge. Another example is a patch release. Releasing patches before the end of the sprint is considered a preventable problem, which is always discussed in a retrospective. Nevertheless, we became really good at releasing a patch very quickly. In the beginning, patches were cumbersome, but luckily we never postponed them because they were difficult. This really reduced the negative value for users.
Eventually, we based the migration path on the principle of strangling. Although there were many challenges along the path, we managed to turn off the old system completely without problems we couldn’t solve quickly.
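For readers unfamiliar with the strangling principle, the idea is to route work feature by feature to the new system while everything not yet migrated still goes to the old one, until nothing is left for the old system to do. The Python sketch below is a generic illustration of that idea under stated assumptions; the `StranglerRouter` class, its method names, and the feature names are hypothetical and do not describe how HaMIS actually implemented its migration.

```python
from typing import Callable, Dict, Iterable

# A handler takes a request payload and returns a response (both plain strings here).
Handler = Callable[[str], str]


class StranglerRouter:
    """Routes each feature to the new system once migrated, else to the legacy one."""

    def __init__(self, legacy: Handler) -> None:
        self._legacy = legacy
        self._migrated: Dict[str, Handler] = {}

    def migrate(self, feature: str, handler: Handler) -> None:
        """Mark one feature as taken over by the new system."""
        self._migrated[feature] = handler

    def handle(self, feature: str, payload: str) -> str:
        """Dispatch to the new handler if the feature is migrated, otherwise to legacy."""
        handler = self._migrated.get(feature, self._legacy)
        return handler(payload)

    def legacy_can_be_switched_off(self, all_features: Iterable[str]) -> bool:
        """True once every known feature has a handler in the new system."""
        return all(feature in self._migrated for feature in all_features)


# Hypothetical usage: feature names are made up for illustration.
router = StranglerRouter(legacy=lambda payload: "legacy:" + payload)
router.migrate("vessel_visits", lambda payload: "new:" + payload)
```

In this sketch, a request for `vessel_visits` is served by the new handler, while any other feature still falls through to the legacy handler. Each `migrate` call is a small, separately releasable step, which is what makes the approach an alternative to a big bang.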
Four years ago, when the Port of Rotterdam introduced scrum, three teams were building the new system while a separate operations team of six people dealt with the existing system 24/7. When the first parts of the new system were released, this operations team took on responsibility for operations of the new one. In the beginning, this was not much of a problem because the required availability of delivered features was not high. HaMIS development teams could usually take care of any problem the next day.
In time, HaMIS grew and larger changes were put in place. Although the operations team had limited or no Java experience, the relevant teams together with management decided to spread operations people over HaMIS development teams. They were still scheduled for 24/7 stand-by, just in case something happened in production. Gradually, the operations people would contribute more and more to HaMIS development. All of them were given an opportunity to learn from their teammates about Java and many practices and technologies.
On the other hand, as mentioned before, the new system definitely had issues entering production. Someone had to take care of them. The solution was to appoint one team as the operations team during each sprint. In other words, a normal development team would temporarily become an operations team. In the beginning, this team would spend the whole sprint on incidents, monitoring, issues, and so on. Since any developer would rather build something than solve self-created incidents, all teams would spend time analysing and preventing these issues; this is popularly called “eating your own dog food”. Eventually, the number of issues dropped, even with growing system complexity. An operations team would start delivering more and more backlog items.
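One simple way to appoint an operations team per sprint is a round-robin rotation. The text does not say how the team was actually chosen, so the Python sketch below is purely an assumption for illustration; the function name and the team names are hypothetical.

```python
from typing import Sequence


def operations_team(teams: Sequence[str], sprint_number: int) -> str:
    """Pick the team on operations duty for a sprint using round-robin rotation.

    Assumption: sprints are numbered from 1 and duty simply cycles through
    the teams in the order given.
    """
    if not teams:
        raise ValueError("at least one team is required")
    if sprint_number < 1:
        raise ValueError("sprint numbers start at 1")
    return teams[(sprint_number - 1) % len(teams)]


# Hypothetical team names, for illustration only.
TEAMS = ["Team A", "Team B", "Team C", "Team D"]
```

With four teams, `operations_team(TEAMS, 1)` selects the first team and `operations_team(TEAMS, 5)` selects it again, so each team carries operations duty once every four sprints and every developer regularly feels the consequences of production issues.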
By the way, the teams usually treated any work on infrastructure configuration as one or multiple tasks belonging to a user story. Officially, the infrastructure was managed externally. In reality, the development teams were monitoring the whole infrastructure, introducing and scripting changes. The external company was more or less only executing tasks we defined. We have spent a lot of effort in involving them in the process, thereby increasingly resembling one single team. Communication between DevOps and the company was direct, mostly through Skype, but if needed they would come to our office.
Probably the most crucial cog in this was to have an infrastructure expert in a team. He made sure that the delivery process was going smoothly and did almost all of the communication with the external service provider. Ideally, this work was not needed, but thanks to him we managed to deliver after almost every sprint.
We could write a lot about the ways we deal with product-vision-board workshops, TDD, code katas, ATDD in combination with a Swing interface, high code quality, simplicity in architecture, in-memory database, event-driven design (also a replacement for batching), changes in interfaces without versioning, continuous integration, and replacement of geospatial software with a simple Web cache.
In the end, we are most proud of the ways we collaborate and achieve goals. The most important goal is satisfied users and customers. The reason for success is not the level of experience. We do have a number of very experienced craftsmen, but we also have many developers who only learned to write proper software in the past four years. The HaMIS team is a mix of Port of Rotterdam employees and contractors, with contractors making up about three-quarters of the workers. The contractors are a combination of freelancers and employees from several service integrators. It was interesting to observe that contractors, after four years and in some cases for much longer, felt more connected to HaMIS than to their official employer.
This all did not just happen because of the introduction of scrum. It took us four years of continuous discovery and work on creating a great team. The scrum introduction was merely a starting point in a long process. The first struggles were about a correct interpretation of agile values. One of the latest challenges is how to keep everyone just as energetic as in the beginning.
The reason for success is twofold.
One is the constant trust, support, and freedom that programme and project management, product owner, and customers gave us. Of course, teams did gain this trust by constantly delivering value, but often, the directly involved management dealt with budgets and made sure we received everything we needed. Especially in the beginning, many stakeholders had difficulty understanding these things called agile and scrum. The most difficult sticking points were discrepancies between the scrum way of doing things and how the rest of the organisation handled financials, plans, changes, and so on. This was a long process of constant improvements and discussions mainly between project and programme management and other departments. It was nice to see that the financial department was prepared to participate in this process. Together, they discovered workable ways of financial reporting, helped by basic scrum practices such as product and release burndown charts. It also helped that we were fully transparent.
Another reason for our success was the usually unspoken principles of how we collaborate. These are based on agile principles and have become part of who we are as software developers.
One goal, one vision
Although teams spent a lot of effort to understand and challenge the stated goal and vision, eventually everyone embraced these. We liked the goal very much. We knew the direction and we believed in it. Any doubts about the clarity of the goal were discussed regularly. There was time for these discussions. We were not afraid to challenge the top management of the Port of Rotterdam. Luckily, the company culture already embraced this kind of collaboration. It was okay to ask hard “why” questions and keep asking until receiving an unambiguous answer.
Team spirit over superman
As mentioned earlier, we were very much aware that some team members are more experienced than others. Quite often, this was openly mentioned and discussed. A number of times, teams decided to reshuffle team members and spread experience evenly over teams. In the last year, we have even discussed whether we should shrink the size and number of teams, and therefore fire a number of team members.
In this whole process, we hold certain values. One is about continuous learning. Being a HaMIS team member means not that you must be very experienced but that you are prepared to learn constantly. Even more important is the willingness of experienced people to guide and help others and constantly focus on team goals.
Although we all know which of us are very experienced, we all realize that superman behaviour is not good for the end result.
Experiment, inspect, adapt, and practice
HaMIS team members are not big book readers. Except at the beginning after trainings, we introduced and applied practices on our own the best we could. Many of the practices we discovered along the way. Our focus has been much more on experimenting and practicing instead of “properly” applying practices. A big advantage of this is that we never lost sight of the intended goal. Understanding the “why” behind each practice was not optional, and the same applied to getting real results.
At the start we often discussed scrum and whether each practice meant this or that. Eventually, these discussions stopped. Most things we got to know or discovered became tacit knowledge, shared with others. Sharing knowledge itself was constant. It happened all the time: pair-programming, code katas, discussions during lunch, design workshops, sprint planning meetings, product-backlog refinement, and so on.
One thing many noticed was the amount of noise on the floor. For us, this was a sign of productive collaboration. We did not mind.
Learning is primary, responsibility and trust come second, but having fun is definitely third on the list. Having fun required a lot of facilitation. Many team members would take the initiative to organise events, and we did all kinds of things: ski-trips; after-sprint dinners; LAN parties; movies; “what am I proud of” talks; and so on.
Port of Rotterdam management also organised barbecues to celebrate achieved milestones. That’s one more reason to have very short release cycles. :-)
We are proud of this team and its achievements. We are grateful to the Port of Rotterdam for trusting us to solve this massive challenge. The respectful and close collaboration is refreshing. It is a joy to get out of bed knowing that you are going to work with great people in a great environment and achieve something meaningful.
The HaMIS team.
About the Author
Viktor Grgic is an Agile coach, architect and developer with 17 years' experience in delivering software products. He has trained architects and teams, and introduced Scrum in many organisations. Viktor is a blogger and speaker who regularly talks, mainly about Agile Architecture, in different settings. He also provides open and in-house trainings.