
Engineering Successful Cloud Migrations

Jul 22, 2020


Key Takeaways

Unless engineering practices introduce efficiencies, promote flexibility and drive revenues early, cloud migrations risk stalling because of cost and time overruns.
Cloud cost conversations are usually painful, especially during migrations. Generating revenues early in incremental migrations can turn them into discussions on margins from migrated workloads, which helps maintain migration motivation and momentum.
Approaching application re-architecture on the cloud as a transformation opportunity optimises not only the applications but also the business processes they support.
Decoupling applications from the underlying infrastructure makes them portable across cloud platforms, creating further opportunities for cost optimisation.
Focusing on efficiency leads to lower costs. This includes efficiency both in delivering the applications and in achieving the required performance. Achieving performance efficiently reduces resource usage, which in turn lowers costs.

The benefits of the cloud are well known and evident from several successful case studies. The most cited benefits include opportunities for cost reduction and elasticity. The ease of provisioning enables firms to grow or shrink their infrastructure in line with business trends, thereby incurring costs proportional to usage. The ability to innovate using platforms on the cloud is now an equally strong incentive. This increases business agility and reduces time to market, with the potential to drastically reduce opportunity costs for businesses.

As a result, cloud adoption has been accelerating. According to the Flexera 2020 State of the Cloud Report, survey respondents expected to increase cloud spending by 50%, with most enterprises expecting cloud usage to increase due to COVID-19. This increase is due both to increased online activity and to firms accelerating their migration programmes.

Firms are accelerating their migrations to get around reduced infrastructure headcount, reduced access to data centre facilities and delays in hardware supply chains. Additionally, public cloud is viewed as offering more reliable business continuity options. Many enterprises already use two or more public and private clouds: 93% of enterprises have a multi-cloud strategy and 87% have a hybrid cloud strategy.

But the journey is not without its challenges. The same study shows that most organisations struggle to control cloud costs and are, on average, 23% over budget. It is also estimated that 30% of cloud spend is wasted. In another study, 80% of the survey participants repatriated migrated workloads back to their premises. The three main reasons for repatriation from the cloud are performance, security and cost. Earlier studies quoted by TechRepublic suggest that three quarters of cloud migrations take longer than a year to complete, with the majority taking over two years.

Long delays, high costs and failure to meet objectives can result in abandoned or stalled migrations. These can lead to fragmented IT organisations and infrastructure, resulting in higher operating complexity and expenditure. But the above studies, and many others, also point to what a successful cloud migration might look like.

This article discusses five engineering principles that aim to make cloud migrations economically, technically and operationally sustainable while delivering technology that achieves the desired business objectives. To put these principles into perspective, cloud migration approaches and cloud benefits will first be reviewed.

Approaching Cloud Migration

In 2010, Gartner defined the 5 R’s of cloud migration. These were later extended to 6 R’s, as described in an AWS blog post. Many cloud practitioners and vendors consider these to be alternative cloud migration strategies. Here, they are divided into reasons why an application may not be migrated and approaches to follow for those that can be migrated.

Defining the migration scope is important for its success. This begins with an audit of the application inventory. This audit determines which applications should not be migrated to the cloud. Applications for which an organisation has the following plans are generally excluded from migration:

Retain: This category includes applications that may be considered unsuitable for cloud. Examples of such applications include those that have strict performance requirements. Applications that are tightly coupled to on-premises infrastructure and services also fall in this category.
Retire: Short-lived applications or applications which may be decommissioned in the near future may also not be migrated. This includes applications developed for time limited business opportunities. Applications supporting businesses with diminishing returns or applications showing downward usage trends are also retirement candidates.
Replace: Applications that need to be replaced in the near future may also be excluded from the migration plan. A common example is the emergence of alternative SaaS products that offer a lower TCO (Total Cost of Ownership).
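The audit criteria above can be expressed as a simple triage rule over an application inventory. The sketch below is illustrative only: the `AppProfile` fields, the 12-month retirement horizon and the usage-decline threshold are hypothetical assumptions, not criteria prescribed by Gartner or AWS.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Plan(Enum):
    RETAIN = "retain"
    RETIRE = "retire"
    REPLACE = "replace"
    MIGRATE = "migrate"  # proceeds to rehost, replatform or re-architect


@dataclass
class AppProfile:
    # Hypothetical inventory attributes gathered during the audit.
    name: str
    strict_performance: bool = False
    coupled_to_on_prem: bool = False
    months_to_decommission: Optional[int] = None
    quarterly_usage_change: float = 0.0  # e.g. -0.25 means usage fell by 25%
    saas_tco_ratio: Optional[float] = None  # SaaS TCO divided by current TCO


def triage(app: AppProfile,
           retire_horizon_months: int = 12,
           decline_threshold: float = -0.2) -> Plan:
    """Apply the exclusion categories in order; anything left is a migration candidate."""
    if app.strict_performance or app.coupled_to_on_prem:
        return Plan.RETAIN
    short_lived = (app.months_to_decommission is not None
                   and app.months_to_decommission <= retire_horizon_months)
    if short_lived or app.quarterly_usage_change <= decline_threshold:
        return Plan.RETIRE
    if app.saas_tco_ratio is not None and app.saas_tco_ratio < 1.0:
        return Plan.REPLACE
    return Plan.MIGRATE
```

Checking Retain before Retire is a design choice of this sketch; an organisation may well prefer to retire a tightly coupled application rather than keep it on premises.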

Such an audit may substantially reduce the scope of a migration initiative, reducing risks and costs. For the remaining applications, an organisation can choose from the following migration approaches:

Rehost: This approach is also known as lift-and-shift. Here, an organisation may choose to clone its on-premises infrastructure on the cloud and deploy its applications there. While this may seem an easy route to moving applications to the cloud, it is the least favoured approach. This is because rehosted applications may not utilise many of the cloud’s benefits other than infrastructure right-sizing and vertical scalability. Coarse-grained horizontal scalability may only be possible if the applications were originally designed to support it.
Replatform: Applications that can utilise services available on the cloud may be replatformed without refactoring, typically in containers. This usually involves using cloud-based databases, messaging middleware, storage solutions and platform services.
Re-architect: Re-architecting applications as cloud native applications enables them to benefit most from cloud capabilities and services. This, however, requires significantly more effort than rehosting or replatforming applications.

Migration approaches mentioned above are not mutually exclusive. A firm’s migration journey may require different applications to be migrated using different approaches. For example, an application may initially be rehosted or replatformed on the cloud and then re-architected. Similarly, different approaches may suit migration of different services and sub-systems of a distributed application.

Migration Objectives

A firm needs to clearly define the objectives it wants to achieve from a migration initiative. This is important for the engineering teams to determine migration strategies and implementation techniques aligned with business trade-offs. The following are some of the most common objectives firms wish to achieve by migrating to the cloud. In most cases, firms find these objectives challenging to achieve. The following discussion of these challenges puts the engineering principles described later into perspective.

Cost

Controlling costs is one of the primary objectives of moving to the cloud. Cloud promises to reduce a firm’s technology TCO (Total Cost of Ownership) substantially. With that, the focus within the firms shifts from CapEx (Capital Expenditure) to OpEx (Operational Expenditure) and its optimisation. These conversations usually start very early in a firm’s cloud journey. At times, they occur even before the first application is launched into production on the cloud. And they are always very tough.

The key reason for cost increase on the cloud is higher usage. Higher usage may either be due to increased business volumes or inefficient resource utilisation by the applications. In case of the former, the firm must also experience proportional margin gains to pay for increased costs. For the latter, the engineering teams must always be vigilant about inefficiencies creeping into the technology they develop and maintain on the cloud.

Thus, conversations about cloud costs that do not consider margin trends are counterproductive. They may lead to business strategies and technology solutions that are suboptimal in the long run.

Agility

The perception of agility on the cloud is largely from the lens of provisioning. Unlike the infrastructure on premises, provisioning necessary infrastructure and platform resources is far more convenient on the cloud, albeit at a cost. But this is only a small component in the overall process of launching a change into production.

Achieving agility on the cloud involves relentless automation and optimisation of not just the delivery pipeline but also the business. When combined with lean practices and evolutionary architectural principles, businesses rapidly adjust to capitalise on opportunities and reduce risks.

Innovation

The emergence of digital technologies on cloud platforms, in combination with agile practices, promises faster innovation. Most cloud vendors are offering technologies that are building blocks for IoT sub-systems, data science applications, big data pipelines, and AI/ML services. This allows cloud consumers to build and evolve digital applications on the cloud fairly quickly.

This convenience comes with its own challenges. Vendor lock-in is a significant risk, as equivalent technologies offered by different cloud vendors may not be compatible. Applications built using Domain Driven Design techniques can be migrated between cloud offerings if they access these technologies through well-defined interfaces on the host cloud offering.

Digital implementations will need a focus on efficiency as well. This is because they will consume, process and produce large datasets. Suboptimal implementations will not only impact overall performance but also the cost.

Performance

Cloud vendors often suggest autoscaling as a solution to performance issues on their platforms. Applications with suboptimal baseline performance usually require significantly larger compute instances to process the anticipated load. Any fluctuation in the load then requires autoscaling to fulfil performance requirements.

A suboptimal application is costly to run on the cloud in any case, and the additional cost of autoscaling may not be justified by the revenue from the increased load.
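A back-of-the-envelope calculation shows why baseline efficiency matters more than autoscaling. The sketch assumes throughput scales linearly with instance count and that instances are run at no more than a target utilisation; the throughput figures and hourly rate are hypothetical.

```python
import math


def instances_needed(peak_rps: float, rps_per_instance: float,
                     max_utilisation: float = 0.8) -> int:
    """Instances required to serve peak_rps without exceeding max_utilisation per instance."""
    usable_rps = rps_per_instance * max_utilisation
    return math.ceil(peak_rps / usable_rps)


def monthly_cost(instances: int, hourly_rate: float,
                 hours_per_month: float = 730.0) -> float:
    """Cost of a fleet that runs all month (roughly 730 hours)."""
    return instances * hourly_rate * hours_per_month


# Same peak load, two implementations of the same service.
suboptimal = instances_needed(peak_rps=2000, rps_per_instance=100)  # 25 instances
optimised = instances_needed(peak_rps=2000, rps_per_instance=250)   # 10 instances
```

With the same peak load, the more efficient implementation needs 10 rather than 25 instances. Because autoscaling adds capacity in units of these instances, the absolute cost gap between the two grows with the load.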

Engineering Principles for Cloud Migration

The discussion above highlights how traditional migration objectives, pursued on their own, can become blind spots that put migration initiatives at risk. In fact, most traditional objectives can be achieved by focusing on higher efficiency and flexibility while reducing time to market and lock-in risks.

The following five principles aim to bring economy, efficiency and agility to systems that are being migrated to the cloud.

Earn As You Spend

Any cloud migration strategy should focus on generating revenue from the cloud as soon as possible. Contrary to all-or-none and big bang migration strategies, strategies based on this principle aim to deliver value incrementally in production on the cloud.

This is achieved by selecting business capabilities and sub-capabilities that can be migrated completely to the cloud while minimising dependencies on the on-premises estate. As opposed to simply migrating applications, migrating capabilities allows an accurate calculation of revenues that the migrated capabilities generate on the cloud.

As a result, businesses can track margin gains on the cloud. With margins in perspective, any cost increases not explained by corresponding revenue increases point to processing inefficiencies as applications struggle to support business volumes. Margins, revenues and costs should be tracked over a moving window of a few weeks. This is because, with optimised applications, costs may show step changes while revenues show more gradual trends. This happens when processing volumes cross specified thresholds causing applications to scale horizontally resulting in step changes in cloud resource utilisation.
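A minimal sketch of tracking margins over a moving window, assuming weekly revenue and cost figures can already be attributed to the migrated capabilities. The four-week window and the 5% tolerance are hypothetical parameters; the function flags windows in which cost grew noticeably faster than revenue.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass
class Week:
    revenue: float  # revenue attributed to migrated capabilities
    cost: float     # cloud cost attributed to the same capabilities


def rolling_totals(weeks: Iterable[Week],
                   window: int = 4) -> Iterator[Tuple[float, float, float]]:
    """Yield (revenue, cost, margin) summed over each complete moving window."""
    buf: deque = deque(maxlen=window)
    for week in weeks:
        buf.append(week)
        if len(buf) == window:
            revenue = sum(w.revenue for w in buf)
            cost = sum(w.cost for w in buf)
            margin = (revenue - cost) / revenue if revenue else 0.0
            yield revenue, cost, margin


def unexplained_cost_growth(weeks: Iterable[Week], window: int = 4,
                            tolerance: float = 0.05) -> List[int]:
    """Indices of windows where cost growth exceeded revenue growth by more than tolerance."""
    flagged: List[int] = []
    previous = None
    for i, (revenue, cost, _) in enumerate(rolling_totals(weeks, window)):
        if previous is not None:
            prev_revenue, prev_cost = previous
            revenue_growth = (revenue - prev_revenue) / prev_revenue if prev_revenue else 0.0
            cost_growth = (cost - prev_cost) / prev_cost if prev_cost else 0.0
            if cost_growth - revenue_growth > tolerance:
                flagged.append(i)
        previous = (revenue, cost)
    return flagged
```

A flagged window is a prompt to investigate rather than proof of inefficiency: a legitimate scale-out also appears as a step in cost, which is why costs are best read against revenue trends over several weeks.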

This principle also reignites the controversial conversation about the lift-and-shift migration approach. Most cloud practitioners dismiss this approach because it fails to leverage the benefits that cloud technologies offer. However, if lift-and-shift can be achieved conveniently and without incurring substantial costs, it may be exercised as an initial migration step. This can result in the following interim benefits:

The migrated system will start generating revenues on the cloud fairly quickly.
The organisation can decommission the on-premises estate for savings there.
Because of provisioning flexibility, the infrastructure supporting this system on the cloud can be right-sized to further reduce costs.
Coarse grained vertical and horizontal scaling can be exercised to accommodate increased business volumes if the original system design permits it. However, margin gains here may not be fully optimised.

Once on the cloud, the system may be incrementally strangled and re-architected to make it cloud native. Again, a business capability based strangulation approach will provide more opportunities for margin optimisation than strangling individual applications or components. It is important to reiterate that lift-and-shift is recommended here as a step in the migration strategy and not a strategy itself.
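Capability-based strangulation is commonly implemented with a routing facade in front of the legacy system, in the style of the strangler fig pattern. In the sketch below, `CapabilityRouter`, the `capability` request field and the handlers are hypothetical. The facade sends each request to the cloud-native implementation once its business capability has been cut over and to the lifted-and-shifted system otherwise, and it counts requests per migrated capability so that revenue can be attributed to them.

```python
from collections import Counter
from typing import Callable, Dict

Handler = Callable[[dict], str]


class CapabilityRouter:
    """Routes requests by business capability rather than by application or component."""

    def __init__(self, legacy: Handler) -> None:
        self._legacy = legacy
        self._migrated: Dict[str, Handler] = {}
        self.cloud_requests: Counter = Counter()  # per-capability volume served cloud-natively

    def cut_over(self, capability: str, handler: Handler) -> None:
        """Switch an entire capability to its re-architected implementation."""
        self._migrated[capability] = handler

    def handle(self, request: dict) -> str:
        capability = request["capability"]
        handler = self._migrated.get(capability)
        if handler is None:
            return self._legacy(request)
        self.cloud_requests[capability] += 1
        return handler(request)


# Hypothetical usage: payments has been re-architected, invoicing has not.
router = CapabilityRouter(legacy=lambda req: "legacy")
router.cut_over("payments", lambda req: "cloud-native payments")
```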

Approach Cloud Migration as a Transformation Opportunity

Cloud migration is a cost- and effort-intensive initiative. That is why filtering out the parts of the technology estate not ready for migration is so important. But this is only a coarse-grained scope reduction. For the technology estate that is to be migrated, transformation presents opportunities to right-size the scope further. These also lead to business and technology efficiencies and cost optimisations.

Here, businesses can revisit their operations and redefine operating models which are currently sub-optimal. As most of the applications to be migrated will be re-architected, they can be rebuilt for new target operating models. However, this is not without its challenges.

This will invariably require changing domain and data models, which makes migrating data from on-premises applications to the cloud considerably more intensive. Certain data may reflect states and entities that no longer exist in the target domain and data models. These may have to be accommodated in the data transformation between the old and new data models. Alternatively, re-architected components may need to be backward compatible, as historical data may need to be retained for regulatory reasons.
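One way to accommodate legacy states during data migration is an explicit mapping applied during transformation, keeping the original value alongside the new one where history must be retained. The order states, field names and mapping below are hypothetical examples.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical mapping from legacy order states to the target domain model.
# PENDING_MANUAL_REVIEW has no direct equivalent because the re-designed
# process no longer has a manual review step.
LEGACY_TO_TARGET: Dict[str, str] = {
    "OPEN": "ACTIVE",
    "PENDING_MANUAL_REVIEW": "ACTIVE",
    "SHIPPED": "FULFILLED",
    "CLOSED": "FULFILLED",
}


@dataclass(frozen=True)
class Order:
    order_id: str
    state: str         # state in the new domain model
    legacy_state: str  # original value, retained for audit and regulatory purposes


def transform(legacy_record: dict) -> Order:
    """Translate one legacy record; fail loudly on states nobody has mapped."""
    legacy_state = legacy_record["status"]
    if legacy_state not in LEGACY_TO_TARGET:
        raise ValueError(f"no mapping for legacy state {legacy_state!r}")
    return Order(order_id=str(legacy_record["id"]),
                 state=LEGACY_TO_TARGET[legacy_state],
                 legacy_state=legacy_state)
```

Failing on unmapped states, rather than silently defaulting, surfaces gaps in the mapping during rehearsal migrations instead of in production.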

Such transformations may also require changes to the clients’ operating models, especially if the clients’ technology interfaces with the systems being transformed. If these changes are significant, clients may be reluctant to make them unless the resulting value outweighs the costs. Therefore, it may be necessary to partner with clients here.

At a minimum, re-architecture should not aim for feature parity. Features that are unused or very rarely used on premises need not be implemented in re-architected applications. If on-premises applications have feature-level telemetry, the data can reveal how frequently their various features are used. In the absence of such telemetry, application logs may be mined to reveal this information.
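Where logs are the only source, a few lines of code can estimate feature usage. This sketch assumes, hypothetically, that each log line records the invoked feature as a `feature=<name>` token; real logs will need their own parsing rule. Features from a known feature list that appear rarely or not at all are candidates for omission from the re-architected application.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed log convention: "... feature=<name> ..."
FEATURE_TOKEN = re.compile(r"\bfeature=(\w+)")


def feature_counts(log_lines: Iterable[str]) -> Counter:
    """Count invocations per feature across the supplied log lines."""
    counts: Counter = Counter()
    for line in log_lines:
        match = FEATURE_TOKEN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def rarely_used(counts: Counter, known_features: Iterable[str],
                min_share: float = 0.01) -> List[Tuple[str, int]]:
    """Known features whose share of all invocations is below min_share, least used first."""
    total = sum(counts.values()) or 1
    candidates = [(f, counts[f]) for f in known_features if counts[f] / total < min_share]
    return sorted(candidates, key=lambda item: item[1])
```

Passing the known feature list matters: features that never appear in the logs would otherwise go unnoticed.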

Engineer for Flexibility

Cloud offers different hosting options for services suitable for different usage profiles. Transferring applications to a different hosting option can lead to significant rework if they are tightly coupled to a specific hosting option. Similarly, cloud offers multiple options for services like storage and messaging. Each option provides cost and performance optimisation opportunities for different usage profiles. Again, tightly coupling an application to specific storage or messaging options may lead to suboptimal applications when usage profiles change. But changing these dependencies will result in complex and costly rework.

Hence, services on the cloud need to be engineered with flexibility in the forefront. This starts with using techniques like Event Storming to achieve a Domain Driven Design for the applications being re-architected. Addressing non-functional and cross functional requirements (NFRs/CFRs) early helps adapt the functional design to meet those requirements. Failing to do so leads to expensive and challenging rework subsequently.

Implementing the resulting functional design over a hexagonal architecture helps decouple functional logic from underlying platform services and infrastructure dependencies. Here, the functional logic uses port and adapter abstractions to interface with cloud dependencies. As a result, changing a dependency only requires changes to the corresponding adapter and not a wholesale refactor. This makes the system far more flexible and adaptive in dynamic and disruptive business environments.
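A minimal sketch of a port and two interchangeable adapters. `CustomerRepository` is the port that the functional logic depends on; the in-memory and JSON-file adapters are hypothetical stand-ins for adapters that would wrap a specific cloud database or storage SDK. Swapping adapters requires no change to `CustomerService`.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Optional, Protocol


class CustomerRepository(Protocol):
    """Port: what the functional logic needs from persistence, free of vendor detail."""

    def save(self, customer_id: str, record: dict) -> None: ...

    def load(self, customer_id: str) -> Optional[dict]: ...


class InMemoryRepository:
    """Adapter keeping records in process memory (useful for tests)."""

    def __init__(self) -> None:
        self._records: Dict[str, dict] = {}

    def save(self, customer_id: str, record: dict) -> None:
        self._records[customer_id] = dict(record)

    def load(self, customer_id: str) -> Optional[dict]:
        return self._records.get(customer_id)


class JsonFileRepository:
    """Adapter writing one JSON file per customer; stands in for a cloud storage adapter."""

    def __init__(self, root: Path) -> None:
        self._root = root

    def save(self, customer_id: str, record: dict) -> None:
        (self._root / f"{customer_id}.json").write_text(json.dumps(record))

    def load(self, customer_id: str) -> Optional[dict]:
        path = self._root / f"{customer_id}.json"
        return json.loads(path.read_text()) if path.exists() else None


class CustomerService:
    """Functional logic: depends only on the port, never on a concrete adapter."""

    def __init__(self, repository: CustomerRepository) -> None:
        self._repository = repository

    def register(self, customer_id: str, name: str) -> None:
        if self._repository.load(customer_id) is not None:
            raise ValueError(f"customer {customer_id} already exists")
        self._repository.save(customer_id, {"name": name})

    def name_of(self, customer_id: str) -> Optional[str]:
        record = self._repository.load(customer_id)
        return record["name"] if record is not None else None


if __name__ == "__main__":
    # The same service runs unchanged against either adapter.
    with tempfile.TemporaryDirectory() as tmp:
        for adapter in (InMemoryRepository(), JsonFileRepository(Path(tmp))):
            service = CustomerService(adapter)
            service.register("c-1", "Ada")
            print(type(adapter).__name__, service.name_of("c-1"))
```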

Domain Driven Design and hexagonal architectures also unlock hybrid, multi-cloud and poly-cloud opportunities. Cloud offerings generally provide similar technologies and services, but these are accessed and consumed differently. Here, again, vendor-specific adapters make the same functionality portable across different cloud offerings. This gives organisations the ability to optimise for cost, performance, resilience and security.

Performance Through Efficiency

A common misconception about achieving performance on the cloud is that scaling will resolve performance shortfalls. The ease of provisioning and autoscaling tempts teams to defer performance concerns until late in the development process. Initial architectural decisions are then based purely on functional considerations and may not fulfil performance requirements. Technology teams are then left with two options:

Embark on costly rework to make architectural changes to support performance requirements.
Use scaling on the cloud to deliver performance.

Scaling leads to higher operating expenses that eat into business margins. Without changing this behaviour, every new feature will introduce further inefficiencies requiring additional cloud resources. Eventually, the business function being supported by that technology is no longer economically sustainable.

Hence, performance, like testing, also needs to be shifted left. All feature requirements should include both functional and non-functional requirements where the latter include performance requirements. Acceptance criteria for features should include both functional and non-functional acceptance criteria.

The development team should aim to achieve performance objectives through computational, communication and storage efficiencies. The team should regularly profile the code to identify performance hotspots. Hotspots that can be optimised with reasonable development effort may be addressed by the developers. The team should collect and track component-level performance metrics such as CPU utilisation, response times and memory utilisation, ideally as part of the CI build. If these metrics show unexpected degradation, the team should return to fine-grained profiling to isolate and optimise performance hotspots.
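Tracking component-level metrics in the CI build can be as simple as comparing the current run against a stored baseline and failing the build on regressions. In this sketch the metric names, baseline values and 10% tolerance are hypothetical, and how the metrics are collected (load-test harness, profiler, container statistics) is left out.

```python
import statistics
from typing import Dict, List


def p95(samples: List[float]) -> float:
    """95th percentile of response-time samples (needs at least two samples)."""
    return statistics.quantiles(samples, n=20)[-1]


def regressions(baseline: Dict[str, float], current: Dict[str, float],
                tolerance: float = 0.10) -> List[str]:
    """Names of metrics that are worse than baseline by more than `tolerance`.

    Assumes lower is better for every metric (latency, CPU, memory).
    """
    return sorted(name for name, base in baseline.items()
                  if name in current and current[name] > base * (1 + tolerance))
```

A CI step would call `regressions()` after the performance run and exit with a non-zero status when the list is non-empty, prompting the fine-grained profiling described above.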

Larger, verbose messages take longer to transmit and process. Similarly, chatty services inevitably degrade end-to-end performance. Hence, communication between services internal to the system should use succinct data formats. While external APIs may produce and consume messages based on standard protocols using XML and JSON formats, internal APIs can use binary formats such as those defined by GPB (Google Protocol Buffers). Interactions between services should be monitored to identify API invocation patterns that can be leveraged to reduce chattiness.
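The size difference between a verbose text format and a compact binary one is easy to demonstrate. The sketch below does not use Protocol Buffers, which require code generated from a `.proto` schema; it uses Python’s standard `struct` module as a stand-in fixed binary layout for a hypothetical internal price-tick message. It also batches several messages into one payload, one way to reduce chattiness. Unlike GPB, a fixed `struct` layout offers no schema evolution.

```python
import json
import struct
from typing import List, Tuple

# Hypothetical tick layout, little-endian, no padding:
# instrument id (uint32), timestamp in ns (int64), quantity (uint64), price (float64)
TICK = struct.Struct("<IqQd")
COUNT = struct.Struct("<I")  # batch header: number of ticks

Tick = Tuple[int, int, int, float]


def encode_json(tick: Tick) -> bytes:
    """Verbose encoding, as an external JSON API might use."""
    instrument_id, timestamp_ns, quantity, price = tick
    return json.dumps({"instrumentId": instrument_id, "timestampNs": timestamp_ns,
                       "quantity": quantity, "price": price}).encode("utf-8")


def encode_batch(ticks: List[Tick]) -> bytes:
    """One payload for many ticks: a count header followed by fixed-size records."""
    return COUNT.pack(len(ticks)) + b"".join(TICK.pack(*t) for t in ticks)


def decode_batch(payload: bytes) -> List[Tick]:
    (count,) = COUNT.unpack_from(payload, 0)
    return [TICK.unpack_from(payload, COUNT.size + i * TICK.size) for i in range(count)]
```

Each binary record occupies 28 bytes, roughly a third of the JSON encoding of the same tick, and one batched call replaces many small ones.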

Developers should resist choosing a single data model and storage strategy. Different domain concepts will translate to data models that will naturally suit different storage strategies. Hence, developers should aim for polyglot persistence behind domain-based adapters. Persistence technologies should be chosen based on the convenience and efficiency of data storage and retrieval operations. For example, where fine-grained search is required and the domain model can easily be translated to a relational model, relational databases may be used. Alternatively, for domain model objects that cannot be translated to a relational model, a NoSQL database like a document store may be used.
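A sketch of polyglot persistence behind domain-based adapters, using only the Python standard library. Orders translate naturally into rows and need fine-grained search, so `OrderRepository` uses SQLite, standing in for a managed relational database. Catalogue entries with variable, nested attributes are stored whole by `ProductCatalogue`, where a dictionary of JSON documents stands in for a document store. Both class names and schemas are hypothetical.

```python
import json
import sqlite3
from typing import Dict, List, Optional


class OrderRepository:
    """Relational adapter: flat, uniform records queried by several criteria."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS orders ("
            "id TEXT PRIMARY KEY, customer TEXT NOT NULL, total REAL NOT NULL)")

    def add(self, order_id: str, customer: str, total: float) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute("INSERT INTO orders VALUES (?, ?, ?)",
                               (order_id, customer, total))

    def find(self, customer: str, min_total: float) -> List[str]:
        rows = self._conn.execute(
            "SELECT id FROM orders WHERE customer = ? AND total >= ? ORDER BY id",
            (customer, min_total))
        return [row[0] for row in rows]


class ProductCatalogue:
    """Document adapter: each product is stored and retrieved as one document."""

    def __init__(self) -> None:
        self._documents: Dict[str, str] = {}  # stand-in for a document database

    def put(self, sku: str, product: dict) -> None:
        self._documents[sku] = json.dumps(product)

    def get(self, sku: str) -> Optional[dict]:
        raw = self._documents.get(sku)
        return json.loads(raw) if raw is not None else None
```

Because the domain code sees only these adapters, either storage technology can later be replaced by a different cloud service without touching the functional logic.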

Addressing performance requirements early helps teams agree on architectures that fulfil these requirements. A focus on computational, communication and storage efficiencies reduces the overall resources needed to fulfil those requirements, which in turn has the potential to reduce costs.

Develop a DevOps Mindset

This principle is foundational for any transformation initiative. The key outcome of building a DevOps mindset is that the organisation eliminates waste in its delivery pipeline, builds confidence in the deliverables that are shipped, continuously improves the quality of deliverables, and then delivers them efficiently.

Hence, formulating a DevOps strategy for the target technology on the cloud starts before the migration, by building DevOps processes, practices and technology around the legacy estate. Increasing test coverage and reliability, especially in end-to-end acceptance and regression testing, is key to building confidence that the features migrated to the cloud have not suffered any regressions. If rehosting (lift-and-shift) is attempted as a first step towards migration, automating infrastructure provisioning, configuration management, and build and deployment helps ensure consistency across environments on the cloud. Additionally, it provides opportunities to optimise and economise the delivery mechanisms, giving teams more capacity to focus on subsequent re-architecting.

For the re-architected codebase, practices that optimise the four key metrics (the Accelerate metrics) provide a lean and responsive delivery pipeline. Reducing the change failure rate reduces rework, keeping the focus on delivering value. Reducing lead time and increasing deployment frequency deliver value fast. Additionally, they create opportunities to fix defects forward rather than rolling back releases in case of failures. This eliminates the risks associated with rollbacks and gives customers confidence in a stable technological foundation supporting the business. Finally, a short MTTR (Mean Time To Recover) reduces the effort needed to restore operations after a failure, keeping teams focused on building and delivering value.
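The four key metrics can be computed from deployment records. In this sketch the `Deployment` record is hypothetical, and time to restore is simplified to the interval between a failed deployment and the restoration of service; in practice it is usually measured from incident records.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deployment:
    committed_at: datetime                  # first commit of the change
    deployed_at: datetime                   # when the change reached production
    failed: bool = False                    # did the change cause a failure in production?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def four_key_metrics(deploys: List[Deployment], period_days: int) -> dict:
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    failures = [d for d in deploys if d.failed]
    restore_times = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deployment_frequency_per_day": len(deploys) / period_days,
        "median_lead_time": median(lead_times) if lead_times else None,
        "change_failure_rate": len(failures) / len(deploys) if deploys else 0.0,
        "mean_time_to_restore": (sum(restore_times, timedelta()) / len(restore_times)
                                 if restore_times else None),
    }
```

Tracked over successive periods, these figures show whether the delivery pipeline for the re-architected codebase is actually becoming leaner.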

Last but not least, DevOps is as much about organisational culture as it is about practices and tools. DevOps thrives in a generative culture that emphasises learning and continuous improvement. For most organisations, cloud migration will be a unique experience involving experimentation and learning well outside their comfort zones. Without the space to learn from their experiences and adapt, any practice or technology will be rendered useless.

The figure below shows how these principles relate to each other. As mentioned above, DevOps is foundational for all other principles. Transformation also leads to efficiencies by decommissioning functionality no longer needed or by optimising business capabilities. Re-architecture based on business transformation leading to higher flexibility and efficiency results in opportunities to drive higher revenues at lower costs.

Summary

Cloud migrations are risky and expensive. It is not uncommon to see stalled migration initiatives and applications repatriated back on premises when firms are unable to realise the promised benefits. Besides the huge costs incurred, these result in fragmented IT organisations and technology estates, leading to further operational inefficiencies.

Building and maintaining confidence and momentum in such initiatives is key to their success. This confidence is achieved by demonstrating early that cloud offers an economically sustainable route to business automation. For that, both business and technology need to seek flexibility and efficiency that will require transformation of business processes, technology operation and the technology itself when migrated to the cloud.

About the Author

Omar Bashir is a Principal Consultant with ThoughtWorks. He has over 25 years of experience in technology development in defence, telecommunications, logistics and finance. He holds a PhD in performance monitoring of computer networks and is passionate about building efficient, economical, and sustainable enterprise technology.
