Reducing Cloud Infrastructure Complexity

Key Takeaways

  • The multi-cloud journey is often a long, complex one.  Adaptable tooling is key to dealing with today's heterogeneous systems and accommodating future change.
  • Team buy-in is key.  The transformation will touch many parts of the organization and requires leadership and a well-communicated vision to succeed.
  • Measure the current state of IT utilization.  A successful transformation not only needs a clear vision of the future, but also a clear understanding of the present.
  • Cloud selection must be driven by current and anticipated business needs.  These can include differentiators such as security, performance, reliability, and cost at anticipated usage levels.
  • Avoid large-scale, sudden transformation; favor incrementalism initially.  The journey to multi-cloud is one of organizational learning as well as execution.  Incremental transformation allows internal systems to adjust to new realities and absorb the shock of change.  Later, once experience and confidence are gained with new technology and operational models, the change can accelerate.

Cloud computing adoption has taken the world by storm, and is accelerating unabated.  According to Flexera’s annual State of the Cloud Report for 2020, 93% of respondents used multi-cloud or hybrid cloud strategies.  Consuming computing resources as a service gives businesses great flexibility, lets them control costs, and allows them to focus on core business needs rather than datacenter operations.  As the computing landscape has matured over the years, along with the ubiquity of high-bandwidth connections, the variety of services and pricing models has grown.  Providers compete for opex spending by offering not just basic compute capabilities, but also platform-as-a-service alternatives and highly specialized services like data storage and machine learning.  As a result, consumers pursuing a cost-optimized or best-of-breed approach face increasing complexity.  However, it can be argued that this apparent complexity is the result of having a diversity of options, whereas individual applications may actually see a reduction in overall complexity.  This article examines different aspects of cloud infrastructure complexity, and approaches to mitigate it.

Aspects of Multi-Cloud Complexity

Effective use of cloud resources goes far beyond simply moving existing on-premises applications onto a favorite cloud platform.  Often, rethinking architectures around the available cloud services can greatly simplify designs and operations.  After all, offloading operations is a primary benefit of cloud computing.  For example, an application that previously required a highly available database cluster can be transformed into a Database as a Service (DBaaS) client, offloading the burden of operating a database.  Judicious use of cloud services and technology can therefore reduce overall (architectural and operational) complexity, at least on a single platform.

At the other end of the spectrum from simple rehosting is cloud native transformation.  A cloud native approach, usually associated with containerized applications, exploits the flexibility of the cloud at a fundamental level.  Applications are broken into services, each with its own lifecycle, API and related semantics, fault tolerance, and scalability.  The transformation to cloud native is typically a long one, and not all-or-nothing.  Organizations may therefore have to manage legacy on-premises applications, cloud-hosted services, and cloud native workloads (both on and off premises) all at the same time.

Other factors complicating multi-cloud and hybrid cloud applications include:

  • Security.  Each cloud has its own security profile, which may have to be blended with those of other clouds, especially for applications that span clouds.  Putting applications on multiple clouds requires understanding these security regimes, and exposes cloud-based workloads to threats not present in on-premises scenarios.
  • APIs.  Each cloud has its own APIs, with its own nouns and verbs.  Even cloud resources that are superficially identical on each cloud (like Image or Instance) can have subtly different semantics.
  • Logging.  Operational logging, which provides visibility and diagnostics, is critical in a distributed environment.  The complexity of collecting and consolidating log information across multiple platforms can be substantial.  Logging is also critical for security audits.
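The API point can be made concrete with a small adapter layer. The sketch below assumes two hypothetical providers whose instance records use different field names and status vocabularies; the provider names, field names, and status strings are invented for illustration and are not taken from any real cloud API. Each adapter maps its provider's record onto one common `Instance` type, so code above the adapter layer works with a single set of nouns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """Provider-neutral view of a compute instance."""
    provider: str
    instance_id: str
    image: str
    running: bool


# Hypothetical record formats for illustration; real provider APIs differ in their own ways.
def from_provider_a(record: dict) -> Instance:
    # Provider A (hypothetical) nests the image name and reports state "running".
    return Instance(
        provider="provider-a",
        instance_id=record["id"],
        image=record["image"]["name"],
        running=record["state"] == "running",
    )


def from_provider_b(record: dict) -> Instance:
    # Provider B (hypothetical) uses a flat image reference and reports status "ACTIVE".
    return Instance(
        provider="provider-b",
        instance_id=record["uuid"],
        image=record["image_ref"],
        running=record["status"] == "ACTIVE",
    )


ADAPTERS = {"provider-a": from_provider_a, "provider-b": from_provider_b}


def normalize(provider: str, record: dict) -> Instance:
    """Dispatch a raw record to the adapter registered for its provider."""
    try:
        adapter = ADAPTERS[provider]
    except KeyError:
        raise ValueError(f"no adapter registered for {provider!r}") from None
    return adapter(record)


if __name__ == "__main__":
    a = normalize("provider-a", {"id": "i-1", "image": {"name": "ubuntu-22.04"}, "state": "running"})
    b = normalize("provider-b", {"uuid": "7f3c", "image_ref": "ubuntu-22.04", "status": "SHUTOFF"})
    print(a.running, b.running)  # True False
```

Adding a provider then means writing one more adapter rather than touching every consumer. Normalizing field names does not erase the semantic differences noted above (for example, what a stopped instance implies for billing), so those still have to be understood per provider.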

Keeping Things Consistent

Organizations the world over are indeed struggling with unnecessarily complicated multi-cloud environments.  Validating this global struggle, Enterprise Strategy Group recently conducted a survey of 1,257 IT decision makers at enterprise and midmarket organizations using both public cloud infrastructure and modern on-premises private cloud environments.  The results cement the notion that cloud fragmentation is getting worse over time, and that many companies are seeking a ‘savior’ toolset that provides a zoomed-out view of policies, compliance, security, and cost optimization.

An unsurprising outcome of the survey is that cloud management has clear value, yet even knowing this, organizations are struggling with implementation.  A mere 5% used consolidated cloud management tools extensively, whether on premises or across public and/or private clouds.  This is despite a burgeoning marketplace of all-in-one solutions like VMware vRealize Suite, Flexera CMP, CloudBolt, and others.

The Journey And The Way Out

For many companies, the move to exploit cloud resources is not part of a strategy, but rather the result of individual teams addressing needs in an ad hoc fashion as they arise.  This kind of disorganized migration to the cloud results in technology silos and explodes the overall complexity by introducing a grab bag of scripts, tools, technologies, and standards.  To escape this complexity trap, companies should ideally adopt a cloud strategy backed by a flexible automation platform that doesn’t throw away sunk investments, but instead provides a path to a more manageable and cost-effective future.

Step 1: Communicate

For any organization of significant size, the design and execution of a cloud strategy requires the coordination and cooperation of many business functions and groups.  These can include finance, product management, sales, engineering, operations, and possibly others.  Without buy-in and cooperation, whatever strategy is devised is unlikely to succeed.  The key is communication that builds trust, clarifies objectives, and establishes a shared vision of a better future.  This communication needs to be ongoing, not just a single step of the process.

Step 2: Audit

The typical company has no comprehensive understanding of its current cloud infrastructure usage, which has often evolved reactively in organizational silos.  An audit of that usage, including its security posture and costs, is the prerequisite for a comprehensive strategy that breaks down those silos where possible, improves security, and controls costs.  Without this understanding of current usage, any strategy developed will be unrealistic and unlikely to succeed.
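As a starting point for the audit, even a simple roll-up of usage records by team and provider can expose silos and spending concentrations. The sketch below assumes usage data has already been exported and normalized into a flat list of records with hypothetical `team`, `provider`, and `monthly_cost` fields; real billing exports differ by provider and would need their own parsing. The sample figures are illustrative only.

```python
from collections import defaultdict


def summarize_usage(records):
    """Roll up usage records into cost per (team, provider) and providers per team.

    Each record is assumed to be a dict with the hypothetical keys
    "team", "provider", and "monthly_cost".
    """
    cost = defaultdict(float)
    providers = defaultdict(set)
    for r in records:
        cost[(r["team"], r["provider"])] += r["monthly_cost"]
        providers[r["team"]].add(r["provider"])
    return dict(cost), {team: sorted(p) for team, p in providers.items()}


def multi_provider_teams(providers_by_team):
    """Teams spread across several platforms are candidates for a consolidation review."""
    return sorted(team for team, p in providers_by_team.items() if len(p) > 1)


if __name__ == "__main__":
    # Illustrative records only; real data comes from each provider's billing tools.
    sample = [
        {"team": "payments", "provider": "aws", "monthly_cost": 1200.0},
        {"team": "payments", "provider": "azure", "monthly_cost": 300.0},
        {"team": "analytics", "provider": "gcp", "monthly_cost": 2500.0},
        {"team": "web", "provider": "gcp", "monthly_cost": 400.0},
    ]
    cost, providers = summarize_usage(sample)
    print(multi_provider_teams(providers))  # ['payments']
```

The same roll-up can be extended with security findings or utilization figures per team, but the point is simply to replace anecdote with a measured baseline before planning starts.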

Step 3: Planning

With the data from the audit in hand, planning can begin.  As in war, few plans survive contact with the enemy, so plan to iterate.

Cloud Selection

The foundation of the cloud strategy is the set of underlying business goals, which lead to an architecture that selects among cloud platforms, whether public, private, or hybrid.  There is no standard rule book for this part of the strategy, but careful consideration needs to be given to requirements for availability, scalability, security, regional coverage, performance, and cost.  Bear in mind that these are initial selections, and any strategy should build a degree of cloud agnosticism into its architecture.
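One common way to make these trade-offs explicit, offered here as an illustration rather than a prescribed method, is a weighted scoring matrix. The criteria below are the ones listed above; the weights, candidate names, and 1 to 5 scores are placeholders that each organization would set from its own business goals and audit data.

```python
# Relative importance of each criterion (placeholder values that sum to 1).
WEIGHTS = {
    "availability": 0.25,
    "scalability": 0.15,
    "security": 0.25,
    "regional_coverage": 0.10,
    "performance": 0.10,
    "cost": 0.15,
}


def weighted_score(scores: dict, weights: dict = WEIGHTS) -> float:
    """Combine per-criterion scores (1 = poor, 5 = excellent) into one number."""
    missing = weights.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weights[c] * scores[c] for c in weights)


def rank(candidates: dict) -> list:
    """Return (name, score) pairs, best first."""
    return sorted(
        ((name, weighted_score(s)) for name, s in candidates.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical candidates; "cost" is scored so that 5 means least expensive.
    candidates = {
        "platform-x": {"availability": 5, "scalability": 4, "security": 4,
                       "regional_coverage": 5, "performance": 4, "cost": 2},
        "platform-y": {"availability": 4, "scalability": 4, "security": 5,
                       "regional_coverage": 3, "performance": 4, "cost": 4},
    }
    for name, score in rank(candidates):
        print(f"{name}: {score:.2f}")
```

With these placeholder numbers, platform-y ranks first because it scores higher on the heavily weighted security criterion and on cost. Since these are initial selections, keeping the weights and scores under version control makes it easy to revisit them as requirements change.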

Automation Tools

The key to a successful plan is to defer disruption as long as possible, so the first priority is to adopt the technology that will be foundational to the strategy's success.  Note that tool selection may be affected by the cloud platform selections.  In a multi-cloud strategy, this foundational layer is the automation technology that will reduce and manage the complexity inherent in adopting multiple platforms.  The groups most affected by this choice need to be deeply involved in, and supportive of, the tool selection.

Cloud-agnostic automation can provide a way out of the jungle of technologies, offering centralized control and repeatable, versionable processes (i.e., infrastructure as code).  Ideally, the automation platform will integrate directly with CI/CD tooling and operational/business support systems (OSS/BSS) via a well-supported API.  Such platforms can deploy, heal, and scale deployments across several clouds, as well as on-premises systems and clouds like OpenStack or vSphere.  The infrastructure as code approach allows the automation details to be developed using Agile, incremental, evolutionary processes.  In this way, complex automations can converge on optimal performance while each change remains open to security review.  A competent platform will also support aggregation of operational event logs, and provide the capability to orchestrate and plumb application log aggregation across multiple clouds and on-premises environments.
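The deploy, heal, and scale behavior described above is commonly built on a declarative model: the desired state is kept as versioned code, and the platform repeatedly compares it with what is actually running. The sketch below shows only that comparison step, using hypothetical cloud and service names and a simplified state model of healthy replica counts per (cloud, service). It does not call any real cloud or orchestration API and is not the interface of any particular automation product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str      # "deploy", "scale", or "remove"
    cloud: str
    service: str
    replicas: int  # target replica count after the action


def plan(desired: dict, actual: dict) -> list:
    """Compute actions that move `actual` toward `desired`.

    Both arguments map (cloud, service) -> healthy replica count.
    A service that lost replicas (e.g. after a failure) is simply scaled
    back up, so "healing" falls out of the same comparison as scaling.
    """
    actions = []
    for (cloud, service), want in sorted(desired.items()):
        have = actual.get((cloud, service), 0)
        if have == 0 and want > 0:
            actions.append(Action("deploy", cloud, service, want))
        elif have != want:
            actions.append(Action("scale", cloud, service, want))
    # Anything running that is not declared is flagged for removal;
    # a real platform might only report it instead.
    for cloud, service in sorted(actual.keys() - desired.keys()):
        actions.append(Action("remove", cloud, service, 0))
    return actions


if __name__ == "__main__":
    desired = {("public-cloud", "web"): 3, ("vsphere", "db"): 1}
    actual = {("public-cloud", "web"): 2, ("vsphere", "legacy-batch"): 1}
    for action in plan(desired, actual):
        print(action)
```

Here the plan scales `web` up to three replicas, deploys `db`, and flags the undeclared `legacy-batch` service for removal. Because the desired state is plain data, it can be reviewed, versioned, and evolved incrementally like any other code.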

A company that wants to optimize based on service quality and/or cost will also face complexity challenges, especially if maintaining a degree of cloud independence is a priority.  Here too, a multi-cloud automation platform can simplify targeting different cloud platforms to exploit their individual performance, security, service, or cost characteristics.  Advanced automation can place workloads where they make the most sense at the moment, and adjust to changing circumstances.
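A simplified version of such a placement decision might filter out platforms that fail hard constraints and then pick the cheapest of the rest, as sketched below. The offer data, field names, and prices are hypothetical; a real system would pull current pricing and performance data from each provider and would likely weigh more than price.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Offer:
    cloud: str
    region: str
    hourly_price: float    # hypothetical price for the requested instance size
    p95_latency_ms: float  # measured latency to the workload's users
    compliant: bool        # passes the organization's security requirements


def place(offers, allowed_regions, max_latency_ms) -> Optional[Offer]:
    """Return the cheapest offer that meets all hard constraints, or None."""
    eligible = [
        o for o in offers
        if o.compliant
        and o.region in allowed_regions
        and o.p95_latency_ms <= max_latency_ms
    ]
    return min(eligible, key=lambda o: o.hourly_price, default=None)


if __name__ == "__main__":
    offers = [
        Offer("cloud-a", "eu-west", 0.12, 40.0, True),
        Offer("cloud-b", "eu-west", 0.09, 85.0, True),   # too slow
        Offer("cloud-c", "us-east", 0.07, 30.0, True),   # wrong region
        Offer("private", "eu-west", 0.10, 20.0, True),
    ]
    best = place(offers, {"eu-west"}, max_latency_ms=50.0)
    print(best.cloud if best else "no eligible platform")  # private
```

Re-running the function as prices or measurements change is the simplest form of adjusting to changing circumstances; actually moving a running workload has its own cost, which this sketch ignores.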

Automation Platform Definition

Once the automation tools are selected, processes and best practices can be put into place.  This includes designing the controls that govern cloud access and usage, which are critical to cost containment on the public cloud.  During this phase, staff can be trained and gain experience building the automation that will support their operations.  Beyond engineering and operations, this phase is also when other company functions can become familiar with the information exposed by the automation tooling and the cloud platforms, and incorporate it into their job functions.  The end-to-end system can also be prototyped during this phase, integrating functions from sales and service ordering through service delivery, remediation, and reporting.
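The access and usage controls mentioned above are often expressed as policies evaluated before any provisioning request reaches a cloud. The sketch below assumes a hypothetical request shape and hypothetical policy values (allowed regions, an approved instance-size list, a per-team monthly budget); it stands in for whatever policy mechanism the chosen automation platform actually provides.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    allowed_regions: set
    allowed_sizes: set
    monthly_budget: float  # per team, in the organization's billing currency


@dataclass
class Request:
    team: str
    region: str
    size: str
    estimated_monthly_cost: float


def check(request: Request, policy: Policy, spent_this_month: float) -> list:
    """Return a list of violations; an empty list means the request may proceed."""
    violations = []
    if request.region not in policy.allowed_regions:
        violations.append(f"region {request.region!r} is not allowed")
    if request.size not in policy.allowed_sizes:
        violations.append(f"size {request.size!r} is not on the approved list")
    if spent_this_month + request.estimated_monthly_cost > policy.monthly_budget:
        violations.append("request would exceed the team's monthly budget")
    return violations


if __name__ == "__main__":
    policy = Policy({"eu-west", "eu-central"}, {"small", "medium"}, 5000.0)
    request = Request("analytics", "us-east", "medium", 900.0)
    for problem in check(request, policy, spent_this_month=4500.0):
        print(problem)
```

Returning every violation rather than stopping at the first gives requesters actionable feedback, and the same checks could also run in CI against automation templates before they are merged.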


Step 4: Implementation

The last step in the process is the actual implementation.  As mentioned earlier, it is highly unlikely (and highly undesirable) to have a “big bang” rollout.  A better approach is to select one of those silos to adopt the new system first, capture lessons learned, refine, and repeat the process for the next silo.


The digital transformation to a multi-cloud and hybrid cloud future is fraught with both promise and peril.  The advantages delivered by the cloud model are besieged by complexity that threatens to nullify them (or worse).  Adopting a flexible, cloud-neutral automation platform can manage that complexity by providing an infrastructure as code approach, and can allow for the reuse of existing automation assets.  An incremental approach to automation development, along with processes that review automation templates for correctness and security compliance, can maximize the benefit of adopting multiple clouds while minimizing the risks.

About the Author

Nati Shalom is the founder and CTO of Cloudify, an open-source multi-cloud orchestration platform featuring a unique “Environment-as-a-Service” technology that has the power to connect, automate, and manage new and existing infrastructure and networking environments of entire application pipelines. Nati is a thought leader in cloud, big data, open source, and more. Nati has received multiple recognitions from publications such as The CIO Magazine and YCombinator and is one of the leaders of the OpenStack and DevOps Israel groups.

