Sleeping Well at Night During a Live Cloud Migration in a VMware Environment

Key Takeaways

  • Challenges with cloud migration often come from uncertainty in planning, concerns about disrupting users, and mismatches between environments.
  • The most straightforward VM-based migration techniques involve like-for-like hypervisors.
  • Based on available bandwidth and data volumes, a VM's live migration may be entirely over the network, or it may involve sending some data to the new destination on a physical device.
  • Interdependent virtual machines that make up a "system" should be migrated concurrently.
  • Automating migration tasks and orchestrating the migration of groups of VMs are critical to your success.

Organizations continue to move their IT infrastructure from traditional on-premises data centers to the cloud. Yet they remain challenged by the difficulty of moving business-critical applications from their current infrastructure to their cloud service providers’ facilities. One reason? Concerns about disrupting operations during migration.

The risks of operational disruption can be mitigated by preserving the underlying virtualization infrastructure: virtual machines and their virtual disks move in their entirety from the on-premises VMware environment to a VMware environment maintained by a cloud service provider. Working with one of the thousands of VMware Cloud Provider Partners (VCPPs), administrators can be confident that their new cloud environment is compatible with their VMs and their datastores, and they can use tools like vRealize to define robust blueprints for network, CPU and memory requirements, as well as for load balancing, monitoring and management.

For virtualized workloads, operational disruption is minimized when the virtual machine keeps running on-premises while the VM and its data are copied to the cloud destination. Keeping a VM running during this copy is referred to as “live migration.” Once the VM and its virtual disks are fully replicated, the virtual machine resumes operation at the destination after only a brief interruption.

This overview describes the challenges of live migration to the cloud and presents key concepts and requirements that enterprises and their service providers need to understand and adopt if they want to sleep well at night when migrating on-premises VMs and data to the cloud.

Cloud Migration Challenges

Systems integrators and cloud service providers typically offer professional services for cloud migration. In doing so, they commonly face challenges including:

Uncertainty in Planning. The larger and more sophisticated the organization, the more complex the migration. Some applications are closely integrated and should migrate together. Network addresses have to be preserved or at least managed consistently. Permissions in LDAP or Active Directory may have to be replicated exactly if applications are to be available to their users immediately. Security requirements must be defined and enforced at the point of origin, at the destination, and throughout the data transfer. The network used for migration should be evaluated in terms of its available bandwidth relative to the amount of data to be replicated. The order in which VMs will migrate should be defined, and the time required for migration should be understood before the migration begins. Finally, the replication method should be chosen: fully over the network, or in conjunction with the physical shipment of a data transport device.

Operational Disruption. The first on-premises applications to migrate to the cloud were typically test and development systems, which could be stopped for migration. Stopping them was inconvenient, but continuous operation was not required. As organizations began to consider migrating business-critical operations to the cloud, “live migration” became a key requirement. In some cases, a small maintenance window may be available for migration; in others, applications cannot be interrupted for longer than the few minutes it takes a virtual machine to restart. Beyond the brief interruption when the VM starts in the cloud, live migration can also degrade the performance of running applications, because their data is read directly from production storage systems during replication. Before starting a migration project, the organization should assess how much operational disruption could result and prepare accordingly.

Software Overhead. Many cloud migration solutions available today manage data transfer through technology based on legacy disaster recovery (DR) solutions. While these technologies may have been updated in some respects for cloud migration, they may retain some limitations. They may be complex to configure or difficult to administer. They may require continuous, uninterrupted connectivity between the source data center and the cloud. They may introduce agents or drivers that are incompatible (or not fully supported for use) with technologies already in use in the source data center or the cloud destination. Finally, DR-based migration software generally cannot support live migration when data is moved on a data transport device.

Storage Incompatibility. Data migration solutions that are tied to specific storage infrastructures may not work across all workloads. Many enterprises have accumulated a diverse set of storage systems and datastore types, including traditional shared storage arrays, file servers, and hyper-converged infrastructure. Not all storage-based data migration solutions can support all datastore types. For example, migration based on file-level synchronization may not easily support block or object datastores. Similarly, distributed data management approaches built on object datastores work well only when the on-premises source is itself an object datastore. In contrast, a storage-agnostic migration solution will be compatible with a wider range of storage at both the on-premises source and the cloud destination.

To address these challenges in a VMware environment, it is best to seek out a purpose-built cloud migration solution that is compatible with VMware vSphere both on-premises and in the cloud. While many organizations are migrating to VMware Cloud on AWS, the same principles apply when migrating to any of the thousands of VCPPs.

Live Migration – An Overview

Live migration has a number of facets. First and most importantly, the VMs remain in normal operation while data replication is in progress. Standard vSphere functions such as vMotion should also remain available while data is migrated. Data replication should not disrupt or significantly degrade performance for the VMs and applications at the source. Finally, VM downtime at cutover should be minimal, and the acceptable downtime should be configurable.
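
To make configurable downtime concrete, here is a minimal Python sketch of a cutover check. It assumes the final step pauses the VM, flushes the remaining dirty (not yet replicated) data, and restarts the VM at the destination. `DowntimeSetting`, `estimated_downtime_s` and `ready_for_cutover` are hypothetical names, and the restart time is an administrator-supplied estimate rather than something the sketch measures.

```python
from dataclasses import dataclass


@dataclass
class DowntimeSetting:
    max_downtime_s: float  # hypothetical administrator-configured downtime budget
    vm_restart_s: float    # assumed time for the VM to power on at the destination


def estimated_downtime_s(dirty_bytes: int, bandwidth_bytes_per_s: float,
                         setting: DowntimeSetting) -> float:
    """Downtime if the VM is paused now: flush remaining dirty data, then restart."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return dirty_bytes / bandwidth_bytes_per_s + setting.vm_restart_s


def ready_for_cutover(dirty_bytes: int, bandwidth_bytes_per_s: float,
                      setting: DowntimeSetting) -> bool:
    """True when the estimated downtime fits within the configured budget."""
    return estimated_downtime_s(dirty_bytes, bandwidth_bytes_per_s, setting) <= setting.max_downtime_s
```

With a 300-second budget and an assumed 120-second restart, 1 GB of remaining dirty data at 10 MB/s gives an estimated 220 seconds of downtime, so cutover can proceed. With 5 GB remaining it would not fit, and replication would keep running until the backlog shrinks.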

Key Requirements for a Cloud Live Migration Solution

When evaluating cloud migration solutions for uninterrupted runtime operations, we consider the following capabilities essential:

  • Hypervisor integration
  • Lightweight deployment and simple operation
  • Replication of both “cold data” and “hot data”
  • A capability for “pre-flight” prediction of migration duration and migration parameters
  • The ability to move data on a physical data transport device that can be shipped to the destination
  • Migration of interdependent VMs in groups
  • Administration, orchestration and automation of migration tasks
  • Control over the balance between the data replication rate and VM performance
  • Fault tolerant data transfer that can resume replication if interrupted

Let’s go into each requirement in detail:

For migration of virtual machines, integration with the hypervisor ensures the broadest compatibility. Hypervisor-level integration should not require any kind of agent software in the VM and should be storage agnostic. A properly integrated solution for data capture and VM migration should be transparent to both hardware and software components of the source environment and shouldn’t impose any specific infrastructure requirements at the destination.

Hand-in-hand with hypervisor integration is the need for lightweight deployment and simple operation. Software agents may be relatively lightweight individually, but deploying the right agent into each VM is not a trivial administrative task. Likewise, unlike a DR solution, whose components are expected to run continuously for years after deployment, software used for a one-time migration should have a light footprint.

Hot and cold data replication is the key to keeping VMs and their applications running as they’re migrating to the cloud. Cold data is the data that doesn’t change during the course of the migration process, so it is copied over just once during the migration. In contrast, “hot” data is data that is written (and rewritten) at the source after the migration task has begun, and it’s the job of the software to catch these data operations and ensure that the new “hot data” blocks are copied to the destination appropriately. Copying “cold data” is referred to as “background replication,” and catching freshly written “hot data” for transfer to the destination is called “foreground replication.”
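
The division of labor between background and foreground replication can be illustrated with a toy block-level model in Python. This is a sketch, not how any particular product intercepts I/O: the disk is a list of blocks, `guest_write` stands in for writes from the running VM, and `LiveReplicator` and its methods are hypothetical names. A write to a block that the background pass has already copied makes that block hot; a write to a block not yet copied needs no extra work, because the background pass will pick up the new contents anyway.

```python
from __future__ import annotations


class LiveReplicator:
    """Toy model of background (cold) plus foreground (hot) block replication."""

    def __init__(self, source: list[bytes]) -> None:
        self.source = source                            # blocks on the running VM's disk
        self.dest: list[bytes | None] = [None] * len(source)
        self.cursor = 0                                 # next block for the background pass
        self.dirty: set[int] = set()                    # blocks rewritten after being copied

    def guest_write(self, index: int, data: bytes) -> None:
        # The VM keeps running: its writes land on the source disk immediately.
        self.source[index] = data
        if index < self.cursor:
            # Already copied once, so the destination copy is now stale (hot data).
            self.dirty.add(index)

    def background_step(self) -> None:
        # Cold data: copy each block once, in order.
        if self.cursor < len(self.source):
            self.dest[self.cursor] = self.source[self.cursor]
            self.cursor += 1

    def foreground_step(self) -> None:
        # Hot data: re-copy a block that changed after its first copy.
        if self.dirty:
            index = self.dirty.pop()
            self.dest[index] = self.source[index]

    def in_sync(self) -> bool:
        return self.cursor == len(self.source) and not self.dirty
```

For example, starting from four blocks, copying two in the background, then writing to block 0 (already copied) and block 3 (not yet copied) and alternating both steps until `in_sync()` returns True leaves the destination identical to the source, with only block 0 sent twice.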

Administrators want to be able to estimate the time and network bandwidth required to move the VMs and their data to the cloud. Running the migration solution in “observation mode” before the migration task helps determine the amount of data to be transferred, the network bandwidth actually available for the transfer, and the amount of “hot data” that will change during the transfer. This predictive capability should be available not only for individual VMs but also for “migration groups”: sets of interdependent applications and VMs that should migrate at the same time. Prediction requires evaluating the replication network’s bandwidth, latency and packet-loss rate over a period of time and then using this information to set replication parameters that make appropriate use of the network. Based on these measurements, the administrator may determine that it is more efficient to transfer “cold” data by physically shipping a data transport device (see below).
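
The arithmetic behind a pre-flight estimate can be sketched in a few lines of Python. This assumes observation mode yields samples of available bandwidth and of the hot-data write rate, both in bytes per second, and treats foreground replication as consuming bandwidth equal to the write rate, so cold data moves only over the remaining headroom. Discounting bandwidth by the packet-loss fraction is a crude assumption, and `estimate_migration_hours` is a hypothetical name.

```python
from statistics import mean
from typing import Optional, Sequence


def estimate_migration_hours(cold_bytes: int,
                             bandwidth_samples: Sequence[float],
                             hot_write_samples: Sequence[float],
                             packet_loss: float = 0.0) -> Optional[float]:
    """Estimate hours to replicate cold data while foreground replication keeps up.

    Rates are in bytes per second, taken from an observation period.
    Returns None if hot data is written at least as fast as the network can
    carry it, in which case replication would never converge.
    """
    # Crude assumption: lost packets are retransmitted, shrinking usable bandwidth.
    goodput = mean(bandwidth_samples) * (1.0 - packet_loss)
    # Foreground replication of hot data consumes bandwidth equal to the write rate.
    headroom = goodput - mean(hot_write_samples)
    if headroom <= 0:
        return None
    return cold_bytes / headroom / 3600
```

For 360 GB of cold data, bandwidth samples of 100 MB/s and 120 MB/s, and a steady hot-data rate of 10 MB/s, the estimate is one hour. Averaging samples is a simplification; a more conservative planner might use a low percentile of bandwidth and a high percentile of the write rate.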

Related to hot and cold data replication is the ability to move data on a data transport device. Sometimes the amount of data to be migrated is very large, and moving all the data over the network would be impractical. In these situations, transferring the data on a physical device can save a great deal of time. Even when data is transported to the cloud on a physical device, the application and VM can continue to run at the source. Essentially, the background replication of the cold data happens from the device, while the foreground replication of the hot data takes place over the network.
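
Whether to ship cold data on a device can be framed as a comparison of two durations. The Python sketch below assumes the network must carry hot data in either case, so it refuses to plan if goodput does not exceed the hot-data write rate, and it models the device path as copy-to-device, transit and ingest times supplied by the administrator. `choose_cold_data_path` and its parameters are hypothetical.

```python
from typing import Tuple


def choose_cold_data_path(cold_bytes: int, goodput_bytes_per_s: float,
                          hot_write_bytes_per_s: float, device_copy_h: float,
                          transit_h: float, ingest_h: float) -> Tuple[str, float]:
    """Return ("network" or "device", estimated hours) for moving cold data."""
    # Hot data always travels over the network, so the link must outpace new writes.
    if goodput_bytes_per_s <= hot_write_bytes_per_s:
        raise ValueError("network cannot keep up with hot data")
    # Over the network, cold data only gets the bandwidth left after hot data.
    network_h = cold_bytes / (goodput_bytes_per_s - hot_write_bytes_per_s) / 3600
    # On a device, cold data takes copy + shipping + ingest time, while the
    # full network bandwidth stays available for hot data.
    device_h = device_copy_h + transit_h + ingest_h
    if device_h < network_h:
        return "device", device_h
    return "network", network_h
```

With 100 TB of cold data, 125 MB/s of goodput and 25 MB/s of hot writes, the network path would take roughly 278 hours, while a device with 24 hours of copying, 72 hours of transit and 24 hours of ingest takes 120 hours, so the sketch picks the device. For 360 GB under the same conditions, the network path wins at one hour.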

Group migration is another important requirement. Some VMs are relatively independent and can migrate individually. Others may have dependencies, such as the VMs for a front-end web server, an application server and a back-end database. To minimize operational disruption, these interdependent VMs should always run in the same location, so their data should be copied to the destination concurrently and the VMs brought online in the cloud simultaneously. Fortunately, tools like VMware’s vRealize Infrastructure Navigator can present visual maps of these dependencies, showing the administrator which VMs should migrate as a group.
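
Given dependency information, migration groups are the connected components of the dependency graph. The Python sketch below computes them with a union-find structure. The input format, a list of VM names plus `(vm_a, vm_b)` dependency pairs, is an assumption for illustration; it is not the export format of vRealize Infrastructure Navigator or any other tool, and `migration_groups` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Tuple


def migration_groups(vms: Iterable[str],
                     dependencies: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group VMs into connected components of the dependency graph.

    Every VM named in a dependency pair must also appear in `vms`.
    """
    vm_list = list(vms)
    parent: Dict[str, str] = {vm: vm for vm in vm_list}

    def find(vm: str) -> str:
        # Walk to the root, halving the path as we go to keep trees shallow.
        while parent[vm] != vm:
            parent[vm] = parent[parent[vm]]
            vm = parent[vm]
        return vm

    for a, b in dependencies:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups: Dict[str, List[str]] = {}
    for vm in vm_list:
        groups.setdefault(find(vm), []).append(vm)
    # Sort for stable, readable output.
    return sorted(sorted(group) for group in groups.values())
```

With VMs `web01`, `app01`, `db01`, `batch01` and `web02`, and dependencies between `web01` and `app01` and between `app01` and `db01`, the function returns `[["app01", "db01", "web01"], ["batch01"], ["web02"]]`: the three-tier application migrates as one group, and the two unrelated VMs can migrate individually.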

Efficient administration, orchestration and automation are the key to enabling large-scale migrations without overburdening administrators. In any migration project, administrative oversight is important for monitoring progress and making any necessary adjustments in real time. At the same time, automating repetitive operations frees the administrator’s time when things are going as planned. Automating migration tasks includes the ability to select a number of VMs for independent migration as part of the migration project; the VMs (and their data) then move from the source to the destination one after another, each powering on automatically. Orchestration includes the ability to manage a coordinated migration of groups of VMs.
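
The difference between automation and orchestration can be sketched in Python with hypothetical `replicate` and `power_on` callables standing in for whatever the migration tool actually provides. Independent VMs move one after another, each powered on as soon as it is replicated. A group is replicated concurrently and powered on only after every member is in sync, so no member starts at the destination while another is still catching up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

VmAction = Callable[[str], None]  # hypothetical hook into the migration tool


def migrate_one_by_one(vms: Sequence[str], replicate: VmAction, power_on: VmAction) -> None:
    """Automation: independent VMs move one after another, each powered on when ready."""
    for vm in vms:
        replicate(vm)
        power_on(vm)


def migrate_group(group: Sequence[str], replicate: VmAction, power_on: VmAction) -> None:
    """Orchestration: replicate group members concurrently, then bring them all online."""
    if not group:
        return
    with ThreadPoolExecutor(max_workers=len(group)) as pool:
        # list() drains the results so any replication error is raised here,
        # before a single member has been powered on.
        list(pool.map(replicate, group))
    for vm in group:
        power_on(vm)
```

Because `list(pool.map(...))` re-raises a replication error, a failed member stops the whole group before any VM is powered on at the destination.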

One of the most important things to monitor in a live migration is the balance between the data replication rate and the rate of new data I/O in the on-premises environment. During the migration estimation process, the administrator should have been able to predict that the data replication rate would be substantially and consistently greater than the rate at which on-premises writes generate new hot data requiring foreground replication. However, a “dashboard” view is helpful to confirm that this holds during the actual migration task. If either on-premises I/O or the data replication rate needs to be adjusted, the administrator should be able to apply the necessary controls.
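
A dashboard check for this balance might look like the Python sketch below. It assumes the monitoring window provides pairs of replication rate and hot-data write rate in bytes per second, and it treats a ratio of at least 2 in every sample as “substantially and consistently greater.” That threshold, the status strings and `check_balance` are illustrative assumptions, not recommendations from any product.

```python
from typing import Sequence, Tuple


def check_balance(samples: Sequence[Tuple[float, float]], min_ratio: float = 2.0) -> str:
    """Classify a monitoring window of (replication_bytes_per_s, hot_write_bytes_per_s).

    "ok": replication stayed at least min_ratio times the hot-data rate in every sample.
    "thin_margin": replication kept pace, but not by min_ratio in every sample.
    "falling_behind": in at least one sample, hot data was written faster than it was copied.
    """
    if not samples:
        raise ValueError("no samples in the monitoring window")
    # Floor the write rate at 1 byte/s so an idle VM does not cause division by zero.
    ratios = [replicated / max(written, 1.0) for replicated, written in samples]
    if min(ratios) < 1.0:
        return "falling_behind"
    if min(ratios) >= min_ratio:
        return "ok"
    return "thin_margin"
```

A status of `falling_behind` means hot data is accumulating faster than it is being copied, which is when the administrator would raise the replication rate or reduce source I/O.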

Finally, the ideal migration solution will support fault-tolerant data replication so that if there is an interruption to the migration task, rather than starting over from the beginning, data replication can resume from the point where it was interrupted.
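
Resumable replication amounts to persisting progress as it is made. The Python sketch below records the index of the next block to send in a small JSON checkpoint file after each block, writing a temporary file and swapping it in with `os.replace` so an interruption never leaves a half-written checkpoint. `replicate_with_checkpoint`, the `send_block` callback and the checkpoint format are hypothetical; a real tool would also have to keep tracking hot writes during an outage.

```python
import json
import os
from typing import Callable, Sequence


def replicate_with_checkpoint(blocks: Sequence[bytes],
                              send_block: Callable[[int, bytes], None],
                              checkpoint_path: str) -> int:
    """Send blocks to the destination, resuming after the last checkpointed block.

    Returns the number of blocks sent in this call.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_block"]

    for index in range(start, len(blocks)):
        send_block(index, blocks[index])  # may raise if the link drops
        # Write the new checkpoint to a temporary file, then swap it in so the
        # checkpoint file is replaced in one step rather than rewritten in place.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"next_block": index + 1}, f)
        os.replace(tmp_path, checkpoint_path)

    return len(blocks) - start
```

If the link drops partway through, calling the function again with the same checkpoint path sends only the remaining blocks. A block sent just before a failure but not yet checkpointed is sent again on resume; that is harmless here because rewriting a block at the same index is idempotent.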


Among the most daunting barriers to cloud adoption are the difficulty, uncertainty and disruption of managing data for live migration to the cloud. First and foremost, cloud hosting providers need to give their customers efficient cloud migration capabilities with no interruption to runtime operations. To achieve this, hosting providers and their customers should agree on project parameters and requirements, including maintenance windows, service levels, and success criteria. New tools and technologies have made live migration of VMs to the cloud considerably less disruptive and more efficient. With the right plan, people and tools, moving to the cloud shouldn’t keep you awake at night. Sleep well.

About the Author

Rich Petersen is co-founder and president of JetStream Software and has more than 20 years of experience in enterprise technology. He was previously vice president of product management at FlashSoft, which was acquired by SanDisk in 2012, and has also served at LogicBlaze/Red Hat, Infravio and Interwoven. He earned his MBA at the Haas School of Business at the University of California, Berkeley. For more information, please visit JetStream Software on LinkedIn and @JetStreamSoft.
