Key Takeaways
- As your project grows you need to move to larger VMs. But if the next step up exceeds your requirements, you are overpaying
- Containers offer higher granularity than VMs, and can be scaled vertically without rebooting the running instances
- Monolithic and legacy applications can be migrated from VMs to system containers without modifying your settings
- Scaling Java vertically requires proper JVM configuration and a shrinking garbage collector
- Choosing between "pay as you go" and "pay as you use" cloud pricing models affects how efficiently you spend
Cloud resources can be expensive, especially when you are forced to pay for resources that you don’t need; on the other hand, resource shortages cause downtime. What’s a developer to do? In this article we will discuss techniques for finding the happy medium that lets you pay for just the resources you actually consume, without being limited as your application's capacity requirements grow.
Admit That You Overpay for VMs
The first step to any solution of course is admitting that you have a problem. Below are some details on the issue that many cloud users face.
Almost every cloud vendor offers the ability to choose from a range of different VM sizes. Choosing the right VM size can be a daunting task; too small and you can trigger performance issues or even downtimes during load spikes. Over-allocate? Then during normal load or idle periods all unused resources are wasted. Does this scenario look familiar from your own cloud hosted applications?
And when the project starts growing horizontally, the resource inefficiency issue replicates in each instance, and so, the problem grows proportionally.
In addition, if you need to add just a few more resources to the same VM, the only way out with most current cloud vendors is to double your VM size. See the sample AWS offering below.
Exacerbating the problem, moving to a larger VM incurs downtime: you have to stop the current VM, perform all the steps of application redeployment or migration, and then deal with the inevitable associated challenges.
This shows that VMs are neither flexible nor efficient in their resource usage, and that they limit adjustment to variable loads. This lack of elasticity leads directly to overpaying.
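To put a rough number on the doubling problem, here is a minimal sketch. It assumes a hypothetical size ladder in which each VM size is exactly double the previous one; the class name `VmSizing`, the 2 GiB base size, and the sample demands are illustrative assumptions, not any vendor's actual catalog.

```java
final class VmSizing {
    private VmSizing() {}

    /**
     * Smallest size on a hypothetical doubling ladder (base, 2*base, 4*base, ...)
     * that is large enough for the required amount of memory.
     */
    static long smallestFittingSizeGiB(long requiredGiB, long baseGiB) {
        if (requiredGiB <= 0 || baseGiB <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        long size = baseGiB;
        while (size < requiredGiB) {
            // multiplyExact throws ArithmeticException instead of silently overflowing
            size = Math.multiplyExact(size, 2L);
        }
        return size;
    }

    /** Fraction of the rented VM that stays unused at the given demand. */
    static double wastedFraction(long requiredGiB, long baseGiB) {
        long size = smallestFittingSizeGiB(requiredGiB, baseGiB);
        return (size - requiredGiB) / (double) size;
    }

    public static void main(String[] args) {
        long[] demandsGiB = {3, 5, 9, 17}; // illustrative demands only
        for (long demand : demandsGiB) {
            System.out.printf("need %d GiB -> rent %d GiB, %.0f%% unused%n",
                    demand,
                    smallestFittingSizeGiB(demand, 2),
                    100 * wastedFraction(demand, 2));
        }
    }
}
```

Under these assumptions, a workload that needs 9 GiB lands on the 16 GiB size, leaving 7 of 16 GiB (about 44%) paid for but unused.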
Find How to Scale Up and Down Efficiently
If scaling out is not helping to use resources efficiently, then we need to look inside our VMs for a deeper understanding of how vertical scaling can be implemented.
Vertical scaling optimizes the memory and CPU usage of any instance according to its current load. If configured properly, this works well for both monoliths and microservices.
Setting up vertical scaling inside a VM by adding or removing resources on the fly without downtimes is a difficult task. VM technologies provide memory ballooning, but it’s not fully automated, requiring tooling for monitoring the memory pressure in the host and guest OS, and then activating up or down scaling as appropriate. But this doesn't work well in practice, as the memory sharing should be automatic in order to be useful.
Container technology unlocks a new level of flexibility thanks to its out-of-the-box automatic resource sharing among containers on the same host, with the help of cgroups. Resources that are not consumed within the limit boundaries are automatically shared with other containers running on the same hardware node.
And unlike VMs, the resource limits in containers can be easily changed without rebooting the running instances.
As a result, resizing the same container on the fly is easier, cheaper and faster than moving to larger VMs.
Migrate from VMs to Containers
There are two types of containers: application containers and system containers. An application container (such as Docker or rkt) typically runs as little as a single process, whereas a system container (LXD, OpenVZ) behaves like a full OS and can run full-featured init systems like systemd, SysVinit, and openrc, which allow processes to spawn other processes like openssh, crond, or syslogd, together inside a single container. Both types support vertical scaling with resource sharing for higher efficiency.
Ideally, on new projects you want to design around application containers from the ground up, as it is relatively easy to create the required images using publicly available Docker templates. But there is a common misconception that containers are good only for greenfield applications (microservices and cloud-native). Experience and real use cases show that existing workloads can be migrated from VMs to containers without rewriting or redesigning the applications.
For monolithic and legacy applications it is preferable to use system containers, so that you can reuse the architecture, configuration, etc., that were implemented in the original VM design. With system containers you can use standard network configurations like multicast, run multiple processes inside a container, avoid issues with incorrect memory limit detection, write to the local file system and keep it safe during container restarts, troubleshoot issues and analyze logs in an already established way, use a variety of SSH-based configuration tools, and keep relying on other important "old school" practices.
To migrate from VMs, the monolithic application topology should be decomposed into small logical pieces distributed among a set of interconnected containers. A simple representation of the decomposition process is shown in the picture below.
Each application component should be placed inside an isolated container. This approach can simplify the application topology in general, as some specific parts of the project may become unnecessary within a new architecture.
For example, Java EE WebLogic Server consists mainly of three kinds of instances required for running in a VM: the administration server, the node manager and the managed server. After decomposition, we can get rid of the node manager role, which is designed as a VM agent to add or remove managed server instances. Managed servers are now added automatically as containers and attached directly to the administration server using the container orchestration platform and a set of WLST (WebLogic Server Scripting Tool) scripts.
To proceed with the migration, you need to prepare the required container images. For system containers, that process might be a bit more complex than for application containers, so either build the images yourself or use an orchestrator like Jelastic with pre-configured system container templates.
And finally, deploy the project itself and configure the needed interconnections.
Now each container can be scaled up and down on the fly with no downtime. A container is much thinner than a virtual machine, so this operation takes much less time than scaling with VMs. Horizontal scaling also becomes very granular and smooth, as a container can be easily provisioned from scratch or cloned.
Enable Garbage Collector with Memory Shrink
For scaling Java vertically, it is not sufficient to just use containers; you also need to configure the JVM properly. Specifically, the garbage collector you select should be able to shrink memory at runtime.
Such a GC packs all the live objects together, removes garbage objects, and uncommits and releases unused memory back to the operating system. This is in contrast to a non-shrinking GC or non-optimal JVM start options, where Java applications hold all committed RAM and cannot be scaled vertically according to the application load. Unfortunately, the JDK 8 default Parallel garbage collector (-XX:+UseParallelGC) is not shrinking and does not solve the issue of inefficient RAM usage by the JVM. Fortunately, this is easily remedied by switching to Garbage-First (-XX:+UseG1GC).
Let’s look at the example below. Even if your application has low RAM utilization (blue in the graph), the unused resources cannot be shared with other processes or other containers, as they are fully allocated to the JVM (orange).
However, the good news for the Java ecosystem is that as of JDK 9, the modern shrinking G1 garbage collector is enabled by default. One of its main advantages is the ability to compact free memory space without lengthy GC pause times and uncommit unused heap.
Use the following parameter to enable G1 if you use a JDK release earlier than 9:
-XX:+UseG1GC
The following two parameters configure the vertical scaling of memory resources:
- set -Xms (the initial heap size) - a scaling step
- set -Xmx (the maximum heap size) - a maximum scaling limit
Also, the application should periodically trigger a full GC, for example by calling System.gc(), during a low load or idle stage. This process can be implemented inside the application logic or automated with the help of the external Jelastic GC Agent.
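As a minimal sketch of the "inside the application logic" option, the class below checks a load value on a fixed schedule and requests a collection only when the load is below a threshold. The class name `IdleGcTrigger`, the load probe, and the threshold are hypothetical; how "low load" is measured is left to the application. Note that System.gc() is only a request to the JVM, and it is ignored if the JVM runs with -XX:+DisableExplicitGC.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.DoubleSupplier;

final class IdleGcTrigger {
    private final DoubleSupplier loadProbe; // hypothetical: returns current load, e.g. 0.0 to 1.0
    private final double idleThreshold;     // below this value the application counts as idle

    IdleGcTrigger(DoubleSupplier loadProbe, double idleThreshold) {
        this.loadProbe = loadProbe;
        this.idleThreshold = idleThreshold;
    }

    /** Requests a GC if the application looks idle; returns whether a request was made. */
    boolean runOnce() {
        if (loadProbe.getAsDouble() < idleThreshold) {
            System.gc(); // a hint to the JVM, not a guarantee
            return true;
        }
        return false;
    }

    /** Starts a daemon thread that performs the check every periodSeconds. */
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor(runnable -> {
                    Thread thread = new Thread(runnable, "idle-gc-trigger");
                    thread.setDaemon(true); // do not keep the JVM alive for this task
                    return thread;
                });
        scheduler.scheduleAtFixedRate(() -> { runOnce(); },
                periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

An application could call something like `new IdleGcTrigger(probe, 0.3).start(300)` at startup and shut the returned scheduler down on exit; the 0.3 threshold and 300-second period are placeholders to be tuned per workload.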
In the graph below, we show the result of activating the following JVM start options with delta time growth of about 300 seconds:
-XX:+UseG1GC -Xmx2g -Xms32m
This graph illustrates the significant improvement in resource utilization compared to the previous sample. The reserved RAM (orange) increases slowly corresponding to the real usage growth (blue). And all unused resources within the Max Heap limits are available to be consumed by other containers or processes running in the same host, and not wasted by standing idle.
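The two curves roughly correspond to values that the JVM itself exposes through the standard java.lang.management API: used heap (blue) and committed heap (orange). The following is a minimal sketch for sampling them from inside an application; the class name `HeapSnapshot` and the sampling loop are illustrative assumptions, and a production setup would more likely export these values to a monitoring system.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

final class HeapSnapshot {
    final long usedBytes;      // memory occupied by objects (live or not yet collected)
    final long committedBytes; // memory the JVM has reserved from the operating system
    final long maxBytes;       // upper limit (-Xmx), or -1 if undefined

    private HeapSnapshot(long usedBytes, long committedBytes, long maxBytes) {
        this.usedBytes = usedBytes;
        this.committedBytes = committedBytes;
        this.maxBytes = maxBytes;
    }

    /** Reads the current heap figures from the platform MemoryMXBean. */
    static HeapSnapshot capture() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return new HeapSnapshot(heap.getUsed(), heap.getCommitted(), heap.getMax());
    }

    /** Committed but unused heap: memory held by the JVM that nothing else can use. */
    long idleCommittedBytes() {
        return committedBytes - usedBytes;
    }

    public static void main(String[] args) throws InterruptedException {
        final long mib = 1024L * 1024L;
        for (int i = 0; i < 5; i++) {
            HeapSnapshot s = capture();
            System.out.printf("used=%d MiB committed=%d MiB idle=%d MiB%n",
                    s.usedBytes / mib, s.committedBytes / mib, s.idleCommittedBytes() / mib);
            Thread.sleep(1000);
        }
    }
}
```

With a shrinking collector, the gap reported as idle should stay small relative to the used heap after a collection; with a non-shrinking setup it tends to stay close to the peak committed size.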
This proves that a combination of container technology and G1 provides the highest efficiency in terms of resource usage for Java applications in the cloud.
Choose a Cloud with Pay-as-You-Use Model
The last (but not least) important step is to choose a cloud provider with a "pay as you use" pricing model in order to be charged only based on consumption.
Cloud computing is very often compared to electricity usage, in that it provides resources on demand and offers a "pay as you go" model. But there is a major difference - your electric bill doesn’t double when you use a little more power!
Most cloud vendors provide a "pay as you go" billing model, which means that it is possible to start with a smaller machine and then add more servers as the project grows. But as we described above, you cannot simply choose the size that precisely fits your current needs and will scale with you, without some extra manual steps and possible downtime. So you keep paying for the limits: for a small machine at first, then for one double in size, and ultimately for several underutilized VMs after scaling horizontally.
In contrast, a "pay as you use" billing approach considers the load on the application instances at the present time, and provides or reclaims any required resources on the fly, which is made possible by container technology. As a result, you are charged based on actual consumption and are not required to make complex reconfigurations to scale up.
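The difference between the two models can be illustrated with simple arithmetic. The sketch below is built on stated assumptions: an hourly memory usage profile, a single VM sized for the peak, and the same hypothetical price per GiB-hour under both models. Real price lists differ between providers and models, so the numbers only show the shape of the comparison.

```java
final class BillingComparison {
    private BillingComparison() {}

    /** "Pay as you go": every hour is billed for the full VM size (the limit). */
    static double costPayingForLimit(int[] hourlyUsageGiB, int vmSizeGiB, double pricePerGiBHour) {
        return hourlyUsageGiB.length * (double) vmSizeGiB * pricePerGiBHour;
    }

    /** "Pay as you use": every hour is billed for what was actually consumed. */
    static double costPayingForUsage(int[] hourlyUsageGiB, double pricePerGiBHour) {
        long totalGiBHours = 0;
        for (int usage : hourlyUsageGiB) {
            totalGiBHours += usage;
        }
        return totalGiBHours * pricePerGiBHour;
    }

    public static void main(String[] args) {
        int[] usage = {2, 2, 6, 8, 3, 2}; // hypothetical GiB consumed in each hour
        int vmSizeGiB = 8;                // VM sized to cover the 8 GiB peak
        double price = 0.01;              // hypothetical price per GiB-hour

        System.out.printf("paying for the limit: %.2f%n",
                costPayingForLimit(usage, vmSizeGiB, price));
        System.out.printf("paying for usage:     %.2f%n",
                costPayingForUsage(usage, price));
    }
}
```

For this sample profile the limit-based bill covers 48 GiB-hours while actual consumption is 23 GiB-hours, so under the equal-price assumption more than half of the limit-based bill pays for idle capacity.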
But what if you are already locked into a vendor with running VMs, you are paying for the limits, and you are not ready to change providers? There is still a possible workaround to increase efficiency and save money. You can take a large VM, install a container engine inside, and then migrate the workloads from all of the small VMs. Your application will then run inside containers within the VM. This is a kind of "layer cake", but it helps to consolidate and compact used resources, as well as to release and share unused ones.
Realizing the benefits of vertical scaling helps to quickly eliminate a set of performance issues, avoid the unnecessary complexity of rashly implemented horizontal scaling, and decrease cloud spending regardless of application type, monolith or microservice.
About the Author
Ruslan Synytsky is CEO and co-founder of Jelastic, delivering multi-cloud Platform-as-a-Service for developers. He designed the core technology of the platform that runs millions of containers in a wide range of data centers worldwide. Synytsky worked on building highly-available clustered solutions, as well as enhancements of automatic vertical scaling and horizontal scaling methods for legacy and microservice applications in the cloud. Rich in technical and business experience, Synytsky is actively involved in various conferences for developers, hosting providers, integrators and enterprises.