Key Takeaways
- The public cloud industry is evolving. First, competition and technology options across cloud solutions are increasing. Second, businesses face new data-governance challenges driven by territorial regulations and security controls.
- Make sure you have a clear data governance and cloud strategy in place to drive good engineering practices that won’t impede your business.
- Your stakeholders' contribution is key to your success. They need to embrace your strategy and understand the risks and commitments.
- You don't know what future requirements will be; build in enough flexibility to expand without too much pain.
- Automate everything, including your hardware deployment. Leverage your vendors, challenge them, and put them to work.
Amazon Web Services was the clear leader in Gartner's 2017 Magic Quadrant for Cloud Infrastructure as a Service, Worldwide. However, that is not enough to make it the de facto cloud strategy for every business. Target recently announced that it is moving, at least partially, out of AWS due to competition with Amazon's retail offering. Walmart built its own private cloud solution and recently asked its technical vendors to get out of AWS, while forming closer deals with Google and Microsoft. Add to this picture that all major cloud providers joined the Cloud Native Computing Foundation (CNCF), pushing for Kubernetes and driving the industry towards a container-based microservice world across public and private clouds. VMware is striking deals with all four major public cloud providers (Amazon, Microsoft, IBM, Google). Microsoft launched Azure Stack, an on-premises light version of Azure. Another important industry trend is the evolution of territorial regulations (China, Russia, Europe) around data privacy and security; note, for example, the implementation of the European General Data Protection Regulation (GDPR) in May 2018. It’s becoming critical for businesses to refine their cloud strategy to truly embrace multi-cloud solutions and consider the need for stronger data governance. Often, this will mean having a foot in supporting a private cloud solution.
Below are five things to consider when implementing a private cloud strategy.
[Figure: Magic Quadrant for Cloud Infrastructure as a Service, Worldwide. Source: Gartner (June 2017)]
1. Have a vision and build a strategic and tactical plan
Many private cloud deployments fail to provide meaningful results. As with every engineering project, setting the wrong expectations and unrealistic goals will lead to a poor outcome. It doesn’t have to be that way. Once you have a clear understanding of what problems you need to solve for stakeholders, you must define clear goals and requirements. For example, look at the existing pain points for your developers and how your private cloud solution will solve or mitigate those problems. Improving the developer experience ensures faster adoption and long-term success.
Making the move to a private cloud requires focus, perseverance, motivation, accountability, and strong communication. You must have a good understanding of your existing service costs by doing a thorough Total Cost of Ownership (TCO) analysis. Ask yourself:
- What do day-to-day operations look like when supporting private infrastructure?
- Do you need to define a chargeback model for your stakeholders, and are there successful examples of one?
- What types of workloads do you plan to run, and how can you simplify capacity planning?
- What will be your minimum footprint, and even your maximum footprint?
- Will your solution integrate smoothly with your existing CI/CD pipeline and developer workflow?
- Are you already providing containerized environments to your engineering stakeholders, or do you need to plan for container adoption in production in your multi-cloud environment?
If any component needs to be redesigned or reengineered, you must account for the effort required. Your application deployment and abstractions may need to change to provide a seamless experience across your cloud environments and a positive experience for your engineering teams. You must define your top-level SLAs and how you will monitor them as KPIs for your stakeholders. Once you have defined your strategy, get more tactical and refine your plan.
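To make the TCO question above concrete, here is a minimal sketch of comparing amortized private cloud costs against ongoing public cloud spend. All figures and parameter names are hypothetical placeholders, not numbers from the article; a real analysis would also include staffing, networking, licensing, and migration costs.

```python
# Illustrative TCO comparison: amortized private cloud capex + opex
# versus straight-line public cloud spend. All numbers are made up.

def private_cloud_tco(capex_per_rack, racks, amortization_years,
                      monthly_opex_per_rack, months):
    """Amortized hardware cost plus operations cost over `months`."""
    capex_monthly = (capex_per_rack * racks) / (amortization_years * 12)
    return (capex_monthly + monthly_opex_per_rack * racks) * months

def public_cloud_tco(monthly_instance_spend, months, discount=0.0):
    """Public cloud spend, optionally with a committed-use discount."""
    return monthly_instance_spend * (1 - discount) * months

if __name__ == "__main__":
    months = 36  # compare over a three-year horizon
    private = private_cloud_tco(capex_per_rack=250_000, racks=4,
                                amortization_years=3,
                                monthly_opex_per_rack=6_000, months=months)
    public = public_cloud_tco(monthly_instance_spend=120_000,
                              months=months, discount=0.30)
    print(f"private: ${private:,.0f}  public: ${public:,.0f}")
```

The value of writing even a toy model like this is that every assumption (amortization period, discount rate, opex per rack) becomes an explicit input you can challenge with your vendors.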
Keep in mind that with too much planned, nothing will get done. However, without ambitious goals, you won’t build the necessary momentum. A fine balance between the necessary feature set and the reality of the technology stack is needed to deliver business value.
Anecdotes
#1 The Adobe Advertising Cloud DSP built its multi-cloud solution through the lens of TubeMogul's fast growth. The Vision? Solve low-latency and large-storage requirements by giving our business stakeholders the ability to serve their core workloads through a fully automated on-premises infrastructure. The Strategy? Deliver performance and provisioning flexibility through a mix of bare-metal and virtualized instances, with a simple CI/CD workflow that works across our multi-cloud environments. The Tactic? Leverage the open source project OpenStack for infrastructure orchestration and automation. Build a lean, dedicated team that can do R&D and maintain the private cloud lifecycle. Ensure a consistent CI/CD workflow for developers across private and public cloud environments.
#2 While mulling over our private cloud plans, we challenged our findings directly with our public cloud vendor (AWS). We questioned everything, from our TCO analysis to the technical challenges we were facing. At any time during the first three years we were ready to pull back on our private cloud plans, even after the first location was deployed. For a long time, our private cloud plan was our BATNA [1] for negotiating down pricing with our public cloud provider.
2. Design with flexibility in mind
Once you know what services you need to deliver and what the operational model will be, it is critical to keep enough flexibility in your design. The R&D phase won’t be a negligible part of your efforts. You’ll have multiple iterations; plan accordingly, with room for the unknown and for complex failures. Chasing a technology always leads to passionate debates in engineering. Then, think about how to define your network and server specs. Don't make the mistake of building a kennel for pets; you want a farm with large cattle. In a private cloud model, the technology you pick will be around for a few years: you will need to stand by it, be an advocate, demonstrate added value, build a community, support it, and explain to grumpy architects why it is the way it is. Bake into your plan what your upgrade path will be and how you can get your infrastructure from v1 to v2. Technology evolution will be key to supporting new requirements, following new trends, and retaining your talent.
First, deliver a solid MVP that brings critical business value; this will define your success. Performance improvements, bells, and whistles come second. Make sure you can leverage your bare-metal infrastructure as much as your virtualized or containerized layer. Don’t make your private cloud solution an “IT project” only. Don’t aim at building a new public cloud internally; you won’t succeed! You need to design your solution with enough flexibility to support your developers' stack in a meaningful way. You will need to build and develop new solutions, APIs, and services to provide a seamless experience to your engineering stakeholders. Ensure your private cloud services follow your existing cloud standards and conventions to help drive developer adoption and allow functionality to be reused across cloud environments. Depending on your workload projections and use cases, you may have to design for an SDN and develop some service overlays, or you may be able to keep it simple.
It is important to smooth the learning curve and stay strongly agile. Start small, standardize developers’ workflows, leverage VLANs, limit your deployment to core services (i.e., identity management, networking, compute, storage), and have a clear path for upgrades. Keep advanced services for later, once you have demonstrated the viability of the core services.
Anecdotes
#1 At TubeMogul, we went through multiple cycles of trial and error to pick our technologies and vet vendors. Some of those technologies barely exist anymore (CloudStack, Eucalyptus, etc.). We ended up settling on OpenStack with a mix of bare-metal. A key foundation of our first design was the assumption of cheap but powerful commodity hardware with a simple network and failure design. We leveraged only the core services from OpenStack and a basic CI/CD workflow with Jenkins and PXE for bare-metal provisioning. The same CI/CD pipeline is used by developers to manage application canaries and releases in production across public and private cloud environments, or while doing cloud bursting from one environment to another. This required standard naming conventions and alignment between environments, allowing us to reuse existing tooling and services in a seamless way, which was critical to building momentum.
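The naming-convention alignment mentioned above is worth illustrating: if the same scheme identifies a node's role, environment, and site in every cloud, one parser can drive tooling everywhere. The scheme below (`<role><index>.<env>.<site>`) is a hypothetical example, not TubeMogul's actual convention.

```python
# Hypothetical hostname convention shared across public and private clouds,
# so the same tooling can target either environment.
import re

HOST_RE = re.compile(
    r"^(?P<role>[a-z]+)(?P<index>\d+)\.(?P<env>prod|stage)\.(?P<site>[a-z0-9-]+)$"
)

def parse_host(hostname):
    """Split a hostname into role/index/env/site, or fail loudly."""
    m = HOST_RE.match(hostname)
    if not m:
        raise ValueError(f"hostname does not follow convention: {hostname}")
    return m.groupdict()

# The same parser works for a public cloud node and an on-prem node:
print(parse_host("adserver03.prod.us-east-1"))  # hypothetical AWS site
print(parse_host("adserver03.prod.sv6"))        # hypothetical Equinix site
```

Failing loudly on non-conforming names is deliberate: it keeps drift out of inventory and lets deployment tooling trust the parsed fields.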
3. Rack and Roll: Infrastructure Automation
A critical part of a private cloud deployment is how you handle the data center, network, and procurement work. This includes asset management and the RMA process across the full life cycle of your assets. It can easily become a critical pain point that slows down all your operations and threatens the business value of your deployment. Think it through; find what you are good at and what you are not. Depending on your investment, team expertise, and goals, you may take on more, but don’t underutilize your vendors. As I keep reminding my teams, VAR stands for Value-Added Reseller; if you miss out on the value added, you're getting ripped off. Depending on your engagement model, you may need to define rack elevations, cable matrices, port mappings, power draw, etc. Ultimately, you want to get away from the server-at-a-time model to a rack-at-a-time model and have racks delivered to your data center fully loaded and cabled, i.e., do a Rack-and-Roll. You just plug the rack into your core network and get it running right away. You don’t want to be in the business of assembling, racking, and cabling assets unless you have professional staff for it and get true added value from it; most businesses don’t, so use a VAR.
As part of this “hardware automation”, ensure your design will fit your data center location. What Top-of-Rack [2] design are you going for? You probably don’t want to know about TIA-942-A, so leverage your data center vendor for insights and design reviews. This may impact hardware choices depending on front or back fans and where your cold aisle sits in your data center. A lot of details to think about! Make sure your plan fits your space and power allocation, and consider how you can leverage on-site staff to handle your RMAs. These are critical elements to understand, streamline, and control in order to build the foundations of a successful private cloud that can then be fully automated through code.
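One artifact of the rack-at-a-time model above is a cable matrix the VAR can build from: a deterministic mapping of each server NIC to a ToR switch port. Here is a minimal sketch; the rack-unit positions, ToR placement, and port counts are illustrative values, not a real elevation.

```python
# Sketch: generate a cable matrix for a rack-at-a-time delivery.
# Given server positions (rack units), emit the ToR switch port each
# NIC plugs into. Layout values below are hypothetical.

def cable_matrix(rack_id, server_units, tor_unit=42, ports_per_server=2):
    """Deterministically map each server's NICs to ToR switch ports."""
    rows = []
    port = 1
    for unit in server_units:
        for nic in range(ports_per_server):
            rows.append({
                "rack": rack_id,
                "server_unit": f"U{unit}",
                "nic": f"eth{nic}",
                "tor": f"U{tor_unit}",
                "tor_port": port,
            })
            port += 1
    return rows

matrix = cable_matrix("rack-01", server_units=[1, 3, 5, 7])
for row in matrix[:3]:
    print(row)
```

Because the mapping is generated rather than hand-drawn, the same data can feed the VAR's build sheet, your switch port descriptions, and your asset database, so they never disagree.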
Anecdotes
#1 The Adobe Advertising Cloud DSP minimum deployment footprint for any data center is two racks. All racks are built from the ground up by our VAR and delivered to Equinix facilities, ready to go. All the provisioning is then automated to deploy our base image and basic components based on the assigned role. Puppet handles our configuration management. If a resource gets into an unwanted state, or has been through RMA and needs to be redeployed, it’s just a matter of marking the asset for re-provisioning to trigger a rebuild.
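The "mark for re-provisioning" idea above can be sketched as a tiny asset state machine, where flagging a node sends it back through the automated rebuild. The states and transitions below are illustrative, not Adobe's actual tooling.

```python
# Sketch: asset lifecycle state machine where marking a node for
# re-provisioning drives it back through an automated rebuild.
# States and transitions are hypothetical.

VALID_TRANSITIONS = {
    "in_service": {"needs_reprovision"},
    "needs_reprovision": {"provisioning"},
    "provisioning": {"in_service", "failed"},
    "failed": {"needs_reprovision", "rma"},
    "rma": {"needs_reprovision"},  # asset returns from vendor repair
}

class Asset:
    def __init__(self, name, state="in_service"):
        self.name, self.state = name, state

    def transition(self, new_state):
        if new_state not in VALID_TRANSITIONS.get(self.state, set()):
            raise ValueError(f"{self.name}: cannot go {self.state} -> {new_state}")
        self.state = new_state

node = Asset("compute-017")
node.transition("needs_reprovision")  # operator flags a drifted or RMA'd node
node.transition("provisioning")       # automated rebuild (e.g., PXE) kicks off
node.transition("in_service")         # config management takes over again
print(node.state)
```

Keeping the allowed transitions explicit is what makes the workflow safe to automate: a node cannot silently jump from `failed` to `in_service` without passing through a rebuild.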
4. Eat your own dog food and be transparent
As you go through your efforts, don’t get fooled: you need to test what you build. You will need to test again and again. You will want to break your deployment and rebuild it, again and again. It is critical to dedicate the right amount of resources so your engineering staff can iterate, take things apart, break things, rinse, and repeat. You want a realistic lab for testing and experimenting. You need to use your own private cloud to feel the pain and address critical gaps.
As you get through the first live fire on your brand-new stack, make sure to provide visible data points to your stakeholders so they understand the progress. Don’t be afraid to show the risks and the overall state of your cloud. Do you provide clear capacity planning updates on how many compute resources are available? Do your stakeholders understand the network limitations of the current design and how they impact their applications? How do you provide visibility into your network stack to build trust and confidence? Do you have overloaded compute resources, and are you abusing over-subscription rates? All these questions need to be answered to earn trust and support from your stakeholders while iterating and growing your production environment.
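The over-subscription question above is one of the easiest data points to surface. A sketch of the calculation, with an illustrative warning threshold (4:1 vCPU-to-core is a common rule of thumb for general-purpose workloads, but the right ratio depends entirely on your workload mix):

```python
# Sketch: per-hypervisor vCPU over-subscription report, the kind of
# number worth publishing to stakeholders. Threshold is illustrative.

def oversubscription(allocated_vcpus, physical_cores):
    """Ratio of vCPUs handed out to physical cores available."""
    return allocated_vcpus / physical_cores

def report(hypervisors, warn_ratio=4.0):
    """Print each hypervisor's ratio; return the names above threshold."""
    flagged = []
    for name, (vcpus, cores) in sorted(hypervisors.items()):
        ratio = oversubscription(vcpus, cores)
        status = "OVERLOADED" if ratio > warn_ratio else "ok"
        if ratio > warn_ratio:
            flagged.append(name)
        print(f"{name}: {ratio:.1f}:1 vCPU/core ({status})")
    return flagged

fleet = {
    "hv-01": (128, 32),  # 4:1, right at the line
    "hv-02": (256, 32),  # 8:1, risky for latency-sensitive workloads
}
flagged = report(fleet)
```

Publishing this regularly, alongside free-capacity numbers, is a cheap way to build the trust the paragraph above calls for.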
Anecdotes
#1 TubeMogul’s first OpenStack dev environment was a success, until a week later, when our first issues with Ceph brought down the full environment. The bad news? It had already become an active, shared environment, used both to experiment with the future of the private cloud and as a development environment for our engineering stakeholders working on feature development and pre-release validation. Bad week. Lesson learned: don’t mix your own dev work with your stakeholders’. Once someone depends on your stack, it is your responsibility to deliver a quality service.
#2 Capacity planning is always difficult. In a cloud environment, you want to understand your business use cases, but you don’t want the business to depend solely on you to grow. It’s important to agree on a contract so you know when to add capacity ahead of time. Our minimum expansion unit is two racks: if we are getting short on one asset type in one location, we add two racks to our footprint. This is where our flexible design kicks in, as it allows us to quickly enable cloud bursting back to a public cloud as needed.
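The expansion contract described above can be reduced to a simple trigger: when free capacity for any asset class in a location drops below an agreed threshold, order the next expansion unit and burst to public cloud until it lands. The threshold and inventory shape below are hypothetical.

```python
# Sketch: capacity-expansion trigger for one location. When any asset
# class falls below a free-capacity threshold, order the fixed
# expansion unit (two racks). Threshold value is illustrative.

EXPANSION_UNIT_RACKS = 2

def needs_expansion(inventory, free_threshold=0.15):
    """inventory: {asset_class: (free, total)} for one location."""
    return any(free / total < free_threshold
               for free, total in inventory.values())

location = {
    "compute": (12, 100),  # 12% free: below the 15% threshold
    "storage": (40, 100),  # 40% free: fine
}
if needs_expansion(location):
    print(f"order {EXPANSION_UNIT_RACKS} racks; "
          f"enable cloud bursting until delivery")
```

The point of fixing the expansion unit in advance is that the trigger becomes a shared, mechanical contract with the business rather than a negotiation under pressure.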
Conclusion
It is a journey. Building a private cloud is not trivial, and most companies may not have a real business case for one. Use the public cloud wherever it makes sense. To build a private cloud, you need to be very clear on what you want to achieve and how it fits into your multi-cloud strategy. Your data governance or strategic business decisions may direct you one way or another. A private cloud is not a simple engineering project; it must be a strategic decision. Understand the big picture, get the support of your stakeholders, then build an agile plan that allows you to iterate by failing and fixing quickly. The Adobe Advertising Cloud DSP private cloud went through multiple phases that required solid software and operations engineering to automate our infrastructure. We now deliver a core infrastructure that allows us to reduce our footprint, reduce our latency, handle more traffic, and, recently, outperform raw AWS network performance by three times.
References
[1] BATNA is a term coined by Roger Fisher and William Ury in their 1981 bestseller, Getting to Yes: Negotiating Without Giving In. It stands for "Best Alternative To a Negotiated Agreement."
[2] Cisco Data Center Top-of-Rack Architecture Design
About the Author
Nicolas Brousse, a Cloud Technology Leader, became Director of Operations Engineering at Adobe (NASDAQ: ADBE) after the acquisition of TubeMogul (NASDAQ: TUBE). Adept at adapting quickly to ongoing business needs and constraints, Nicolas leads a global team of site reliability engineers, cloud engineers, security engineers, and database architects that builds, manages, and monitors Adobe Advertising Cloud's infrastructure 24/7, adhering to a "DevOps" methodology. Nicolas is a frequent speaker at top U.S. technology conferences and regularly gives advice to other operations engineers. Prior to relocating to the U.S. to join TubeMogul, Nicolas worked in technology for over 15 years, managing heavy traffic and large user databases for companies like MultiMania, Lycos, and Kewego.