April 21, 2022 | Article
By Aaron Bawcom, Sebastian Becerra, Beau Bennett, and Bill Gregg
Cloud migrations flounder quickly unless organizations invest in building the right cloud foundations.
For many companies moving to the cloud, a focus on short-term gain leads to long-term pain and prevents them from capturing a much bigger portion of cloud’s estimated $1 trillion of value potential. It happens when IT departments, usually with the assistance of systems integrators, migrate a set of applications to the cloud as quickly as possible to capture initial gains. That short-term focus frequently has significant consequences.
A cloud foundation is a set of design decisions that is implemented in code files and defines how the cloud is used, secured, and operated. We have found that the ideal cloud foundation is split into three layers to reduce risk, accelerate change, and provide appropriate levels of isolation (exhibit):
- Application patterns: Code artifacts that automate the secure, compliant, and standardized configuration and deployment of applications with similar functional and nonfunctional requirements through the use of infrastructure as code (IaC), pipeline as code (PiaC), policy as code (PaC), security as code (SaC), and compliance as code (CaC):
  - Policy as code: The translation of an organization’s standards and policies into actual executable code that secures the infrastructure and environment of the organization automatically in accordance with the policy
  - Security as code: Software that verifies the configuration of an infrastructure definition before deployment and after deployment to meet a particular defined standard (for more, see “Security as code: The best (and maybe only) path to securing cloud applications and systems”)
  - Compliance as code: A composed set of rules interpreted by a software-based policy engine that enforces compliance policy for a specific cloud environment
- Isolation zones: A set of separate CSP-specific zones (sometimes called landing zones) that isolate application environments to prevent concentration risk. Each zone contains CSP services, identity and access management (IAM), network isolation, capacity management, shared services scoped to the isolation zone, and change control where one or more related applications run
- Base: A set of CSP-agnostic capabilities that are provided to a set of isolation zones, including network connectivity and routing; centralized firewall and proxy capabilities; identity standardization; enterprise logging, monitoring, and analytics (ELMA); shared enterprise services; golden-image (or primary-image) pipelines; and compliance enforcement
The culprit? Lack of attention to the cloud foundation, that unsexy but critical structural underpinning that determines the success of a company’s entire cloud strategy (see sidebar, “What is a cloud foundation?”). Several large banks are paying that price: they have had to hire hundreds of cloud engineers because they did not put the right foundational architecture in place at the beginning.
Building a solid cloud foundation as part of a transformation engine does not mean delaying financial returns or investing significant resources. It just requires knowing what critical steps to take and executing them well. Our experience shows that companies that put in place a solid cloud foundation reap benefits in the form of a potential eightfold acceleration in the pace of cloud migration and adoption and a 50 percent reduction in migration costs over the long term, without delaying their cloud program.
A top consumer-packaged-goods (CPG) company was running into significant delays with its cloud-migration program—each application was taking up to two months to migrate. With portions of the business divesting, finance and legal were pressuring the company to isolate them from the company quickly. Realizing that deficiencies in its cloud foundation were causing the delays, it made the counterintuitive decision to pause the migration to focus on strengthening the cloud foundation.
For example, it automated critical infrastructure capabilities, deployed security software to automate compliance, deployed reusable application patterns, and created isolation zones to insulate workloads from one another and prevent potential problems in one zone from spreading. Once these improvements were in place, the company was able to migrate applications quickly, safely, and securely, with single applications taking days rather than weeks.
Ten actions to get your cloud foundation right
Building a strong cloud foundation is not the “cost of doing business.” It’s a critical investment that will reap significant rewards in terms of speed and value. The following ten actions are the most important in building this foundation.
1. Optimize technology to enable the fastest ‘idea-to-live’ process
Whether their workloads are in the cloud or in traditional data centers, many companies have outdated and bureaucratic work methods that introduce delays and frustrations. Your cloud foundation should be constructed to enable an idea’s rapid progression from inception to up and running in a production environment, without sacrificing safety and security.
In practice, that means automating as many steps of the production journey as possible, including sandbox requests, firewall changes, on-demand creation of large numbers of isolated networks, identity and access management (IAM), application registration, certificate generation, compliance, and so on. Automating these steps is as valuable in traditional data centers as it is in the cloud. But since the cloud offers unique tools that make automation easier, and because the move to cloud leads organizations to rethink their entire strategy, the beginning of the migration process is often the right time to change how IT operates.
2. Design the cloud architecture so it can scale
If companies do it right, they can build a cloud architecture based on five people that can scale up to support 500 or more without significant changes. As the cloud footprint grows, a well-designed architecture should be able to accommodate more components, including more application patterns, isolation zones, and capabilities. Support for this scaling requires simple, well-designed interfaces between components. Because this is difficult to get right the first time, cloud-architecture engineers who have done it before at scale are a big advantage.
3. Build an organization that mirrors the architecture
According to Conway’s law, the way teams are organized will determine the shape of the technology they develop. IT organizations have a set structure for teams, and that can lead them to build things that don’t fit the shape of the cloud architecture.
For example, some companies have a separate cloud team for each of their business units. This can lead to each team building different cloud capabilities for its respective business unit and not architecting them for reuse by other business units. That can create slowdowns and even delays when changes made by one team affect the usage of another.
IT needs to design its cloud architecture first and then build an organization based on that structure. That means building out an organization that has a base team, isolation-zone teams, and application-pattern teams in order to reduce dependencies and redundancies between groups and ultimately deliver well-architected components at a lower cost.
4. Use the cloud that already exists
Many companies operate in fear of being locked into a specific cloud service provider (CSP), so they look for ways to mitigate that risk. A common pattern is an overreliance on containers, which can be expensive and time consuming and keep businesses from realizing the genuine benefits available from CSPs. One example of this was a company that created a container platform in the cloud as opposed to using the cloud’s own resiliency tools. When there was an outage, the impact was so large that it took multiple days to get its systems back online because the fault was embedded in the core of its non-cloud tooling.
There are other ways to mitigate CSP lock-in, such as defining a limited lock-in time frame and putting practices and systems in place that enable a rapid shift, if necessary. By attempting to build non-native resiliency capabilities, companies are essentially competing with CSPs without having their experience, expertise, or resources. The root of this issue is that companies still tend to treat CSPs as if they were hardware vendors rather than software partners.
5. Offer cloud products, not cloud services
It is common for companies to create internal cloud-service teams to help IT and the business use the cloud. Usually these service teams operate like fulfillment centers, responding to requests for access to approved cloud services. The business ends up using dozens of cloud services independently and without a coherent architecture, resulting in complexity, defects, and poor transparency into usage.
Instead, companies need dedicated product teams staffed with experienced cloud architects and engineers to create and manage simple, scalable, and reusable cloud products for application teams. The constraints imposed by aligning around cloud products can help to ensure that the business uses the correct capabilities in the correct way.
Once the product team has an inventory of cloud products, it can encourage application teams to use them to fast-track their cloud migration. The aptitude and interest of each application team, however, will influence how quickly and easily it adopts the new cloud products. Teams with little cloud experience, skill, or interest will need step-by-step assistance, while others will be able to move quickly with little guidance. The product team, therefore, needs to have an operating model that can support varying levels of application-team involvement in the cloud-migration journey.
One effective route offers three levels of engagement (exhibit):
- Concierge level: The engagement team builds everything needed by an application team.
- Embedded level: Architects from the central cloud team are embedded into application teams to help them build the right application patterns.
- Partner level: A partner team builds and runs its own isolation zone using the core capabilities from the base foundation, such as networking, logging, and identity.
By establishing the cloud products, the teams to support them, and the model by which application teams can engage with product teams, the business has the mechanisms in place to thoughtfully scale its cloud strategy.
6. Application teams should not reinvent how to design and deploy applications in cloud
When organizations give free rein to application teams to migrate applications to the cloud provider, the result is a menagerie of disparate cloud capabilities and configurations that makes ongoing maintenance of the entire inventory difficult.
Instead, organizations should treat the deployment capabilities of an application as a stand-alone product, solving common problems once using application patterns. Application patterns can be responsible for configuring shared resources, standardizing deployment pipelines, and ensuring quality and security compliance. The number of patterns needed to support the inventory of applications can be small, therefore maximizing ROI. For example, one large bank successfully used just ten application patterns to satisfy 95 percent of its necessary use cases.
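To make the idea of a small pattern catalog concrete, here is a minimal sketch that matches applications to patterns by their requirements and reports how much of the inventory the catalog covers. The pattern names, requirement tags, and the `match_pattern` and `coverage` helpers are hypothetical illustrations, not part of any particular toolchain.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class ApplicationPattern:
    name: str
    supports: FrozenSet[str]  # requirement tags this pattern can satisfy


@dataclass(frozen=True)
class Application:
    name: str
    requirements: FrozenSet[str]


def match_pattern(app: Application,
                  catalog: List[ApplicationPattern]) -> Optional[ApplicationPattern]:
    """Return the first pattern that covers every requirement of the app."""
    for pattern in catalog:
        if app.requirements <= pattern.supports:
            return pattern
    return None


def coverage(apps: List[Application], catalog: List[ApplicationPattern]) -> float:
    """Fraction of applications served by some pattern in the catalog."""
    if not apps:
        return 0.0
    matched = sum(1 for app in apps if match_pattern(app, catalog) is not None)
    return matched / len(apps)


# Hypothetical catalog and inventory, for illustration only.
CATALOG = [
    ApplicationPattern("web-api", frozenset({"http", "sql"})),
    ApplicationPattern("batch-job", frozenset({"queue", "object-storage"})),
]
APPS = [
    Application("orders", frozenset({"http", "sql"})),
    Application("nightly-report", frozenset({"queue"})),
    Application("legacy-bridge", frozenset({"proprietary-messaging"})),
]

if __name__ == "__main__":
    print(f"catalog covers {coverage(APPS, CATALOG):.0%} of applications")
```

Applications that match no pattern are the candidates for either a new pattern or a one-off exception, which keeps the catalog small by design.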
7. Provide targeted change management by using isolation zones
Isolation zones are cloud environments where applications live. In an effort to accelerate cloud migration, CSPs and systems integrators usually start with a single isolation zone to host all applications. That’s a high-risk approach, because configuration changes to support one application can unintentionally affect others. Going to the other extreme—one isolation zone for each application—prevents the efficient deployment of configuration changes, requiring the same work to be carried out across many isolation zones.
As a rule of thumb, a company should have from five to 100 isolation zones, depending on the size of the business and how it answers the following questions:
- Does the application face the internet?
- What level of resiliency is required?
- What is the risk-assurance level or security posture required for applications running in the zone?
- Which business unit has decision rights on how the zone is changed for legal purposes?
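One way to read the four questions above is as a grouping key: applications that give the same answers can share an isolation zone. The sketch below assumes that interpretation. The field names, the allowed values, and the sample applications are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

ZoneKey = Tuple[bool, str, str, str]


@dataclass(frozen=True)
class App:
    name: str
    internet_facing: bool   # Does the application face the internet?
    resiliency: str         # What level of resiliency is required?
    risk_level: str         # Required risk-assurance level or security posture
    business_unit: str      # Who holds decision rights on zone changes?


def zone_key(app: App) -> ZoneKey:
    """Answers to the four questions, used as the zone grouping key."""
    return (app.internet_facing, app.resiliency, app.risk_level, app.business_unit)


def group_into_zones(apps: List[App]) -> Dict[ZoneKey, List[str]]:
    """Place applications with identical answers into the same zone."""
    zones: Dict[ZoneKey, List[str]] = defaultdict(list)
    for app in apps:
        zones[zone_key(app)].append(app.name)
    return dict(zones)


# Hypothetical inventory, for illustration only.
APPS = [
    App("storefront", True, "high", "standard", "retail"),
    App("checkout", True, "high", "standard", "retail"),
    App("payroll", False, "standard", "elevated", "hr"),
]

if __name__ == "__main__":
    for key, names in group_into_zones(APPS).items():
        print(key, "->", names)
```

Grouping this way lands between the two extremes described earlier: neither a single zone for everything nor one zone per application.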
8. Build base capabilities once to use across every CSP
Most companies will be on multiple clouds. The mix often breaks down to about 60 percent of workloads in one, 30 percent in another, and the rest in a third. Rather than building the same base capabilities (for example, network connectivity and routing, identity services, logging, and monitoring) across all the CSPs, companies should build them once and reuse the capabilities across all isolation zones, even those that reside in a different CSP from the base.
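The arithmetic behind building once can be sketched in a few lines: rebuilding the base per CSP multiplies the work by the number of providers, while a shared base keeps it constant. The capability list follows the examples in the text. The zone names and CSP labels are placeholders.

```python
from dataclasses import dataclass
from typing import List

# Base capabilities named in the text as examples.
BASE_CAPABILITIES = ("network-routing", "identity", "logging", "monitoring")


@dataclass(frozen=True)
class IsolationZone:
    name: str
    csp: str  # placeholder label for the provider hosting the zone


def builds_per_csp(zones: List[IsolationZone]) -> int:
    """Capability builds needed if every CSP gets its own copy of the base."""
    distinct_csps = {zone.csp for zone in zones}
    return len(distinct_csps) * len(BASE_CAPABILITIES)


def builds_shared_base(zones: List[IsolationZone]) -> int:
    """Capability builds needed if one base serves every zone on every CSP."""
    return len(BASE_CAPABILITIES)


# Hypothetical zones spread over three providers.
ZONES = [
    IsolationZone("retail-web", "csp-a"),
    IsolationZone("retail-data", "csp-a"),
    IsolationZone("analytics", "csp-b"),
    IsolationZone("partner-apis", "csp-c"),
]

if __name__ == "__main__":
    print("per-CSP builds:", builds_per_csp(ZONES))
    print("shared-base builds:", builds_shared_base(ZONES))
```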
9. Speed integration of acquisitions by putting in place another instance of the base foundation
During an acquisition, merging IT assets is difficult and time consuming. The cloud can speed the merger process and ease its complexity if the acquiring company creates an “integration-base foundation” that can run the assets of the company being acquired. This enables the IAM, security, network, and compliance policies already in place at the acquired company to continue, allowing its existing workloads to continue to function as designed. Over time, those workloads can be migrated from the integration base to the main base at a measured and predictable pace.
Using this approach, companies can efficiently operate their core cloud estate as well as the acquisition’s using the same software with a different configuration. This typically can reduce integration time from two to three years to closer to three to nine months.
10. Make preventative and automated cloud security and compliance the cornerstone
All software components and systems must go through a security layer. Traditional cybersecurity mechanisms are dependent on human oversight and review, which cannot match the tempo required to capture the cloud’s full benefits of agility and speed. For this reason, companies must adopt new security architectures and processes to protect their cloud workloads.
Security as code (SaC) has been the most effective approach to securing cloud workloads with speed and agility. The SaC approach defines cybersecurity policies and standards programmatically so they can be referenced automatically in the configuration scripts used to provision cloud systems. Systems running in the cloud can be evaluated against security policies to prevent changes that move the system out of compliance.
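As a minimal sketch of the SaC idea, policies can be written as named, executable rules and every proposed configuration change checked against them before it is applied. The rule names, configuration keys, and the `approve_change` helper below are assumptions made for illustration and do not correspond to any specific CSP or policy engine.

```python
from typing import Any, Callable, Dict, List, Tuple

Config = Dict[str, Any]
Rule = Callable[[Config], bool]

# Hypothetical policies expressed as executable rules.
POLICIES: Dict[str, Rule] = {
    "storage-encrypted-at-rest": lambda cfg: cfg.get("encryption_at_rest") is True,
    "no-public-access": lambda cfg: cfg.get("public_access") is False,
    "tls-1.2-or-newer": lambda cfg: cfg.get("min_tls_version", (0, 0)) >= (1, 2),
}


def violations(cfg: Config) -> List[str]:
    """Names of the policies that the given configuration fails."""
    return sorted(name for name, rule in POLICIES.items() if not rule(cfg))


def approve_change(current: Config, change: Config) -> Tuple[bool, List[str]]:
    """Evaluate the configuration that would result from applying the change.

    Returns (approved, failed_policies). The change is approved only if the
    resulting configuration passes every policy.
    """
    proposed = {**current, **change}
    failed = violations(proposed)
    return (not failed, failed)


# Hypothetical compliant starting point.
COMPLIANT: Config = {
    "encryption_at_rest": True,
    "public_access": False,
    "min_tls_version": (1, 2),
}

if __name__ == "__main__":
    print(approve_change(COMPLIANT, {"public_access": True}))
    print(approve_change(COMPLIANT, {"min_tls_version": (1, 3)}))
```

Running the same rules against deployed systems, not only against proposed changes, gives the after-deployment check as well.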
“Start small and grow” is a viable cloud strategy only if the fundamental building blocks are created from the start. Companies need to design and build their cloud foundation to provide a reusable, scalable platform that supports all the IT workloads destined for the cloud. This approach unlocks the benefits that the cloud offers and ultimately captures its full value.
ABOUT THE AUTHOR(S)
Aaron Bawcom is a distinguished cloud architect in McKinsey’s Atlanta office, where Sebastian Becerra, Beau Bennett, and Bill Gregg are principal cloud architects.