Generative AI: The next S-curve for the semiconductor industry?


As generative AI (gen AI) applications such as ChatGPT and Sora take the world by storm, demand for computational power is skyrocketing. The semiconductor industry finds itself approaching a new S-curve—and the pressing question for executives is whether the industry will be able to keep up.

Leaders are responding by committing substantial capital expenditures to expand data centers and semiconductor fabrication plants (fabs) while concurrently exploring advancements in chip design, materials, and architectures to meet the evolving needs of the gen AI–driven business landscape.

To guide semiconductor leaders through this transformative phase, we have developed several scenarios for gen AI’s effect in the B2B and B2C markets. Every scenario involves a massive increase in compute demand, and therefore in wafer demand. These scenarios focus on data centers; edge devices such as smartphones will also be affected, but on a much smaller scale.

The demand scenarios, developed from McKinsey analysis, are based on the wafer output that the semiconductor industry could potentially deliver, given constraints such as capital and equipment. More ambitious scenarios are plausible, but the number of fabs and the data center energy supply they would require make them unlikely.

This article discusses the estimated wafer demand for high-performance components, including logic, memory, and data storage chips, and the corresponding number of fabs needed to supply them. Equipped with this information, industry stakeholders can strategically plan and allocate resources to address the burgeoning demand for compute power, ensuring the scalability and sustainability of their operations in the years to come.

Components of gen AI compute demand

The surge in demand for AI and gen AI applications comes with a proportional increase in compute demand. To plan for it, semiconductor leaders need to understand where this demand originates and how gen AI will be applied. We expect two types of gen AI applications: B2C and B2B use cases. In both markets, gen AI demand falls into two main phases: training and inference. Training runs usually require a substantial amount of data and are compute-intensive, whereas inference usually requires far less compute per run of a use case.

To help semiconductor leaders navigate these markets, we outline six use case archetypes for B2B compute demand, along with each archetype’s compute cost to serve and the gen AI value it creates.

Six B2B use case archetypes for gen AI application and workload

McKinsey analysis estimates that B2C applications will account for about 70 percent of gen AI compute demand because they include the workload from basic consumer interactions (for example, drafting emails) and advanced user interactions (for example, creating visuals from text). B2B use cases are expected to make up the other approximately 30 percent of the demand. These include use cases such as advanced content creation for businesses (for example, gen AI–assisted code creation), addressing customer inquiries, or generating standard financial reporting.

B2B applications across industry verticals and functions fall into one of six use case archetypes:

  • coding and software development apps that interpret and generate code
  • creative content–generation apps that write documents and communication (for example, to generate marketing material)
  • customer engagement apps that cover automated customer service for outreach, inquiry, and data collection (for example, addressing customer inquiries via a chatbot)
  • innovation apps that generate products and materials for R&D processes (for example, designing a candidate drug molecule)
  • simple concision apps that summarize and extract insights using structured data sets (for example, to generate standard financial reports)
  • complex concision apps that summarize and extract insights using an unstructured or large data set (for example, to synthesize findings in clinical images such as MRI or CT scans)

McKinsey has organized these six B2B use case archetypes according to their compute cost to serve and the gen AI value they create (Exhibit 1). By defining cost to serve and value creation, decision makers can more adeptly navigate the specifics of B2B use cases and make well-informed choices when adopting them. The compute cost to serve comprises training, fine-tuning, and inference costs. It also includes a hyperscaler’s infrastructure as a service (IaaS) margin, which covers compute hardware, server components, IT infrastructure, power consumption, and estimated talent costs. Gen AI value creation is gauged through metrics such as productivity improvement and labor cost savings.
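A minimal sketch of how the cost-to-serve components described above might be combined for a single use case. The class and function names, the example figures, and the choice to apply the hyperscaler IaaS margin as one multiplicative markup on training, fine-tuning, and inference costs are all assumptions for illustration; the analysis does not publish its formula.

```python
from dataclasses import dataclass


@dataclass
class UseCaseCosts:
    """Hypothetical annual compute costs for one B2B use case (arbitrary currency unit)."""
    training: float
    fine_tuning: float
    inference: float


def cost_to_serve(costs: UseCaseCosts, iaas_margin: float) -> float:
    """Sum training, fine-tuning, and inference costs, then apply the IaaS margin.

    Assumption: the margin acts as a single markup, e.g. 0.5 means +50 percent.
    """
    if iaas_margin < 0:
        raise ValueError("iaas_margin must be non-negative")
    raw = costs.training + costs.fine_tuning + costs.inference
    return raw * (1 + iaas_margin)


# Illustrative numbers only; they are not taken from Exhibit 1.
example = UseCaseCosts(training=2.0, fine_tuning=1.0, inference=7.0)
print(cost_to_serve(example, iaas_margin=0.5))  # 15.0
```

Pairing each archetype's cost to serve with an estimate of its value creation gives the two dimensions along which Exhibit 1 organizes the archetypes.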


Gen AI demand scenarios

As organizations navigate the complexities of adopting gen AI, strategic use of these archetypes becomes imperative. Factors such as the economics of gen AI adoption, algorithm efficiency, and continual hardware advancements at both the component and system levels further influence gen AI adoption and technological progress. Three demand scenarios (base, conservative, and accelerated) represent the possible outcomes of gen AI demand for B2B and B2C applications. The base scenario rests on a set of assumptions, such as consistent technological advancement and rapid adoption, supported by business models that cover the capital and operating costs of gen AI training and inference. The conservative and accelerated scenarios represent the adoption downside and upside, respectively.

McKinsey analysis estimates that by 2030 in the base scenario, total gen AI compute demand could reach 25 × 10³⁰ FLOPs (floating point operations), with approximately 70 percent from B2C applications and 30 percent from B2B applications (Exhibit 2).
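The split is simple arithmetic, but writing it out makes the absolute figures explicit. The sketch uses only the numbers stated in the text and, for illustration, treats the approximate 70/30 shares as exact.

```python
# Base-scenario figures from the text; the 70/30 split is approximate in the source.
TOTAL_FLOPS_2030 = 25e30
SHARES = {"B2C": 0.70, "B2B": 0.30}

# Absolute compute demand per segment.
demand = {segment: TOTAL_FLOPS_2030 * share for segment, share in SHARES.items()}
for segment, flops in demand.items():
    print(f"{segment}: {flops:.2e} FLOPs")
```

This works out to roughly 17.5 × 10³⁰ FLOPs for B2C and 7.5 × 10³⁰ FLOPs for B2B.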


B2C compute demand scenarios

B2C compute demand is driven by the number of consumers who engage with gen AI, their level of engagement, and the compute that engagement requires. Specifically, B2C inference workloads are determined by the number of gen AI users, the number of gen AI interactions per user, and the FLOPs per basic and advanced user interaction. Training workloads are determined by the number of training runs per year, the number of gen AI model providers, and the FLOPs per training run for different gen AI models (for example, a state-of-the-art model such as GPT-4 in 2023, as well as smaller or prior-generation models). For all scenarios, it is essential that companies can develop a sustainable business model.
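The drivers listed above multiply together. Here is a minimal sketch of that structure, assuming annual totals equal daily interactions times 365 and that basic and advanced interactions are summed separately. All class names and example values are hypothetical, not McKinsey inputs.

```python
from dataclasses import dataclass


@dataclass
class InteractionType:
    daily_per_user: float          # interactions per user per day
    flops_per_interaction: float   # compute per interaction (a series of prompts)


@dataclass
class ModelClass:
    runs_per_year: float           # training runs per provider per year
    providers: int                 # number of gen AI model providers
    flops_per_run: float           # compute per training run


def annual_inference_flops(users: float, types: list[InteractionType], days: int = 365) -> float:
    """Users x interactions per user x FLOPs per interaction, summed over interaction types."""
    per_user_per_day = sum(t.daily_per_user * t.flops_per_interaction for t in types)
    return users * per_user_per_day * days


def annual_training_flops(classes: list[ModelClass]) -> float:
    """Runs per year x providers x FLOPs per run, summed over model classes."""
    return sum(c.runs_per_year * c.providers * c.flops_per_run for c in classes)


# Hypothetical inputs chosen only to exercise the functions.
basic = InteractionType(daily_per_user=10, flops_per_interaction=1e15)
advanced = InteractionType(daily_per_user=1, flops_per_interaction=1e17)
frontier = ModelClass(runs_per_year=2, providers=5, flops_per_run=1e25)
smaller = ModelClass(runs_per_year=10, providers=20, flops_per_run=1e23)

inference = annual_inference_flops(users=1e9, types=[basic, advanced])
training = annual_training_flops([frontier, smaller])
print(f"inference: {inference:.2e} FLOPs/year, training: {training:.2e} FLOPs/year")
```

Keeping basic and advanced interactions as separate entries allows a scenario to change their mix without touching the rest of the model.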


Base adoption. By 2030, the expected average number of daily interactions per smartphone user (with one interaction being a series of prompts) is ten for basic consumer applications, such as creating an email draft. A separate average is expected for advanced consumer applications, such as creating longer texts or synthesizing complex input documents. Using current numbers from online and application-based search queries, McKinsey analysis estimates the number of interactions to be approximately twice the forecast daily number of online search queries (approximately 28 billion) in 2030. The base B2C scenario assumes steady technological advancement, favorable regulatory developments, and continuously growing user acceptance.

Conservative adoption. This scenario involves cautious adoption by consumers due to ongoing concerns about data privacy, regulatory developments, and only incremental improvements in the technology. Together, these factors would halve the number of interactions relative to the base case.

Accelerated adoption. This scenario assumes a high degree of trust in the technology and widespread user acceptance. Drivers could include attractive new business models, substantial technological advancements, and compelling user experiences, which together could raise the number of interactions for consumer applications to 150 percent of the base case.
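The sketch below applies the stated scenario ratios (half of the base case for conservative, 150 percent for accelerated) to a base interaction count. The base count depends on an assumption. Following the sentence structure in the base-adoption paragraph, it treats about 28 billion as the daily search-query forecast and sets interactions at roughly twice that. If the 28 billion figure refers to interactions instead, set the base directly.

```python
# Scenario multipliers relative to the base case, as stated in the text.
SCENARIO_MULTIPLIER = {"conservative": 0.5, "base": 1.0, "accelerated": 1.5}

# Assumption: ~28 billion is the 2030 daily search-query forecast, and base-case
# gen AI interactions are roughly twice that figure.
SEARCH_QUERIES_PER_DAY_2030 = 28e9
BASE_DAILY_INTERACTIONS = 2 * SEARCH_QUERIES_PER_DAY_2030


def daily_interactions(scenario: str) -> float:
    """Daily B2C gen AI interactions in 2030 for the named scenario."""
    return BASE_DAILY_INTERACTIONS * SCENARIO_MULTIPLIER[scenario]


for name in SCENARIO_MULTIPLIER:
    print(f"{name}: ~{daily_interactions(name) / 1e9:.0f} billion interactions per day")
```

Because inference compute scales linearly with interactions, the same multipliers carry through to B2C inference FLOPs if FLOPs per interaction are held constant.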

B2B demand scenarios

The adoption of gen AI use cases in the B2B sector is significantly influenced by the sufficiency and cost of semiconductor chip supply. Enterprises must be able to justify their investment in compute infrastructure, which requires the cost to serve to be lower than the company’s willingness to pay. For these B2B demand scenarios, McKinsey analysis assumes that willingness to pay corresponds to approximately 20 percent of total value creation.

In the context of B2B use cases, McKinsey analysis indicates that only five of the six use case archetypes are economically viable for broad adoption (Exhibit 3). The sixth archetype, complex concision, is not expected to be adopted broadly. The value it creates, mainly through administrative labor cost savings, is limited relative to its cost, and analyzing complex, unstructured data inputs consumes a significant amount of compute power.
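The viability rule reduces to a single comparison. A use case is broadly adoptable when its cost to serve is below the willingness to pay, which the analysis sets at about 20 percent of value created. The sketch below shows that check. The use case names and figures are hypothetical and do not come from Exhibit 3.

```python
WTP_SHARE_OF_VALUE = 0.20  # willingness to pay as a share of value creation (from the text)


def is_economically_viable(cost_to_serve: float, value_creation: float,
                           wtp_share: float = WTP_SHARE_OF_VALUE) -> bool:
    """True when the cost to serve is lower than the willingness to pay."""
    return cost_to_serve < wtp_share * value_creation


# Hypothetical (cost to serve, value creation) pairs in the same currency unit.
use_cases = {
    "hypothetical_use_case_a": (10.0, 100.0),   # 10 < 20: viable
    "hypothetical_use_case_b": (40.0, 120.0),   # 40 >= 24: not viable
}
for name, (cost, value) in use_cases.items():
    print(name, is_economically_viable(cost, value))
```

In this framing, complex concision fails the test because heavy compute consumption pushes its cost to serve above a willingness to pay that its limited value creation keeps low.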


Base adoption. The base scenario assumes the midpoint of an adoption range spanning eight to 28 years, so that B2B use cases reach 90 percent adoption in 18 years.1 Furthermore, McKinsey analysis assumes that businesses will realize value beginning in 2024. Securing investments for manufacturing capacity, manufacturing wafers, provisioning compute capacity, and training people to use new services all take time. As such, we assume a lead time of approximately two years for wafer manufacturing before value can be captured. This is expected to result in approximately 25 percent of value captured by 2030 for the economically viable use cases. In this scenario, we assume that the additional value from all small-scale improvements in labor productivity follows the same overall ratio as the calculated value potential from the six use case archetypes.

Conservative adoption. This scenario assumes an approximately 90 percent adoption rate over 28 years, yielding only approximately 15 percent in value capture by 2030. This deceleration could be attributed to a confluence of factors, including—but not limited to—regulatory constraints, data privacy concerns, and data processing challenges.

Accelerated adoption. This scenario assumes an approximately 90 percent adoption rate in about 13 years. This acceleration is contingent upon catalysts such as attractive business models, rapid technological advancement, or favorable regulations. For example, disruptive hardware architectures could substantially reduce the cost to serve, and improvements to the software validation process may significantly boost the efficiency of gen AI solutions. Factors such as these may expedite the adoption curve and cause a notable uptick in gen AI implementation in the semiconductor industry by 2030.
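The three B2B scenarios differ mainly in how many years use cases take to reach about 90 percent adoption: about 13 (accelerated), 18 (base), and 28 (conservative). The sketch below illustrates the effect with a symmetric logistic S-curve. This is a common modeling choice, but its use here is hypothetical, and the curve is fitted only to that 90 percent point. It is not McKinsey's model. It measures time from an assumed start at 10 percent adoption, ignores the two-year wafer lead time, and tracks adoption share rather than value captured, so it does not reproduce the 25 and 15 percent value-capture figures.

```python
import math

YEARS_TO_90_PCT = {"accelerated": 13, "base": 18, "conservative": 28}  # from the text


def adoption_share(years_elapsed: float, years_to_90pct: float) -> float:
    """Symmetric logistic S-curve: 10% at t=0, 50% at the midpoint, 90% at years_to_90pct."""
    midpoint = years_to_90pct / 2
    # Steepness chosen so that the curve passes through 90% at years_to_90pct.
    k = 2 * math.log(9) / years_to_90pct
    return 1 / (1 + math.exp(-k * (years_elapsed - midpoint)))


# Adoption share six years in (for example, 2024 to 2030) under each scenario.
for name, t90 in YEARS_TO_90_PCT.items():
    print(f"{name}: {adoption_share(6, t90):.0%}")
```

Shortening the time to 90 percent steepens the curve, so the same six-year window captures a much larger share of eventual adoption. That is the mechanism behind the spread between scenarios.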


Gen AI data center infrastructure and hardware trends

Along with considering scenarios for gen AI compute demand, semiconductor leaders will need to adapt to changes in underlying hardware and infrastructure, mainly to data center infrastructure, servers, and semiconductor chips.

Data center infrastructure

Gen AI applications typically run on dedicated servers and in data centers. At first glance, AI data centers might look similar to traditional data centers, but there are considerable differences (see sidebar “Components of an AI server”).

Rack densities—that is, the power consumed by a cabinet of servers—demonstrate the biggest difference between traditional and AI data centers. General-purpose data centers have rack power densities of five to 15 kilowatts (kW), whereas AI training workloads can consume 100 kW—or, in some cases today, up to 150 kW. This number is expected to increase, with some experts estimating power densities of up to 250 kW or even 300 kW in the next few years.2
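To see why density matters, the sketch below counts how many racks a fixed power budget can supply at the densities quoted above. The 1 MW budget is hypothetical, and 10 kW stands in for the five-to-15 kW general-purpose range.

```python
def racks_supported(it_power_kw: float, rack_density_kw: float) -> int:
    """Number of full racks a given IT power budget can supply at one rack density."""
    if rack_density_kw <= 0:
        raise ValueError("rack density must be positive")
    return int(it_power_kw // rack_density_kw)


BUDGET_KW = 1_000  # hypothetical 1 MW of IT power
DENSITIES_KW = {"general purpose": 10, "AI training today": 100,
                "AI training, high end": 150, "projected": 300}

for label, density in DENSITIES_KW.items():
    print(f"{label} ({density} kW/rack): {racks_supported(BUDGET_KW, density)} racks")
```

The same megawatt goes into 100 racks at general-purpose density but only six at 150 kW, so each AI rack must dissipate roughly ten to 15 times as much heat. This concentration is what pushes rack cooling from air toward liquid.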

Additionally, as rack power density rises, rack cooling will shift from air-based to liquid cooling. Direct-to-chip liquid cooling and full-immersion cooling will also require new server and rack designs to accommodate the additional weight.

Servers

In response to the increasing demand for computational power, servers will employ high-performance graphics processing units (GPUs) or specialized AI chips, such as application-specific integrated circuits (ASICs), to efficiently handle gen AI workloads through parallel processing. Infrastructure for gen AI training and inference is expected to bifurcate, because inference compute demand is becoming more specific to each use case and must be served at much lower cost to be economical.

Training. Training server architecture is expected to resemble today’s high-performance cluster architectures, in which all servers in a data center are interconnected through high-bandwidth, low-latency links. The prevailing high-performance gen AI server architecture uses two central processing units (CPUs) and eight GPUs for compute. In 2030, most training workloads are expected to run on this type of CPU+GPU combination. A transition to system-in-a-package designs for GPUs and AI accelerators is also expected, with both architectures expected to coexist.

Inference. Current inference workloads run on infrastructure that is similar to the training workload. As gen AI consumer and business adoption increases, the workload is expected to shift to mostly inference, which favors specialized hardware due to lower cost, higher energy efficiency, and faster or better performance for highly specialized tasks.

In 2030, we expect more inference-specific AI servers using a combination of CPUs and several purpose-built AI accelerators that use ASICs.
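The two server archetypes can be written down as simple configurations. This shows how server counts translate into chip counts and, further upstream, into logic wafer demand. The training configuration follows the text (two CPUs, eight GPUs). The inference configuration's one CPU and four ASIC accelerators are hypothetical, since the text says only “several” accelerators, and the fleet sizes are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerConfig:
    """Chips per server; frozen so configurations can be used as dict keys."""
    cpus: int
    gpus: int = 0
    accelerators: int = 0


# Training server as described in the text: two CPUs and eight GPUs.
TRAINING_CPU_GPU = ServerConfig(cpus=2, gpus=8)
# Inference server: one CPU and four ASIC accelerators are hypothetical counts.
INFERENCE_CPU_ASIC = ServerConfig(cpus=1, accelerators=4)


def chip_counts(fleet: dict[ServerConfig, int]) -> dict[str, int]:
    """Total chips of each type for a fleet given as {config: number of servers}."""
    totals = {"cpus": 0, "gpus": 0, "accelerators": 0}
    for config, n_servers in fleet.items():
        totals["cpus"] += config.cpus * n_servers
        totals["gpus"] += config.gpus * n_servers
        totals["accelerators"] += config.accelerators * n_servers
    return totals


# Hypothetical fleet sizes, for illustration only.
fleet = {TRAINING_CPU_GPU: 1_000, INFERENCE_CPU_ASIC: 5_000}
print(chip_counts(fleet))  # {'cpus': 7000, 'gpus': 8000, 'accelerators': 20000}
```

Shifting servers from the training configuration to the inference configuration changes the chip mix rather than just the total. This is why the inference-heavy 2030 outlook matters for which logic products drive wafer demand.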

Gen AI wafer demand on the semiconductor industry

McKinsey analysis estimates the wafer demand of high-performance components based on compute demand and its hardware requirements: logic chips (CPUs, GPUs, and AI accelerators), memory chips (high-bandwidth memory [HBM] and double data rate memory [DDR]), data storage chips (NAND [“not-and”] chips), power semiconductor chips, optical transceivers, and other components. In the following sections, we will look more closely at logic, HBM, DDR, and NAND chips. Beyond logic and memory, we anticipate an increase in demand for other device types. For instance, power semiconductors will be in higher demand because gen AI servers consume more energy. Another consideration is communication components, which are expected to transition to optical technologies over time. This transition has already taken place in long-distance networking and backbones, where optical links reduce energy consumption while increasing data transmission rates. To spur innovation in almost all areas of the industry, it is necessary to combine these new requirements with the high level of investment anticipated (see sidebar “Pursuing innovation in semiconductors to capture generative AI value”).


Logic chips

Logic chip demand depends on the type of gen AI compute chip and type of server for training and inference workloads. As discussed earlier, by 2030, we anticipate the majority of gen AI compute demand in FLOPs to come from inference workloads. Currently, there are three types of AI servers that can manage inference and training workloads: CPU+GPU, CPU+AI accelerator, and fusion CPU+GPU. Today, CPU+GPU has the best availability and is used for inference and training workloads. In 2030, AI accelerators with ASIC chips are expected to serve the majority of workloads because they perform optimally in specific AI tasks. On the other hand, GPU and fusion servers are ideal for handling training workloads due to their versatility in accommodating various types of tasks (Exhibit 4).


In 2030, McKinsey estimates that the logic wafer demand from non–gen AI applications will be approximately 15 million wafers. About seven million of these wafers will be produced using technology nodes of more than three nanometers, and approximately eight million wafers will be produced using nodes equal to or less than three nanometers. Gen AI demand would require an additional 1.2 million to 3.6 million wafers produced using technology nodes equal to or less than three nanometers. Based on current logic fab planning,3 it is anticipated that 15 million wafers using technology nodes equal to or less than seven nanometers can be produced in 2030. Thus, gen AI demand creates a potential supply gap of one million to about four million wafers using technology nodes equal to or less than three nanometers. To close the gap, three to nine new logic fabs will be needed by 2030 (Exhibit 5).
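The arithmetic behind the gap can be reproduced as shown below. The figures come from the text, but two readings are assumptions. First, the sketch nets total logic demand (non-gen AI plus gen AI) against the 15 million wafers of planned capacity. This reproduces the stated gap, although the text quotes that capacity for nodes of seven nanometers or less. Second, the figure of roughly 400,000 wafers per fab is back-calculated from the stated ranges (1.2 million wafers for three fabs, 3.6 million for nine) and is not a published number.

```python
# Figures from the text, in thousands of wafers (2030).
NON_GEN_AI_DEMAND_K = 15_000
GEN_AI_DEMAND_RANGE_K = (1_200, 3_600)
PLANNED_SUPPLY_K = 15_000

# Assumption: per-fab output back-calculated from the stated fab range.
WAFERS_PER_FAB_K = 400


def supply_gap_k(gen_ai_demand_k: int) -> int:
    """Wafers (thousands) that planned capacity cannot cover; never negative."""
    return max(0, NON_GEN_AI_DEMAND_K + gen_ai_demand_k - PLANNED_SUPPLY_K)


def fabs_needed(gap_k: int, per_fab_k: int = WAFERS_PER_FAB_K) -> int:
    """Ceiling division: a partially used fab still has to be built."""
    return -(-gap_k // per_fab_k)


for gen_ai_k in GEN_AI_DEMAND_RANGE_K:
    gap = supply_gap_k(gen_ai_k)
    print(f"gen AI demand {gen_ai_k / 1000:.1f}M wafers -> gap {gap / 1000:.1f}M, {fabs_needed(gap)} new fabs")
```

Working in integer thousands keeps the ceiling division exact and avoids floating-point artifacts such as 3.6 / 0.4 evaluating to slightly less than 9.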


DDR and HBM

Gen AI servers use two types of DRAM: HBM, attached to the GPU or AI accelerators, and DDR RAM, attached to the CPU. HBM has higher bandwidth but requires more silicon for the same amount of data.

As transformer models grow larger, gen AI servers have been expanding memory capacity. However, growing memory capacity is not straightforward and poses challenges for hardware and software design. First, the industry faces a memory wall problem, in which memory capacity and bandwidth are the bottleneck for system-level compute performance. How the industry will tackle this problem is an open question. Static random-access memory (SRAM) is being tested in various chips to increase near-compute memory, but its high cost limits wide adoption. Future algorithms may also require less memory per inference run, slowing total memory demand growth. Second, AI accelerators use less memory than CPU+GPU architectures and may become more popular by 2030 as inference workloads flourish, which could mean slower growth in memory demand.
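The roofline model is a standard performance model that is not part of the McKinsey analysis, but it makes the bandwidth side of the memory wall concrete. Under this model, a workload's attainable throughput is the lesser of two values: the chip's peak compute, or its memory bandwidth multiplied by the workload's arithmetic intensity (FLOPs per byte moved). The hardware numbers below are hypothetical.

```python
def attainable_flops(peak_flops: float, mem_bandwidth_bytes: float,
                     arithmetic_intensity: float) -> float:
    """Roofline model: throughput is capped by compute or by memory bandwidth."""
    return min(peak_flops, mem_bandwidth_bytes * arithmetic_intensity)


PEAK = 1.0e15        # hypothetical accelerator peak, FLOP/s
BANDWIDTH = 3.0e12   # hypothetical memory bandwidth, bytes/s
ridge = PEAK / BANDWIDTH  # intensity at which a workload becomes compute-bound

for intensity in (1, 50, 500):
    perf = attainable_flops(PEAK, BANDWIDTH, intensity)
    bound = "memory-bound" if intensity < ridge else "compute-bound"
    print(f"{intensity:>4} FLOP/byte: {perf / PEAK:6.1%} of peak ({bound})")
```

Below the ridge point (about 333 FLOPs per byte with these numbers), adding compute does not help. Throughput rises only with more bandwidth, more near-compute memory such as SRAM, or algorithms that move fewer bytes per operation.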


Given these uncertainties, we consider two DRAM demand scenarios in addition to the base, conservative, and accelerated adoption scenarios: a “DRAM light” scenario, in which AI accelerators remain memory-light compared to GPU-based systems, and a “DRAM base” scenario, in which AI accelerator–based systems catch up to GPU-based systems in terms of DRAM demand.

By 2030, we expect DRAM demand from gen AI applications to be five to 13 million wafers in the DRAM light scenario, translating to four to 12 dedicated fabs. In the DRAM base scenario, DRAM demand would be seven to 21 million wafers, translating to six to 18 fabs. The wide range of values reflects the challenges associated with reducing the memory requirements per device.
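The wafer and fab ranges imply a rough output per DRAM fab. The back-calculation below divides the stated wafer figures by the stated fab counts at each end of each range. Because the published fab counts are rounded, the results are approximate. They are an inference from the text, not a figure the analysis reports.

```python
# (low, high) wafer demand in millions and (low, high) fab counts, from the text.
DRAM_SCENARIOS = {
    "DRAM light": {"wafers_m": (5, 13), "fabs": (4, 12)},
    "DRAM base": {"wafers_m": (7, 21), "fabs": (6, 18)},
}


def implied_wafers_per_fab(wafers_m: float, fabs: int) -> float:
    """Millions of wafers per fab implied by one (wafers, fabs) pair."""
    return wafers_m / fabs


for name, ranges in DRAM_SCENARIOS.items():
    ratios = [implied_wafers_per_fab(w, f) for w, f in zip(ranges["wafers_m"], ranges["fabs"])]
    print(name, [round(r, 2) for r in ratios])
```

All four pairs fall between roughly 1.1 and 1.25 million wafers per fab, so in both scenarios the fab counts scale almost linearly with wafer demand.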

NAND memory

NAND memory is used for data storage, for instance for the operating system, user data, and input and output. In 2030, NAND demand will likely be driven by dedicated data servers for video and multimodal data. This data will require substantial storage (for example, for training on high-resolution video sequences and retrieving data during inference). We expect total NAND demand to be two to eight million wafers, corresponding to one to five fabs. Because the performance requirements for gen AI NAND storage will be the same as in current servers, fulfilling this demand will be less challenging than for logic and DRAM.

Other components

The rising compute demand will create additional demand for many other chip types. Two types are particularly noteworthy:

High-speed network and interconnect. Gen AI requires high-bandwidth, low-latency connectivity between servers and between the components within each server. Creating all of these connections requires more network interfaces and switches. Today, these links are mostly copper-based, but optical connectivity is expected to gain share as bandwidth and latency requirements rise.

Power semiconductors. AI servers need a large amount of electricity and might consume more than 10 percent of global electricity in 2030. This requires many power semiconductors within the server and on the actual devices.


The surge in demand for gen AI applications is propelling a corresponding need for computational power, driving both software innovation and substantial investment in data center infrastructure and semiconductor fabs. However, the critical question for industry leaders is whether the semiconductor sector will be able to meet the demand. To meet this challenge, semiconductor leaders should decide which scenario they consider most plausible. Investment in semiconductor manufacturing capacity and servers is costly and takes time, so careful evaluation of the landscape is essential to navigating the complexities of the gen AI revolution and developing a view of its impact on the semiconductor industry.
