NVIDIA Corporation (NVDA)

BofA Securities 2024 Global Technology Conference

Jun 5, 2024

Speaker 2

Hope everyone enjoyed their lunch. Welcome back. I lead semiconductor research coverage at Bank of America Securities, and I'm really delighted and privileged to have Ian Buck, Vice President of NVIDIA's HPC and Hyperscale business, who did his PhD at Stanford. When many of us were enjoying our spring breaks, Ian and his team were working on Brook, the precursor to CUDA, which I think is core to what NVIDIA sells today. So really delighted to have Ian with us. What I thought I would do is lead off with some of my questions, but if there's anything that you feel is important to the discussion, please feel free to bring it up. Welcome to you, Ian. Really delighted that you could be with us.

Ian Buck
VP of HPC and Hyperscale, NVIDIA

Thank you. Look forward to your questions.

Speaker 2

Okay. So Ian, to start it off, let's talk about Computex. What did you find the most interesting and exciting as you look at growth prospects over the next few years?

Ian Buck
VP of HPC and Hyperscale, NVIDIA

Yeah, Computex is an important conference for NVIDIA and for AI now. The world gets its systems, its hardware, from this small island of Taiwan, and of course the chips as well. So we're there; it's a very important ecosystem for us. A year ago, we introduced a system standard for deploying and building GPU systems of a variety of shapes and sizes for different workloads. And that opportunity, where the standard used to be just the CPU and motherboard, but now extends to how many GPUs you need, what their configuration is, what their thermal profile is, where they fit, and what workload they'll run on, has created a diverse ecosystem.

So it's been really fun to watch that explode, and to watch the number of companies that are able to take advantage of it. We talked, of course, about Blackwell and what it will do. We also talked about our roadmap: what we're doing today with the Hopper platform, our current architecture; what we'll be deploying with Blackwell and the Blackwell platform, including upgrades to Blackwell; and, probably for the first time, what's after Blackwell, the Rubin platform, which will come with a new CPU and GPU. So a lot of interesting, exciting things from an infrastructure and hardware standpoint. On the software side, there's the adoption of all sorts of different models, and we can talk more about that.

One way we're helping is by packaging up a lot of those models, whether it be Llama, Mistral, or Gemma, into a container. Customers know they're getting the best performance, the best inference capabilities, in a nicely packaged container that they can then tailor and deploy anywhere. These are what we call NIMs, NVIDIA Inference Microservices, and we're helping educate enterprises and making them available to all of them. So it's been a very exciting Computex, and if you ever get the chance to go, it's quite an event.
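For context, here is a minimal sketch of what consuming one of these packaged inference containers can look like from application code, assuming such a container has already been started locally and exposes an OpenAI-compatible HTTP endpoint on port 8000. The URL, port, and model name are illustrative assumptions, not details from the talk.

```python
import requests

# Illustrative call to a locally running inference microservice that exposes
# an OpenAI-compatible chat completions API. The endpoint and model name
# below are placeholders, not taken from the conversation.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "meta/llama3-8b-instruct",  # hypothetical deployed model
        "messages": [{"role": "user", "content": "Summarize Computex in one line."}],
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```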

Speaker 2

Excellent. So let's start looking at this from the end market. Right, you work very closely with all the hyperscalers. From the outside, when we look at this market, we see the accelerator market was over $40 billion last year. But help us bridge this to what the hyperscalers are doing. What are they doing with all the acceleration and all this hardware that they're putting in? Is it just making bigger and bigger models, like large language models, and how are they able to monetize them?

Ian Buck
VP of HPC and Hyperscale, NVIDIA

Yeah, we're still very much at the beginning of that AI growth cycle. Accelerated AI is approaching 10 years now from the first AlexNet moment. But what we're seeing is the different hyperscalers evolving and figuring out what their contributions are, what they do best. One, of course, is infrastructure: providing infrastructure at scale for the world's AI startups and community to consume in the cloud. And you see all the major startups partnering with or getting access to the technology, often not just from one provider but from multiple: who can help them scale and grow their capabilities, who's bringing the GPUs to market first, where they can get their GPUs. That's infrastructure.

Of course, with infrastructure, every dollar a cloud provider spends on buying a GPU, they're going to make back as $5 over four years. The second thing we're seeing is building and providing AI inference, whether it be a Llama or a Mistral or a Gemma, and providing it to the community of users. The economics there are even better: for every dollar spent, it's $7 earned over that same time period, and growing. The third, of course, is building the next generation of models. Not everyone can do it at that scale, and those models are getting very large and the infrastructure is getting huge. So we're seeing them build amazing next-generation capabilities at scale.
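As a rough, back-of-the-envelope illustration of the economics described here (a sketch only; the $5 and $7 figures come from the talk, while the four-year life and the simple averaging are assumptions):

```python
# Back-of-the-envelope on the "$1 of GPU spend" figures from the talk.
gpu_spend = 1.00          # dollars spent on the GPU
hosting_revenue = 5.00    # renting that GPU out, per the talk, over ~4 years
inference_revenue = 7.00  # serving inference on it over the same period
years = 4                 # assumed useful life

for label, revenue in [("hosting", hosting_revenue), ("inference", inference_revenue)]:
    multiple = revenue / gpu_spend
    per_year = multiple / years
    print(f"{label}: {multiple:.1f}x total, ~{per_year:.2f}x of the purchase price per year")
```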

And of course, that's not just the physical infrastructure, but actually figuring out all the software, the algorithms, and how to train at that scale over billions and trillions of tokens, and the software that has to go into that. I can talk all day about the software for training at what is now 100,000 GPUs, and for the next click up; people are even talking about a million.

Speaker 2

Amazing.

Ian Buck
VP of HPC and Hyperscale, NVIDIA

So all three of those are happening at the same time: they are figuring out and developing those models for their customers and renting out infrastructure. I guess the fourth would be using and deploying AI for themselves, Copilot being an example. You can see multiple services on Amazon being backed by AI agents or AI capabilities, some directly, some indirectly that you may not know about. And of course, companies like Meta are deploying AI into their own services, which raises all their numbers across the board. They've been a great partner for NVIDIA, and they have been fantastic.

Speaker 2

You know, traditional AI or CNNs, right, they have been around for a long time. We used to talk about tens of millions of parameters, and here we are knocking on the door of, what, almost 2 trillion. But at what point do we say, okay, this is it for model sizes? Might we even go backwards and try to optimize the size of these models, right? Have smaller or mid-size models?

Ian Buck
VP of HPC and Hyperscale, NVIDIA

Yeah. So the evolution of AI models is quite interesting, and they map to the workloads. Obviously, it started with ImageNet and image recognition: what is this a picture of? At first it couldn't even tell you where the thing in the picture was, then we could identify every pixel, and it got more and more intelligent. When we got to language and LLMs, that was another click up in intelligence, because language is different from just recognizing a picture. You and I can do that, but so can dogs and cats, and even bugs have to recognize through vision what things are. Language is uniquely a step above: what a person is saying, what they mean, the context, which goes right to overall human understanding and knowledge.

Take it a click further and you get to generative AI. Not only do you need to understand it, you actually have to synthesize and create new things, whether it be an open chatbot conversation like you can have in WhatsApp or with Meta AI, or code that works correctly and is written in a certain style, or being able to generate a picture from text and do multimodal. So: What do we mean? What are we saying? What is the context? And can the AI reproduce that and generate that? I talk to the AI scientists; these models are not overtrained yet. They can continue to take more and more tokens. The tokens, of course, are part of the limiter.

You do have to have a massive data set in order to train a foundational model. Once you do that, though, and you build a 100 billion, 400 billion, 1.8 trillion, 2 trillion parameter model, that model becomes the foundation for a whole litany of other models.

Speaker 2

Nice.

Ian Buck
VP of HPC and Hyperscale, NVIDIA

You can put an API underneath it, depending on what level of accuracy or comprehension you want to provide, or your context length. You can then take that foundation model and fine-tune it into something like Code Llama, so you basically have a coding copilot. That all starts from a foundation model. Each one of these is not an individual effort. They take a foundation, like Microsoft does with GPT and Copilot, turning one giant foundation model into 100 different assets that activate a whole bunch of other products. That's the value of a foundational model: they build a large, capable one, and then they fine-tune and build smaller ones that can do certain tasks and create the derivatives.
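A minimal sketch of that fine-tune-the-foundation pattern, using the Hugging Face transformers and peft libraries with LoRA adapters. The base checkpoint name and hyperparameters are illustrative assumptions, and this is not how any particular vendor builds its derivatives.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Start from a pretrained foundation model (checkpoint name is illustrative).
base_id = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Attach small LoRA adapters so only a tiny fraction of weights is trained,
# producing a task-specific derivative (say, a coding assistant) of the large
# foundation model rather than training a new model from scratch.
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of the base model

# ...train on domain data (code, support transcripts, etc.) with a standard
# causal-LM objective, then save just the adapter weights for deployment.
```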

In terms of where it's going next, they haven't seen the limit in terms of learning; the models are still learning. That's probably logical, as a human brain has on the order of 100 trillion-

Speaker 2

Nice.

Ian Buck
VP of HPC and Hyperscale, NVIDIA

- connections in your head. You know, we're at about 2 trillion now in AI, so-

Speaker 2

So 50 times more?

Ian Buck
VP of HPC and Hyperscale, NVIDIA

At least. We haven't gotten to reasoning yet. That would be the next step. How do you actually do reasoning, or come to conclusions in a logic chain? That's thinking. That's not-

Speaker 2

But are there diminishing returns, you think? At some point, can that get to a level that kind of puts an upper limit on how large these models can be?

Ian Buck
VP of HPC and Hyperscale, NVIDIA

The cost of training is definitely a factor. The cost of getting the infrastructure is a factor in how fast we can move the needle here. In addition to the science, the complexity, the resiliency, doing things at this scale requires end-to-end optimization. It's not just about the hardware.

Speaker 2

Right.

Ian Buck
VP of HPC and Hyperscale, NVIDIA

If you're trying to grow your company's revenue, you don't just waltz in 10,000 or 50,000 more employees; you have to grow and become more intelligent as an organization. In the same way, you can't just waltz in 60,000 or 100,000 or 1,000,000 more GPUs. You have to do the work. You have to build the capability to keep all those GPUs working together, like a company, to build something even bigger. That is the day-to-day life that I tend to lead, working with those biggest customers to figure out at what scale they can build the software and algorithms. Is there a limit? We haven't hit one yet.

Certainly, 100,000 GPUs is happening now, and 1 million is being talked about, and we're going up that curve right now.

Speaker 2

Got it. Do you find it interesting that, of some of the most frequently used and largest models, one is developed by a startup and one is developed by somebody who's not? So where do you think the biggest hyperscalers are in their journey? Are they still in early stages? Are they hoping to just leverage the technology that's been built up? Or do you think they still have a lot to get going over the next several years?

Ian Buck
VP of HPC and Hyperscale, NVIDIA

Yeah, I think with the lighthouse models, everyone recognizes the benefit of having a foundational model as an asset, as something they can leverage for their business.

Speaker 2

Right.

Ian Buck
VP of HPC and Hyperscale, NVIDIA

Some of them make it public, some of them don't; that's their decision. But the innovation is still happening.

Speaker 2

Right.

Ian Buck
VP of HPC and Hyperscale, NVIDIA

That's the interesting thing. There's so much change happening in model design and in how to train these things at scale. Students at Berkeley and professors at Stanford turn into startups, working on attention mechanisms, some modification of the transformer, or something totally different from transformers, like state-based algorithms. We are not done with AI architecture, with model architecture. Just this year, well, really last year, we started seeing an explosion of the mixture-of-experts style of model, which changed what it takes to scale to a trillion parameters. Previously, a model like GPT-3 would have one transformer-based neural network layer, followed by another one, followed by another one, many layers stacked up to 175 billion parameters.

If you look at models like GPT-4 or others, they're a mixture of experts, on the order of a trillion parameters. Instead of one neural network stacked on top of another, they actually have multiple neural networks running across each layer. If you look at the 1.8 trillion parameter GPT model, each expert tries to answer its part of the layer, then they confer, meet up, and decide what the right answer is, and then they share it with the next 16. It's like this room: one row confers and hands it off to the next row. And that mixture of experts allows each neural network to have its own little specialty, its own little perspective, to make the whole thing smarter. As models get bigger, they get smarter.

It actually changed the way we do computing. We used to have one neural network, one big matrix-multiply, followed by another, followed by another. Now they're communicating all the time: each one has to talk to everybody else, confer, and then share its knowledge with the next row. You see that in the systems and designs, in how we architected Blackwell: we did this multi-node NVLink, the NVL72. We expanded how many GPUs you can connect in one NVLink domain to allow for that mixture of experts, so everyone can be communicating with each other and not get blocked on IO. So this evolution in model architectures is constantly happening, and you can see the strategy of it.
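To make the "experts confer and hand off" picture concrete, here is a toy mixture-of-experts layer in PyTorch. The sizes, the 16 experts, and the top-2 routing are illustrative assumptions, not the design of any production model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Toy mixture-of-experts layer: a router picks the top-2 of
    `num_experts` small MLPs per token and blends their outputs."""
    def __init__(self, d_model=512, d_ff=2048, num_experts=16, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])
        self.router = nn.Linear(d_model, num_experts)  # gating network
        self.top_k = top_k

    def forward(self, x):                      # x: (batch, seq, d_model)
        gate_logits = self.router(x)           # (batch, seq, num_experts)
        weights, idx = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e)      # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(2, 8, 512)
print(MoELayer()(x).shape)  # torch.Size([2, 8, 512])
```

The per-expert routing is what creates the all-to-all communication pattern described above: in a real multi-GPU deployment, tokens must be exchanged between the devices holding different experts at every layer.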

They partner with a hyperscaler or with a cloud, with help from NVIDIA, to move the needle on the next phase of what AI looks like and how it can be implemented in the architecture. So when I say early stages, that is kind of what it feels like. These last two years have been an explosion in mixture of experts. It's influencing how we deploy the models, the software we write, the algorithms, all of it. And on top of that, it's going right into NVIDIA's roadmap: what we're building, how fast we can move the roadmap. That's because the world of AI is constantly evolving and changing and upgrading.

Speaker 2

Got it. I'm glad you brought that up, in terms of the one-year product cadence. We are seeing these model sizes, I've seen one statistic that says they are doubling every six months or so. So that argues that even a one-year product cadence is actually not fast enough. But the other practical side of it is that it puts the data center in constant flux, right? So how do you look at the puts and takes of this one-year product cadence?

Ian Buck
VP of HPC and Hyperscale, NVIDIA

So performance improvement comes as a compounding of hardware, connectivity, algorithms, and model architecture. When we do optimizations, we look at it holistically. We are still improving the performance of our Ampere generation GPUs. We've improved the performance of our Hopper GPUs by three times. Actually, when we first introduced Hopper, we were running, well, pre-Llama it was GPT inference. From the end of 2022 to today, I think we've improved Hopper's inference performance by three times, making the infrastructure more efficient, faster, and more usable. And that gives the customers who now have to buy at a faster clip confidence that it's going to continue to return value. And it does. The workloads might change.

They may take their initial Hopper and build the next GPT on it, but that may also be the infrastructure they use to continue to refine or create the derivative models, or to host and serve them. I think one of the interesting things is that our products used to be much more segmented: you used the 100-class GPU, the big iron, for training, and the smaller PCIe products for inference, due to the cost or the size of the model. Today, the big-iron scale is also frequently used for inference, which I know is difficult for this community to digest, figuring out what's inference and what's training. I'm sorry about that, but that is the other benefit to customers: they can use those GPUs for both inference and training and get continued value and performance throughout.

So with the increased pace, it's sort of natural for this market. The continued improvement, the feedback cycle of working with NVIDIA, allows us to invest, build new technologies, respond, and enable, and then it becomes execution and supply and data centers to make sure everyone has the GPUs they need. I certainly talk to startups. Some of them are still on A100s, and they're enjoying them. They're looking forward to their H100s or to Blackwell, and they're all getting the benefit of the performance and the algorithms platform that we provide. So the demand is one way to support and drive the whole ecosystem.

That just creates more players, more invention, and moves the ball forward, which is a rising tide for us and-

Speaker 2

Right. You know, there's always this question about what the killer apps driving generative AI are, right? Yes, we understand that a lot of hardware is being deployed, and customers deploying NVIDIA hardware are seeing, you know, 4 to 5 times the return on their investment, obviously over a 4-year period. But what are the use cases that are actually being deployed?

Ian Buck
VP of HPC and Hyperscale, NVIDIA

Yeah.

Speaker 2

Right? So what are the big use cases, the most promising ones, right now?

Ian Buck
VP of HPC and Hyperscale, NVIDIA

You know, they build that foundation model, it gets shown off and interacted with. But then they go and incorporate it into their products. Copilot is a good example: taking a GPT model and tailoring it so that it can write that email, or modify or create that Excel expression that's really hard to figure out. I've certainly used that. Certainly, Microsoft, I think, has spoken publicly about how much their software developers' productivity has increased because they've made that model available internally as a copilot for their own software products, their entire product portfolio, like everything.

And, you know, I don't know how you model or measure that benefit for their developers, but there's also the rate at which they can roll out new technology and new products. In some ways, generative AI is making all the old, boring products exciting again, increasing their value, their ASP, and the revenue they can make on their existing install base. That's just Microsoft. It's happening across all of those industries, and the reason every company wants to be deploying and benefiting from it is that they see the opportunity to improve the productivity of their own existing products, install base, and users. And of course, to provide the additional value of generative AI content as a feature add, not just make the existing features better. That's what we're seeing in the enterprise.

Certainly, another area in generative AI is content creation: the new companies, the new startups building key technologies, key enablers, which are going to be either consumed or purchased by more of the established software ecosystem. We are certainly seeing AI now work its way into healthcare and into telco markets. The big adopters, obviously, are companies that see a high benefit, have a lot of data, and are often technology-forward. So, yes, it will spread across industries. The other area I think we're seeing, for AI in general, is recommender systems.

It's not as talked about or as sexy, but it's certainly AI: understanding the content, presenting the right content to the right user, or making sure the wrong content is not shown to the wrong user, and also seeing the opportunities to make click-throughs higher and, as a result, revenues grow faster. Those recommender systems are leveraging all the learning, with generative AI working on the content, to increase revenues.

Speaker 2

Got it. I wanted to talk about AI inference and get your views on what NVIDIA's moat there is. You know, if I say that inference is a workload where I'm really constraining the parameters, where I'm optimizing sometimes more for cost than performance, what is the best product for AI inference, right? I know exactly what I need to infer, I can customize for it, and I don't need to make the same chip work for training also. What is the best product for AI inference?

Ian Buck
VP of HPC and Hyperscale, NVIDIA

Yeah, it's a good question, one that gets asked a lot. First off, often your best architecture for inference is the one you trained on. Think about how training works: you start with a blank neural network, or maybe one that's been pre-trained, a general foundation model, but you're going to train it to be a better call center agent or a code developer for a software program. So you start with that and you train. You actually send the tokens through, you ask the AI to predict what it should do, you tell it whether it was right or wrong, and then you train: you send the errors back to the different neurons, to why it got the question wrong. But it always starts with that forward pass, and that's a big part of training.
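A small sketch of that point in PyTorch: training is the forward pass plus a loss and a backward pass that sends errors back to the weights, while inference reuses exactly the same forward pass. The toy model and data here are made up purely for illustration.

```python
import torch
import torch.nn as nn

# Toy next-token predictor: the same forward pass serves both phases.
model = nn.Sequential(nn.Embedding(1000, 64), nn.Flatten(), nn.Linear(64 * 8, 1000))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, 1000, (4, 8))   # a batch of 8-token contexts
targets = torch.randint(0, 1000, (4,))    # the "right answer" next token

# Training step: forward pass, compare to the answer, push errors back.
opt.zero_grad()
logits = model(tokens)                    # forward pass (same math as inference)
loss = loss_fn(logits, targets)           # was the prediction right or wrong?
loss.backward()                           # send the errors back to the neurons
opt.step()

# Inference: just the forward pass again, no gradients needed.
with torch.no_grad():
    prediction = model(tokens).argmax(dim=-1)
```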

So that builds a natural transition from training to inference. The second thing is that the models are evolving and changing over time. Think about it: you're going to invest $1 billion, $5 billion, $10 billion in data center infrastructure for inference, and that asset is going to last you four or five years. I think they're just now retiring some of those older Kepler and Volta GPUs in the data center. The more that asset can run not just the models that are important today, but also the ones that show up tomorrow and after that, and after that, the more you know you can make that investment and have the capability to produce the revenues we talked about. And hardware takes a long time to build.

Don't forget that. You know, we're accelerating our roadmap, but that's only because we can have multiple chips in flight in parallel, as well as trying to compress the schedule. But it's hard to compress, and the execution there is very difficult. You know, tape-out to production-

Speaker 2

Right.

Ian Buck
VP of HPC and Hyperscale, NVIDIA

is a very long cycle compared to the innovation cycle of AI. So that's why programmability is important. That's why having an architecture that is a platform that everybody is using, not just at your company but across the ecosystem, the startup ecosystem, means you know that as the models evolve, as the techniques and technologies evolve, that investment is going to keep supporting the innovation, not just what you have now. Now, of course, if you know you have one model, you know you're going to put it in one device, and you know where it's going to go, maybe that is the right answer. If your doorbell needs an AI in it, you know exactly what to build, please do. But at data center scale, it's clear that it has to stay flexible.

These are large investments. Customers want to make sure they're getting the full years and the full value out of them. And they see the benefit of working with a platform that every other AI company is working with: they see the benefit of that investment getting the software and the algorithms and the new models over time.

Speaker 2

Practically, Ian, do large customers have separate clusters for training, separate for inference, or are they mixing and matching? Are they reusing them, you know, some time for training, some time for... Practically, how do they do it?

Ian Buck
VP of HPC and Hyperscale, NVIDIA

There are some geographic benefits and differences between training and inference. Most folks can do training anywhere on the globe, so we see big training clusters wherever they can get the data center space and tap into the grid. Having good access to power, and the economics there, is very important. But training doesn't need to be localized. It can be like a remote desktop halfway around the world.

Speaker 2

Right.

Ian Buck
VP of HPC and Hyperscale, NVIDIA

You can feel the lag and the latency, but training is fine with that. For inference, though, you kind of do need to be near-

Speaker 2

Right

Ian Buck
VP of HPC and Hyperscale, NVIDIA

the user. Some inference workloads might be fine: batch-processing inference, fine. A longer chatbot session might be okay. But if you're doing GenAI search, you're asking your browser and you want that answer quickly. If it's too slow, your quality of service immediately plummets. So we often see that inference tends to be either in those same training clusters, and they'll divide it up, just like the clouds provide regions, so folks can serve both training and inference from it. I would just say the training part is a little bit more specialized, because the super big clusters can be wherever it makes the most sense for them to build and invest in the building.

But they are largely using, more and more, the same infrastructure for training and inference. That, again, goes to the value: they can use it for training and flip it over to inference. If you saw when we launched Blackwell and the GB200 NVL72, we talked a lot about inference, because those systems have to run that mixture-of-experts work, and they are also the same infrastructure that can be used for training. At the same time, we make sure that they can take the same building blocks and vary the sizes and capabilities. The GB200 NVL72 is designed for trillion-parameter models.

For more modest sizes, 70B or 7B, we have the NVL2, which is just two Grace Blackwells tied together, which fits nicely in a standard server design and can be deployed anywhere. Telcos, for example, will often have a cage, and a cage has 100 kilowatts; you can't exceed that.

Speaker 2

Right.

Ian Buck
VP of HPC and Hyperscale, NVIDIA

So the metric is, what kind of GPUs or what kind of servers can I put in there to serve models like that at the edge? You'll do something different there than you'll do for a big OpenAI data center or some such. And that's why we have both kinds of-

Speaker 2

Since you have been so intimately involved with CUDA since its founding, how do you address the pushback that people have, which is that software is being abstracted away from CUDA and that will make CUDA obsolete at some point, so it's not really a sustainable moat for NVIDIA? How do you address that pushback?

Ian Buck
VP of HPC and Hyperscale, NVIDIA

Yeah, I think moat is a complicated word; what does it mean? What makes the platform useful is how many developers it has on it, how big the user base is, whether people can get access, so that the next AI invention is compatible with that architecture and what it can do. These foundation models, these next generations, are not academic exercises. They are designed to the limits of the capability they can be trained on. And many of the models that we are enjoying today were actually trained, or started training, like two years ago. There's a lag, unfortunately, between when the data center gets stood up and when the model ships.

You know, we're explaining what we're building to try to shorten this process, but it directly influences the scale of what they can build, not just the number of GPUs. With every generation, we also improve the performance on a per-GPU basis by large factors. Blackwell is like 4 to 5 times better at training per GPU than Hopper was, but 30 times better on inference for trillion-parameter models. And so that sets the bar for them, how big of a model they can build, and then they look at the architecture, the NVLink, and what they can build with it. So it is a symbiosis between what we're building and what they're inventing.

Speaker 2

Right.

Ian Buck
VP of HPC and Hyperscale, NVIDIA

and then we keep riding that wave. That really helps us define the next generation of AI models.

Speaker 2

What is the outlook around Blackwell as we look at next year? First of all, do you think that the power requirements, which are going up, constrain the growth of Blackwell in any way? And what's sort of the lead time in engagement between when somebody wants to deploy versus when they have to start the discussion with NVIDIA, i.e., how much visibility do you have into next year?

Ian Buck
VP of HPC and Hyperscale, NVIDIA

Good question, actually. So there's one question about how far forward we work with not just our biggest customers, but all those AI visionaries out there, and then what that ramp looks like, for Blackwell specifically. We've stated recently in our earnings that Blackwell has now entered production builds. We've started production, and we're ramping for production output later this year. And that always looks like a hockey stick: you start small, and you go up pretty quickly to the right as the new technology transition comes in. The value is so high, there's always a mix of challenges in supply and demand.

We certainly experienced that with Hopper, and there'll be similar kinds of supply and demand constraints with Blackwell, certainly at the end of this year and going into next year. In terms of the horizon, though, the conversation on the Blackwell transition, and what to build, starts two years in advance. The roadmap that was announced at Computex, our Hopper platform, Blackwell, and, for the first time publicly, what comes after Blackwell, is something those biggest customers have seen for quite some time. So they know kind of where we're going and the timescales. It's really important for us to do that. Data centers don't drop out of the sky.

They have to be planned. They need to understand what a Blackwell data center is going to look like and how it's going to differ from Hopper. And it will. The opportunity we saw with Blackwell was to transition to putting 72 GPUs in a single rack, which has not been taken to scale before. We have experience with it; I also do HPC and supercomputing, but those were one-off systems. Now we're democratizing and commoditizing that supercomputing technology and taking it everywhere.

Speaker 2

Right.

Ian Buck
VP of HPC and Hyperscale, NVIDIA

Very challenging. And of course, we've been talking with them about it for two years now, as well as with the supply chain. In Taiwan, for example, that's the people building the liquid cooling infrastructure, the power shelves, the whips, which are the cables that go down into the bus bars. The opportunity here is to help them get the maximum performance through a fixed-megawatt data center at the best possible cost, and to optimize for cost. To do that, we need to move to liquid cooling, and we want a higher-density, higher-power rack. But the benefit is that we can connect all 72 GPUs in one NVLink domain with copper instead of having to go to optics, which adds cost and adds power.

Every time you add cost and power, you're just taking away from the number of GPUs you can put in your $10, $50, $100 million. So that is driving us toward reducing cost and increasing density. So when you look at a Blackwell rack, you may say, "Well, it's really hot." But it actually gets significantly more out of a fixed-power data center. So there's a strong economic and technology driver to transition to denser, more power-efficient, next-generation cooling. Water is a fantastic-

Speaker 2

Right

Ian Buck
VP of HPC and Hyperscale, NVIDIA

mover of heat. Your house is built with insulation that is nothing more than trapped air; air is actually an insulator of heat, but water is excellent at moving it. If you've ever jumped into a 70-degree pool from 70-degree air, it feels really cold. That's because the water is sucking the heat right out of you.

Speaker 2

Right.

Ian Buck
VP of HPC and Hyperscale, NVIDIA

It's really good at moving heat around, which means more GPUs, more capability, and denser, more capable AI systems.
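As a toy illustration of the fixed-power trade-off described above (all numbers are invented placeholders, not NVIDIA figures): whatever power the network, cooling, and other overhead consume comes straight out of the number of GPUs you can deploy in a fixed-megawatt data center.

```python
# Hypothetical numbers only: how per-GPU overhead power eats into GPU count
# for a data center with a fixed power budget.
def gpus_in_budget(budget_mw, gpu_kw, overhead_kw_per_gpu):
    per_gpu_kw = gpu_kw + overhead_kw_per_gpu  # compute + its share of network/cooling
    return int(budget_mw * 1000 // per_gpu_kw)

budget_mw = 50   # fixed data center budget (assumed)
gpu_kw = 1.0     # per-GPU compute power (assumed)
print(gpus_in_budget(budget_mw, gpu_kw, overhead_kw_per_gpu=0.40))  # heavier optics/air cooling
print(gpus_in_budget(budget_mw, gpu_kw, overhead_kw_per_gpu=0.25))  # copper links + liquid cooling
```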

Speaker 2

Got it. So customers who are deploying Blackwell, are they replacing the Hoppers, or are they setting up new infrastructure? Like, how should we think about kind of the replacement cycle of these products?

Ian Buck
VP of HPC and Hyperscale, NVIDIA

Yeah, they can't build their data centers fast enough, so what they're doing is decommissioning or accelerating their CPU infrastructure. They still have quite a few CPU systems. We're not in every data center; obviously, the vast majority of systems at hyperscale are CPU systems. So if you want to make space and you can only build so fast-

Speaker 2

Taking out traditional-

Ian Buck
VP of HPC and Hyperscale, NVIDIA

Old, legacy systems that maybe they've just left alone, not upgraded; they can accelerate the decommissioning of that older CPU infrastructure. They can also accelerate the workloads themselves. Some old workflow that was running on CPUs, that nobody was really working on but it worked, so it was just sustained, they're going back in and saying, "Okay, accelerate this old database workload or this machine learning workload that we've left alone for so many years," because we can do what 1,000 servers were doing with just 10 GPU servers, saving hundreds of racks and megawatts of power. So it's not just the new data centers that are being built. What they're doing is actually making space for more Hoppers.

They can fill every Hopper and every Ampere, and in some cases they can even sell some of the earlier-generation Volta systems, or keep them, or build new, while retiring, deprecating, or accelerating their CPU infrastructure.

Speaker 2

Got it. And lastly, InfiniBand versus Ethernet, right? Most of the clusters that NVIDIA has built so far have primarily used InfiniBand. What is the strategy behind the new Spectrum-X product? Because, just as NVIDIA is an incumbent in GPUs, there is a large incumbent out there on the switching side. So what will make customers adopt your product versus staying with the incumbent?

Ian Buck
VP of HPC and Hyperscale, NVIDIA

Yeah. So, first, we support all different kinds of networking. Amazon has their EFA networking, which we support and execute toward. Each of the hyperscalers has different flavors of their own Ethernet or networking, and some have made the decision to go get the best. We see that with Microsoft, and they're matching our performance one-to-one in benchmarks like MLPerf: we connect 10,000 GPUs with InfiniBand, they have a 10,000-GPU cluster, and it gets the same score. They know they're getting the best on MLPerf. Ethernet is tricky in the sense that standard Ethernet is a really important networking technology. It has a huge ecosystem and software capabilities for managing at scale. But Ethernet was originally designed for sort of that north-south use case.

You have a server, and it wants to talk to the rest of the world. You have a CPU core that wants to talk to the rest of the world. That's what Ethernet did, and it was built for the traditional use cases. When you get to AI, it's a different kind of problem. It's kind of a supercomputing problem. You have these billions of dollars of GPUs all trying to train a model like Llama 3, and now we're going to 100,000 GPUs all trying to train an even bigger model. If one of these packets slows down, or one of these links gets lost or has a blip, the entire infrastructure slows down because it's waiting for the slowest guy.

InfiniBand solved that and made sure the performance was the maximum possible, so everyone could talk to everybody else. That's the difference between designing for east-west versus north-south. In north-south, you don't need to talk to the person next to you, and everybody's happy; but in east-west, if your connection slows down, it slows everybody down, and that's a problem. And if you look at it from a data center standpoint, that's billions of dollars of wasted GPUs; the whole data center goes down. So that's what Spectrum-X is addressing: to provide

support for the standard Ethernet ecosystem, which many hyperscalers are already on, but add the technologies that support the east-west traffic: the adaptive routing, the congestion control techniques, all the stuff you need to make sure you have that deterministic performance, so that the AI can progress and your GPUs stay utilized. It's a really hard problem. We've been accelerating our Spectrum-X roadmap as a result. We still have InfiniBand, which is obviously very important for performance, but to provide a kind of Ethernet that can go and train giant models requires that technology to be embedded and integrated into the Ethernet ecosystem. So that's what Spectrum-X is.
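A tiny sketch of the "waiting for the slowest guy" effect described above: in a synchronous training step, every worker waits for the slowest link, so a single blip drags down the whole cluster. The numbers are invented for illustration.

```python
import random

# Simulate one synchronous step across many links: the step completes only
# when the slowest link finishes, so one slow link sets the pace for everyone.
random.seed(0)
num_links = 10_000
base_ms = 10.0
link_times = [base_ms * random.uniform(1.00, 1.05) for _ in range(num_links)]

healthy_step = max(link_times)
link_times[42] *= 5            # one link has a blip (arbitrary index, made up)
degraded_step = max(link_times)

print(f"step time, all healthy:   {healthy_step:.1f} ms")
print(f"step time, one slow link: {degraded_step:.1f} ms")
print(f"effective cluster utilization: {healthy_step / degraded_step:.0%}")
```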

Speaker 2

Do you see the attach rate of your Ethernet switch going up? Because I think NVIDIA has outlined several billion dollars for it, which includes the NICs as well, right? For-

Ian Buck
VP of HPC and Hyperscale, NVIDIA

Yeah, we've-

Speaker 2

It's even-

Ian Buck
VP of HPC and Hyperscale, NVIDIA

There's a 100,000 GPU training project that's being put together right now, which will be Spectrum-X.

Speaker 2

And then as Blackwell rolls out next year, do you see your attach rate of Ethernet going up?

Ian Buck
VP of HPC and Hyperscale, NVIDIA

Yeah, you'll see a mix of both Ethernet and InfiniBand.

Speaker 2

Terrific. With that, thank you so much, Ian. Really appreciate your insights. Thanks, everyone.
