NVIDIA Corporation (NVDA)

GTC Financial Analyst Q&A 2024

Mar 19, 2024

Jensen Huang
Founder and CEO, NVIDIA

Good morning.

Colette Kress
CFO, NVIDIA

Morning. We've got everything here.

Jensen Huang
Founder and CEO, NVIDIA

Wow.

Colette Kress
CFO, NVIDIA

We can stay all day.

Jensen Huang
Founder and CEO, NVIDIA

Nice to see all of you. All right, what's the game plan?

Colette Kress
CFO, NVIDIA

Okay, well, we've got a full house, and we thank you all for coming out for our first in-person event in such a long time. Jensen and I are here to go through any questions that you have, including questions from yesterday. We have a series of folks who are going to be in the aisles, so just reach out, raise your hand, and we'll get to you with a mic. I know you have already asked quite a few questions both last night and this morning, but rather than giving you a formal presentation, we're just going to do a good Q&A today. Sound like a good plan?

I'm going to turn it to Jensen to see if he wants to add some opening remarks because we have just a quick introduction, and we'll do it that way, okay?

Jensen Huang
Founder and CEO, NVIDIA

Yeah, thank you. Thank you. First, great to see all of you. There were so many things I wanted to say yesterday and probably have said and wanted to say better, but I got to tell you, I've never presented at a rock concert before. I don't know about you guys, but I've never presented at a rock concert before. I had simulated what it was going to be like, but when I walked on stage, it still took my breath away. And so anyways, I did the best I could. After the tour, I'm going to do a better job, I'm sure. I just need a lot more practice. But there were a few things I wanted to tell you. Is there a clicker? Oh, look at that. See? This is like spatial computing.

By the way, it takes a little setup, but if you get a chance to see Omniverse in Vision Pro, it is insane. It's completely incomprehensible how realistic it is. All right. So we spoke about five things yesterday, and I think the first one really deserves some explanation. The first one is, of course, this new industrial revolution. There are two transitions happening. The first is moving from general-purpose computing to accelerated computing. If you just look at the extraordinary trend of general-purpose computing, it has slowed down tremendously over the years.

In fact, we've known that it's been slowing down for about a decade, and people just didn't want to deal with it, but you really have to deal with it now, and you can see that people are extending the depreciation cycle of their data centers as a result. You could buy a whole new set of general-purpose servers, and it's not going to dramatically improve the throughput of your overall data center. So you might as well just continue to use what you have a little longer. That trend is never going to reverse. General-purpose computing has reached its end. We're going to continue to need it, and there's a whole lot of software that runs on it, but it is very clear we should accelerate everything we can.

There are many different industries that have already been accelerated, some with very large workloads that we would really like to accelerate more, but the benefits of accelerated computing are very, very clear. One of the areas that I didn't spend time on yesterday that I really wanted to was data processing. Before you can do almost anything in a company, you have to process the data, and NVIDIA has a suite of libraries for exactly that. You have to, of course, ingest the data, and the amount of data is extraordinary: zettabytes of data being created around the world, doubling every couple of years, even though computing is not doubling every couple of years. So on data processing, you're already on the wrong side of that curve.

If you don't move to accelerated computing, your data processing bills just keep going up and up. A lot of companies recognize this: AstraZeneca, Visa, AMEX, Mastercard, so many companies that we work with have reduced their data processing expense by 95%, basically a 20x reduction. The acceleration with our suite of libraries, called RAPIDS, is now so extraordinary that the inventor of Spark, who started a great company called Databricks, the large-scale cloud data processing company, announced that they're going to take Photon, Databricks' crown-jewel engine, and accelerate it with NVIDIA GPUs. Okay, so the benefit of acceleration passes savings along to your customers, but very importantly, it lets you continue to compute sustainably.
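As an aside, the RAPIDS claim is easy to picture because cuDF deliberately mirrors the pandas API: the same dataframe code can run on CPU or GPU depending on which library you import. A minimal sketch (the table and numbers are invented; shown with pandas so it runs anywhere, with cuDF as the GPU drop-in):

```python
import pandas as pd  # on a GPU machine with RAPIDS: `import cudf as pd` runs unchanged

# hypothetical transactions table of the kind these pipelines process
df = pd.DataFrame({
    "merchant": ["a", "b", "a", "c", "b", "a"],
    "amount":   [10.0, 5.0, 7.5, 3.0, 8.0, 2.5],
})

# the classic ETL-style aggregation that RAPIDS accelerates
totals = df.groupby("merchant")["amount"].sum().sort_values(ascending=False)
print(totals)
```

The point of the API mirroring is exactly the migration story above: existing data processing code moves to the GPU by swapping an import, not by rewriting the pipeline.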

Otherwise, you're on the wrong side of that curve, and you'll never get on the right side of it. You have to accelerate. The only question is today or tomorrow. Okay, so accelerated computing. We accelerated algorithms so quickly that the marginal cost of computing has declined so tremendously over the last decade that it enabled this new wave of doing software called generative AI. Generative AI, as you know, requires a lot of flops, a lot of computation, not a normal amount of computation, an insane amount of computation, and yet it can now be done so cost-effectively that consumers can use this incredible service called ChatGPT. And so it's something to consider that accelerated computing has driven down the marginal cost of computing so far that it enabled a whole new way of doing something else.

This new way is software written by computers, with a raw material called data. You apply energy to it, there's an instrument called GPU supercomputers, and what comes out of it are tokens that we enjoy. When you're interacting with ChatGPT, it's producing tokens. Now, that data center is not a normal data center. It's not a data center you've known in the past, and the reason is this: it's not shared by a whole lot of people, and it's not doing a whole lot of different things. It's running one application 24/7, and its job is not just to save money. Its job is to make money. It's a factory. This is no different than an AC generator of the last industrial revolution, where the raw material coming in was, of course, water.

They applied energy to it and turned it into electricity. Now, it's data that comes into it. It's refined using data processing and then, of course, generative AI models, and what comes out of it is valuable tokens. This idea that we would apply this basic method of software, token generation, what some people call inference, but token generation, this method of producing software, producing data, interacting with you, ChatGPT is interacting with you, this method of working with you, collaborating with you, you extend this as far as you like, co-pilots to artificial intelligence agents, you extend the idea as long as you like, but it's basically the same idea. It's generating software. It's generating tokens. And it's coming out of this thing called an AI generator that we call GPU supercomputers. Does that make sense?

And so there are two ideas. One is that the traditional data centers we use today should be accelerated, and they are. They're being modernized, lots and lots of them, in more and more industries, one after another. And so the $1 trillion of data centers in the world will surely all be accelerated someday. The question is how many years it will take, but the second dynamic, the incredible benefit to artificial intelligence, is going to further accelerate that trend. Does that make sense? However, the second type of data center, what I've described as AI generators or AI factories, is a brand new thing.

It's a brand new type of software generating a brand new type of valuable resource, and it's going to be created by companies, by industries, by countries, and so on: a new industry. I also spoke about our new platform. There is a lot of speculation about Blackwell. Blackwell is a chip at the heart of the system, but it's really a platform. It's basically a computer system. What NVIDIA does for a living is not build the chip. We build an entire supercomputer, from the chip to the system to the interconnects, the NVLink, the networking, and, very importantly, the software. Could you imagine the mountain of electronics that gets brought into your house? How are you going to program it?

Without all of the libraries that have been created over the years to make it effective, you've got a couple of billion dollars' worth of assets you just brought into your company, and anytime they're not utilized, it's costing you money. The expense is too incredible. Our ability to help companies not just buy the chips but bring up the systems and put them to use, and then to work with them all the time to put them to better and better use, that is really important. Okay, that's what NVIDIA does for a living. The platform we call Blackwell has all of these components associated with it, which I showed you at the end of the presentation to give you a sense of the magnitude of what we've built. All of that, we then disassemble.

This is the part that's incredibly hard about what we do. We build this vertically integrated thing, but we build it in a way that can be disassembled later and for you to buy it in parts because maybe you want to connect it to x86. Maybe you want to connect it to a PCI Express fabric. Maybe you want to connect it across a whole bunch of fiber, okay, optics. Maybe you want to have very large NVLink domains. Maybe you want smaller NVLink domains. Maybe you can use an ARM, maybe so on and so forth. Does that make sense? Maybe you would like to use Ethernet. Okay, Ethernet is not great for AI. It doesn't matter what anybody says. You can't change the facts. And there's a reason for that. There's a reason why Ethernet is not great for AI.

But you can make Ethernet great for AI. In the case of the Ethernet industry, that effort is called Ultra Ethernet. In about three or four years, Ultra Ethernet is going to come, and it'll be better for AI. But until then, it's a good network, but it's not good for AI. And so we've extended Ethernet. We've added something to it that we call Spectrum-X. It does adaptive routing. It does congestion control. It does noise isolation. Remember chatty neighbors? They take away from your network traffic. And AI is not about the average throughput of the network, which is what Ethernet is designed for, maximum average throughput. AI only cares about when the last student turns in their partial product. It's about the last person, a fundamentally different design point.

If you're optimizing for the highest average versus the worst student, you will come up with a different architecture. Does that make sense? Okay. And because AI has all-reduce, all-to-all, and all-gather, just look it up in the transformer algorithm, the mixture-of-experts algorithm, you'll see all of it. All these GPUs have to communicate with each other, and the last GPU to submit its answer holds everybody back. That's how it works. And so that's the reason why the networking has such a large impact. Can you network everything together? Yes. But will you lose 10%-20% of utilization? Yes. And what's 10%-20% of utilization worth if the computer is $10,000? Not much. But what's 10%-20% of utilization worth if the computer is $2 billion?
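The "last student" effect can be sketched numerically: a collective operation such as all-reduce finishes only when the slowest worker finishes, so with thousands of GPUs even rare jitter gates nearly every step. A toy simulation (all timings and probabilities invented for illustration):

```python
import random

random.seed(0)

def allreduce_step_time(worker_times):
    # a collective (all-reduce, all-gather, all-to-all) cannot complete until
    # the slowest participant has turned in its partial product
    return max(worker_times)

n_workers, n_steps = 1024, 100
avg_time = 0.0   # what "average throughput" thinking predicts
real_time = 0.0  # what the training job actually waits for

for _ in range(n_steps):
    # hypothetical per-GPU step times: 1.0 normally, +0.5 on rare network jitter
    times = [1.0 + (0.5 if random.random() < 0.01 else 0.0)
             for _ in range(n_workers)]
    avg_time += sum(times) / n_workers
    real_time += allreduce_step_time(times)

print(f"mean worker time per step:    {avg_time / n_steps:.3f}")
print(f"actual (tail-gated) per step: {real_time / n_steps:.3f}")
```

With 1,024 workers and a 1% jitter rate, almost every step contains at least one straggler, so the whole job runs near the worst case even though the average worker barely slowed down. That is the design point Spectrum-X targets.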

It pays for the whole network, which is the reason why supercomputers are built the way they are. Okay. And so anyway, I showed examples of all these different components. Our company creates a platform and all the software associated with it, all the necessary electronics, and then we work with companies and customers to integrate it into their data center, because maybe their security is different. Maybe their thermal management is different. Maybe their management plane is different. Maybe they want to use it for just one dedicated AI. Maybe they want to rent it out for a lot of people to do different AI with. The use cases are so broad. Maybe they want to build it on-prem and run VMware on it, and maybe somebody just wants to run Kubernetes, and somebody else wants to run Slurm.

I could list off all of the different varieties of environments, and it is completely mind-blowing. We took all of those considerations, and over the course of quite a long time, we've figured out how to serve literally everybody. As a result, we can build supercomputers at scale, but basically what NVIDIA does is build data centers. Okay. We break them up into small parts, and we sell them as components. People think, as a result, that we're a chip company. The third thing we talked about was a new type of software called NIMs. These large language models are miracles. ChatGPT is a miracle, not just in what it's able to do; the team that engineered it so that you can interact with ChatGPT at a very high response rate is a world-class computer science organization.

That is not a normal computer science organization. The OpenAI team that's working on this stuff is a world-class team, some of the best in the world. Well, in order for every company to be able to build their own AI, operate their own AI, deploy their own AI, run it across multiple clouds, somebody is going to have to go do that computer science for them. And so instead of doing this for every single model, for every single company, every single configuration, we decided to create the tools and the tooling and the operations, and we're going to package up large language models for the very first time. And you could buy it. You could just come to our website, download it, and you could run it.

And the way we charge you: all of those models are free, but when you run it, when you deploy it in an enterprise, the cost of running it is $4,500 per GPU per year, basically the operating system for running that language model. Okay. And so per instance, the per-use cost is extremely low. It's very, very affordable. But the benefit is really great. Okay. We call them NIMs, NVIDIA Inference Microservices. You take these NIMs, and you're going to have NIMs of all kinds. You're going to have NIMs for computer vision. You're going to have NIMs for speech recognition and text-to-speech, and you're going to have facial animation. You're going to have robotic articulation. You're going to have all kinds of different types of NIMs.
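For a sense of what "download it and run it" looks like in practice: deployed NIM containers serve an HTTP chat-completions API in the familiar OpenAI style. A sketch of the request a client would send (the URL, port, and model name here are illustrative placeholders, not documented values):

```python
import json

# hypothetical endpoint of a locally deployed NIM container; NIMs serve an
# OpenAI-style chat-completions HTTP API
NIM_URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "meta/llama3-8b-instruct",  # placeholder model name
    "messages": [
        {"role": "system",
         "content": "You answer questions about our product catalog only."},
        {"role": "user",
         "content": "What sizes does the pet carrier come in?"},
    ],
    "max_tokens": 128,
}

# actually sending it would be, e.g.: requests.post(NIM_URL, json=payload).json()
print(json.dumps(payload, indent=2))
```

Because the interface is the standard chat-completions shape, existing application code can point at a self-hosted NIM instead of a hosted API with little more than a URL change.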

These NIMs, the way that you would use it is you would download it from our website, and you would fine-tune it with your examples. You would give it examples. You say, "The way that you responded to that question isn't exactly right. It might be right in another company, but it's not right in ours." And so I'm going to give you some examples that are exactly the way we would like to have it. You show it your work products. This is what a good answer looks like. This is what the right answer looks like, a whole bunch of them.

We have a system that helps you curate that, process that, tokenize that, all of the AI processing and data processing that goes along with it, fine-tune that, evaluate that, guardrail that, so that your AIs are, number one, very effective, and number two, very narrow. The reason why you want it to be very narrow is that if you're a retail company, you would prefer your AI just didn't pontificate about random stuff. Okay. So whatever the questions are, it guardrails them back into that lane. That guardrailing system is another AI. We have all these different AIs that help you customize our NIMs, and you can create all kinds of different NIMs.
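The guardrailing idea, keeping a retail assistant in its lane, reduces to a check-then-route control flow. As noted above, in production the check is itself another AI; the toy keyword version below only illustrates the flow, with all names and topic lists invented:

```python
def guardrail(question, allowed_topics):
    # toy topical guardrail: production systems use an AI classifier for this
    # check, but the control flow, check then answer-or-redirect, is the same
    q = question.lower()
    if any(topic in q for topic in allowed_topics):
        return "ANSWER"
    return "REDIRECT: I can only help with questions about our store."

# hypothetical lane for a retail deployment
RETAIL_TOPICS = ["order", "return", "shipping", "product", "price"]

print(guardrail("Where is my order?", RETAIL_TOPICS))
print(guardrail("What do you think about politics?", RETAIL_TOPICS))
```

The key design point is that the guardrail sits outside the language model: off-lane questions never reach it, which is what keeps a narrow enterprise AI narrow.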

We gave you some frameworks for many of them, and one of the very important ones is understanding proprietary data, because every company has proprietary data. And so we created a microservice called Retriever. It's state-of-the-art, and it helps you take your database, whether it's structured or unstructured, images or graphs or charts or whatever it is, and we help you embed it. We help you extract the meaning out of that data, what's called the semantics. The semantics are embedded in a vector, and that vector is then indexed into a new kind of database called a vector database. Okay. And afterwards, you can just talk to that vector database. You say, "Hey, how many mammals do I have?" for example, and it goes in there and says, "Hey, look at that. You got a cat. You have a dog.

You have a giraffe." And it says, "This is what you have in inventory in your warehouse." Okay. So on and so forth. All right. And so all of that is called NeMo, and we have experts to help you. And then we put a canonical NVIDIA infrastructure we call DGX Cloud in all of the world's clouds. We have DGX Cloud in AWS, in Azure, in GCP and OCI. And so we work with the world's enterprise companies, particularly the enterprise IT companies, and we create these great AIs with them. When they're done, they can run in DGX Cloud, which means we're effectively bringing customers to the world's clouds. A platform company like us brings customers to system makers. And CSPs are system makers. They rent systems instead of selling systems, but they're system makers.
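The Retriever flow described above, embed the data, index the vectors, then answer questions by similarity search, can be sketched end to end. The real microservice uses a learned embedding model and a proper vector database; the bag-of-words "embedding" below is only a stand-in so the sketch is self-contained, and all documents are invented:

```python
import math
from collections import Counter

def embed(text):
    # toy bag-of-words "embedding"; a real system uses a learned embedding
    # model, but the index-then-query flow is identical
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# the "vector database": documents stored alongside their embeddings
docs = [
    "warehouse inventory: cat carriers, dog beds, giraffe plush toys",
    "q3 revenue grew on data center demand",
    "employee handbook: vacation and leave policy",
]
index = [(d, embed(d)) for d in docs]

def retrieve(query, k=1):
    # rank every indexed document by similarity to the query embedding
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [d for d, _ in ranked[:k]]

print(retrieve("how many mammals are in the warehouse inventory?"))
```

The retrieved passages are then handed to the language model as context, which is how the "how many mammals do I have?" question gets answered from proprietary data the model was never trained on.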

And so we bring customers to our CSPs, which is a very sensible thing to do, just as we've brought customers to HP and Dell and IBM and Lenovo and Supermicro and CoreWeave and so on. We bring customers to CSPs because that's what a platform company does. Does that make sense? If you're a platform company, you create opportunities for everybody in your ecosystem. And so DGX Cloud allows us to land all of these enterprise applications in the world's CSPs. And if they want to do it on-prem, we have great partnerships with Dell, which we announced yesterday, HP and others, so that you can land those NIMs in their systems. And then I talked about the next wave of AI, which is really about industrial AI.

You know that the vast majority of the world's industries, the largest in dollars, are heavy industries, and heavy industries have never really benefited from IT. They've not benefited from much of the design and digital technology. It's called not digitization but digitalization, putting it to use. They've not benefited from digitalization, not like our industry. And because our industry is completely digitalized, our technology advance is insanely great. We don't call it chip discovery. We call it chip design. Why do they call it drug discovery? Because tomorrow could be different than yesterday, and it is. It's so complicated. Biology is so complicated. And the longitudinal impact is so great because, as you know, life evolves at a different rate than transistors.

And so therefore, cause and effect is harder to monitor because it happens over a large scale, large scale of systems and large scale of time. These are very complicated problems. Physics is very similar. Okay. Industrial physics is very similar. We finally have the ability, using large language models, the same technologies, if we can tokenize proteins, if we could tokenize words, tokenize speech, tokenize images, we can tokenize articulation. This is no different than speech. Right. We can tokenize proteins moving. That's no different than speech. Okay. We can tokenize all these different things. We can tokenize physics. Then we can understand its meaning, just like we've understood the meaning of words. If we can understand its meaning and we can connect it to other modalities, then we can do generative AI. I just explained very quickly that 12 years ago, I saw it.

Our company saw it with ImageNet. The big breakthrough was literally 12 years ago. We said, "Huh, interesting. But what are we actually looking at?" With ChatGPT, I would say everybody should ask the same thing: "Interesting, but what are we looking at?" We are looking at computer software that can emulate you, emulate us. By reading our words, it's emulating the production of our words. If you can tokenize words, and if you can tokenize articulation, for example, why can't it imitate us and generalize in the way that ChatGPT has? So the ChatGPT moment for robotics has got to be around the corner. And so we want to enable people to be able to do that.

We created this operating system that enables these AIs to practice in a physically based world, and we call it Omniverse. Omniverse is not a tool. Omniverse is not even an engine. Omniverse is a set of technology APIs that supercharge other people's tools. I'm super excited about the announcement with Dassault. They're connecting to the Omniverse APIs to supercharge 3DEXCITE. Microsoft has connected it to Power BI. Rockwell has connected it to their tools for industrial automation. Siemens has connected it to theirs as well. So it's a bunch of APIs that are physically based, that produce images or articulation, and that connect a whole bunch of different environments. These APIs are intended to supercharge third-party tools. I'm super delighted to see the adoption, particularly in industrial automation. Those are the five things that we did.

I'll do this next one very quickly. I'm sorry I took longer than I should have, but let me do this next one really quickly. Look at that. All right. So this chart, don't over-stare at it, but it communicates several things, starting with developers on top. NVIDIA is a market maker, not a share taker. The reason for that is everything we do doesn't exist when we start doing it. You can just go up and down the chart. In fact, even 3D computer games didn't exist when we started working on them. And so we had to go create the algorithms necessary. Real-time ray tracing did not exist until we created it. All of these different capabilities did not exist until we created them. And once we created them, there were no applications for them.

So we had to go cultivate and work with developers to integrate this technology we had just created so that applications could benefit from it. I just explained that for Omniverse. We invented Omniverse. We didn't take anything from anybody. It didn't exist. And in order for it to be useful, we now have to have developers: Dassault, ANSYS, Cadence, Rockwell, Siemens, so on and so forth. Does that make sense? We need the developers to take advantage of our APIs, our technologies. Sometimes they're in the form of an SDK. In the case of Omniverse, I'm super proud that it's in the form of cloud APIs, because now it's so easy to use. You could use it both ways, but APIs are much, much easier to use. Okay. And we host Omniverse in the Azure cloud.

Notice, whenever we connect it to a customer, we create an opportunity for Azure. So Azure is on the foundation. They're a system provider. Back in the old days, system providers used to be OEMs, and they continue to be. So: system providers on the bottom, developers on top. We invent technology in the middle. The technology that we invent happens to be chip last. It's software first. And the reason for that is that without a developer, there will be no demand for chips. And so NVIDIA is an algorithm company first. We create these SDKs; they're called DSLs, domain-specific libraries. SQL is a domain-specific library. You might have heard of Hadoop as a domain-specific library in storage computing. NVIDIA's cuDNN is potentially the most successful domain-specific library the world has ever seen, short of SQL. cuDNN is the domain-specific library.

It's a computation engine library for deep neural networks. Without cuDNN, none of the deep learning frameworks would have been able to use CUDA. So cuDNN was invented. Real-time ray tracing: OptiX, which led to RTX. Make sense? And we have hundreds of domain-specific libraries. Omniverse is a domain-specific library. And these domain-specific libraries are integrated with developers on the software side, which then, when the applications are created and there's demand for those applications, creates opportunities for the foundation below. We are market makers, not share takers. Does that make sense? And so what's the takeaway? The takeaway is that you can't create markets without software. It has always been the case. That has never changed. You could build chips to make software run better, but you can't create a new market without software.
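For a concrete sense of what a library like cuDNN provides: its core primitives are operations such as the convolution below, for which it ships heavily tuned GPU kernels that frameworks call instead of writing their own. A naive pure-Python reference of the operation itself (the image and filter are invented examples, and real kernels are orders of magnitude faster):

```python
def conv2d_valid(image, kernel):
    # naive 2D cross-correlation with "valid" padding: the core deep-learning
    # primitive that a tuned library implements on the GPU
    H, W = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(H - kh + 1):
        row = []
        for j in range(W - kw + 1):
            # dot product of the kernel with the image patch at (i, j)
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

edge = [[1, 0, -1]] * 3          # simple vertical-edge detector
img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 1, 1]]
print(conv2d_valid(img, edge))   # strong response at the 0-to-1 boundary
```

A framework never runs a loop like this in production; it dispatches the same mathematical operation to the library's optimized kernels, which is exactly the "domain-specific library" leverage being described.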

What makes NVIDIA unique is that we're the only chip company, I believe, that can go create its own market. And notice all the markets we're creating. That's why we're always talking about the future. These are the things we're working on. Nothing would give me more joy than to work with the entire industry to create the computer-aided drug design industry, not the drug discovery industry, the drug design industry. We have to do drug design the way we do chip design, not chip discovery. And so I expect every single chip next year to be better than the one before, not as if I'm looking for truffles, which is discovery. Some days are good. Some days are less good. Okay. All right. So we have developers on top. We have our foundation on the bottom. The developers want something very, very simple.

They want to make sure that your technology is performant and that it helps them solve a problem they couldn't solve any other way. But the most important thing for a developer is install base. And the reason for that is they don't sell hardware. Their software doesn't get used if nobody has the hardware to run it. Okay. So what developers want is install base. That has not changed since the beginning of time, and it's not changed now. Artificial intelligence: if you develop artificial intelligence software and you want to deploy it so that people can use it, you need install base. Second, the systems companies, the foundation companies, they want killer apps. That's the reason the phrase "killer app" exists. Where there's a killer app, there's customer demand. And where there's customer demand, you can sell hardware.

So it turns out this loop is insanely hard to kick-start. How many accelerated computing platforms can you really build? Can you have one accelerated computing platform for generative AI, another for industrial robotics, another for quantum, for 6G, for weather prediction, and so on? Can you have all these different versions because some are good at fluids, some at particles, some at biology, some at robotics, some at AI, some at SQL? The answer is no. You need a sufficiently general-purpose accelerated computing platform, just as the last computing platform was insanely successful because it ran everything. It has taken NVIDIA a long time, but we basically run everything.

If your software is accelerated, I am very certain it runs on NVIDIA. Does that make sense? Okay. And the reason for that is because it probably ran on NVIDIA first. All right. So this is the NVIDIA architecture. Whenever I give keynotes, I tend to touch on different pieces of all of this, plus some new things we did in the middle, in this case, Blackwell. There was so much good stuff, and you really ought to go to our talks. There are like 1,000 talks. 6G research: how is 6G going to happen? Of course, AI. And what are you going to use the AI for? Robotic MIMO. Why is MIMO so pre-installed, meaning why does the algorithm come before the site?

We should have site-specific MIMO, just like robotic MIMO. Reinforcement learning deals with the environment. 6G, of course, is going to be software-defined, and of course it's going to be AI. Quantum computing: we should be a great partner for the quantum computing industry. How else are you going to drive a quantum computer than to have the world's fastest computer sitting next to it? How are you going to simulate a quantum computer, emulate a quantum computer? What is the programming model for a quantum computer? You can't just program a quantum computer all by itself. You need classical computing sitting next to it; the quantum computer would be kind of a quantum accelerator. Who should go do that? Well, we've done that. We work with the whole industry on that.

So across the board, some really, really great stuff. I wish I could have covered it all. We could have a whole keynote on just that stuff, but we covered the whole gamut. Okay. So that was kind of yesterday. Thank you for that.

Colette Kress
CFO, NVIDIA

Okay. We have the mics going around, and we'll see if we can grab your questions.

Jensen Huang
Founder and CEO, NVIDIA

I'm sure the first question is, "If you could have done the keynote in 10 minutes, why didn't you just do it in 10 minutes yesterday?" Good question.

Ben Reitzes
Managing Director and Head of Technology Research, Melius Research

Yeah. Hi, Jensen. Ben Reitzes with Melius Research. Nice to see you. Thanks for being here. It's a big thrill, I think, for all of us. So I wanted to ask you a little bit more about your vision for software. You are creating industries. You have a full-stack approach. It's clear your software makes your chips run better. Do you feel that your software business over the long term could be as big as your chip business? If we look out 10 years, and you're not just a chip company, what do you think you look like, given the momentum you're seeing in software and how you're building these industries? It would seem like you're going to be a lot more.

Jensen Huang
Founder and CEO, NVIDIA

Yeah. Thank you, Ben. I appreciate that. First of all, I appreciate all of you coming. This is a very, very different type of event, as you know. Most of the talks are software talks, and they're all computer scientists talking about algorithms. The NVIDIA software stack is about two things. The first is algorithms that help the computer run better, like TensorRT-LLM. It's an insanely complicated algorithm, and it explores the computing space in a way that most compilers never have to. TensorRT-LLM can't even be built without a supercomputer, and it's very likely that in the future TensorRT-LLM will just have to run on a supercomputer all the time in order to optimize AIs for everybody's computer. And so that optimization problem is very, very complicated.

So that would be an example of software that we create: the optimization, the runtime. The second kind of software we create is for cases where the principal algorithm is well known, for example, Navier-Stokes or Schrödinger's equation, but its expression in a supercomputing, accelerated, or real-time way has never been discovered. Ray tracing is a great example. Does that make sense? Okay. As you know, Navier-Stokes is an insanely complicated algorithm, and to refactor it in a way that can run in real time is insanely complicated as well and requires a lot of invention. Some of the computer scientists in our company have Oscars, award-winning computer scientists, because they've solved these problems at such a large scale, for example for use in movies.

Their inventions, their algorithms, their data structures are computer science in themselves. Okay. And so we'll dedicate ourselves to those two layers. In the old days, when you packaged it all up, that was useful for media, entertainment, science, and so on and so forth. But today, AI has brought this technology so close to application: simulating molecules used to be a thing you did in universities, and now you can do it at work. And so as we now reformulate all of these algorithms for the consumption of enterprise, it becomes enterprise software, enterprise software like nobody's ever seen before. We're going to put them in these packages called NIMs. We'll have hundreds of them, and we'll manufacture these things and support them and maintain them and keep them performant, and support customers with them.

And so we'll produce NIMs at a very large scale, is my guess. And underneath the entire bucket of software, this is what we call NVIDIA AI Enterprise. A NIM is basically an AI in a microservice for enterprise. And so my expectation is that this is going to be a very large business. And this is part of the industrial revolution. If you look at the IT industry today, SAP and great companies, ServiceNow and Adobe and Autodesk and Cadence, that layer, that's today's IT industry. That's not where we're going to play. We're going to play on the layer above. That layer above is a bunch of AIs, and for these algorithms we're really, really the right company to go build them. And so we'll build some with them.

We'll build some ourselves, but we'll package them up and deploy them at enterprise scale. Okay. And so I appreciate you asking the question. And while she's walking over there, go ahead. Yeah.

Vivek Arya
Managing Director, Bank of America Securities

Hi, Vivek Arya from Bank of America Securities. Thank you, Jensen. Thank you, Colette, for the presentation. So, Jensen, my question is perhaps a little more near to medium term, which is just the size of the addressable market, because your revenues have gotten big so quickly. When I look at how much they represent as a percentage of the spending of some of your large customers, it's like 30%, 40%, 50%, sometimes more. But when I look at how much money those customers are generating from generative AI, it's like less than 10% of their sales. So how long can this gap persist? And then more importantly, are we kind of midway through how much of their spending can be spent on your product?

So just, I think in the past, you have given us kind of a $1 trillion market going to $2 trillion. If you could just educate us on how large the market is and where are we in that adoption curve based on how much it can be, based on how much it's being monetized in the near to medium term.

Jensen Huang
Founder and CEO, NVIDIA

Okay. I'm going to first give you the super condensed version, and then I'll come back and work it out. The answer for how big the market is, how big we can be, has to do with the size of the market and what we sell. Remember, what we sell is a data center. We just break it into parts, but in the end, we sell the data center. Notice the last image you saw at the keynote; it's a reminder of what we actually sell. We showed a bunch of chips, but remember, we don't really sell that. The chips don't work all by themselves. You can buy the chips, but they don't work. You need to build them into a system. And most importantly, the system software and the ecosystem stack are really complicated.

And so NVIDIA builds entire data centers for AI, and we just break it up into parts so that it fits into your company. So that's number one: what do we sell? And what's the opportunity? The opportunity for the world today, the data center size, is $1 trillion. It's $1 trillion worth of installed base, about $250 billion a year. We sell an entire data center in parts. And so our percentage of that $250 billion per year is likely a lot, a lot, a lot higher than somebody who sells a chip, whether a GPU chip or a CPU chip or a networking chip. That opportunity hasn't changed from before. But what NVIDIA makes is an accelerated computing platform, data center scale. Okay. And so our percentage of $250 billion will likely be higher than in the past. Now, second question: how sustainable is it?

There are two answers for that. One reason that you buy NVIDIA is for AI. If you just build TPUs, if your chip is only used for one application, then you have to hang your hat 100% on that: what can you monetize from AI today? Token generation. However, our value proposition isn't just that AI, token generation; it's also training the model and, very importantly, reducing the cost of computing: accelerated computing, sustainable computing, energy-efficient computing. That's what NVIDIA does for a living at its core. It's just that we did it so well that generative AI was created. Okay. And now people forget. It's a little bit like how our first application was computer graphics, and the first use case was games. We did that so well, and we did it so passionately, that people forgot we were an accelerated computing company.

They thought, "Hey, you're a gaming company." A whole generation of young people grew up with us. First they used RIVA 128, then they went to college with GeForce, and when they finally became adults, they thought you were a gaming company. We do accelerated computing so well. We do AI so well. People think that that's all we do. But accelerated computing is $250 billion a year. $250 billion a year should go to accelerated computing with or without AI, just for the sake of sustainable computing, just to process SQL, which is, as you guys know, one of the largest consumers of computing in the world. Okay. I would say $250 billion a year should go to accelerated computing no matter what. Then on top of that is generative AI. How sustainable do I think generative AI is going to be?

You know how I feel about it. I think we're going to be generating words, images, videos, proteins, chemicals, kinetic action, manipulation. We're going to be generating forecasts. We're going to be generating build plans. We're going to be generating bills of materials. The list goes on.

Stacy Rasgon
Managing Director & Senior Analyst, Bernstein Research

Hi, Jensen. Colette, thanks. It's Stacy Rasgon, Bernstein Research. I wanted to ask about the interplay between CPUs and GPUs. Most of the benchmarks, if not all of them, that you showed yesterday were really around the Grace Blackwell system, which has two GPUs and one CPU, sort of doubling the GPU-to-CPU ratio versus Grace Hopper. You didn't talk a lot about benchmarks for the standalone GPUs. Is this a shift? Are you looking for much more CPU content in these AI servers going forward? And then how do I think about the interplay between the ARM CPUs that you're developing and x86? It seems like you're putting a little less emphasis on the x86 side of things going forward.

Jensen Huang
Founder and CEO, NVIDIA

Yeah. Stacy, appreciate the question. There's actually zero concern about either one of them. I think x86 and ARM are both perfectly fine for data centers. There's a reason why Grace is built the way it is. The benefit of ARM is that we could mold the NVIDIA system architecture around the CPU so that we could create this chip-to-chip NVLink that connects the GPU and the CPU. We could make the two sides coherent, meaning when the CPU touches a register, it invalidates the same register on the GPU side. As a result, the two sides can work together on one variable coherently. You can't do that today between x86 and peripherals. So we were able to solve some problems that we couldn't solve otherwise.

As a result, Grace Hopper is insanely great for CAE applications, which are multiphysics: some of it running on CPUs, some of it running on GPUs. It's insanely great for different combinations of CPUs and GPUs, so that we could have very large memories coherently associated with maybe one GPU or two GPUs. And so we can solve some of these problems. Data processing, for example, is insanely great on Grace Hopper. Okay. Those problems were harder to solve before, not because of the CPU itself, but because we couldn't adapt the system. Second, I will say that there was one chart where I showed Hopper versus Blackwell on x86 systems, B100, B200, and then also GB200, which is Grace Blackwell. The benefit of Grace Blackwell in that case wasn't because the CPU is better.

It's because in the case of Grace Blackwell, we were able to create a larger NVLink domain. That larger NVLink domain is really, really important for the next generation of AI. For the next three to five years, which is as far as we can see right now, if you really want good inference performance, you're going to need NVLink. That was the message I was trying to deliver. We're going to talk more about this. It's abundantly clear now that these large language models are never going to fit on one GPU. Okay. And that's not even the point. In order for you to be sufficiently responsive and have high enough throughput to keep the cost down, you need a lot more GPUs than the minimum the model needs just to fit.

And in order to have a lot of GPUs working together without the I/O overhead getting in the way, you need NVLink. Everybody always thought NVLink's benefit is in training, but NVLink's benefit in inference is off the charts. That's the difference between 5x and 30x; that other 6x is all NVLink, NVLink and the new Tensor Core. Excuse me. Yeah. Okay. And so Grace gives us the ability to architect the system exactly as we need it, and it's harder to do that with x86. That's all. But we support both. We'll have versions of both. And in the case of B100, it just slides into where H100 and H200 go. And so the transition from Hopper to Blackwell is instantaneous.

The moment it's available, you just slide it in, and then you can figure out what to do about the next data center. Okay. So we get the benefit of extremely excellent performance at the limit of the architecture, as well as an easy-peasy transition.

Stacy Rasgon
Managing Director & Senior Analyst, Bernstein Research

Thank you.

Matt Ramsay
Senior Semiconductor Analyst, TD Cowen

Hey there. It's Matt Ramsay from TD Cowen. Hey, Jensen. Colette, thank you for doing this this morning. Jensen, I wanted you to comment on a couple of topics that I've been noodling on, one of which is the NIMs that you guys talked about yesterday. It seems like a vertical-specific accelerant for people to get into AI Enterprise and onboard customers more quickly. I wonder if you could just give us an overview of how your company is going after broader enterprise and what different vehicles there are for people to onboard into AI. The second topic is power. My team's been spending a good bit of time on power, and I'm trying to decide if I should spend more time there or less. Some of the systems you introduced yesterday are up to 100 kW or more.

I know that scale of computing couldn't be done without the integration that you guys are doing. But also we're getting questions on power generation at the macro level, power delivery to the cabinet at that density. I just would love to hear your thoughts about how your company is working with the industry to power these systems. Thanks.

Jensen Huang
Founder and CEO, NVIDIA

Okay. I'll start with the second one first. Power delivery: 100 kW, as you know, is a lot for computers. But 100 kW is a commodity. You guys know that, right? The world needs a lot more than 120 kW. And so the absolute amount of power is not an issue. The delivery of the power is not an issue. The physics of delivering the power is not an issue. And cooling 120 kW is not an issue. We can all agree on that. Okay. And so none of this is a physics problem. None of this requires invention. All of it requires supply chain planning. Makes sense? So how big of a deal is supply chain planning? A lot. I mean, we take it very, very seriously. And so we think about supply chain planning all the time.

The reason why we have great partnerships: if you look up Vertiv, I think their front page is a paper that we wrote together, Vertiv and NVIDIA engineers working on cooling systems. Okay. Vertiv is very important in the supply chain of designing liquid-cooled and other data centers. We have great partnerships with Siemens. We have great partnerships with Rockwell, with Schneider, for all good reasons. This is exactly the same as having great partnerships with TSMC and Samsung and SPIL and Wistron and so on and so forth. Our company's supply chain relationships are quite broad and quite deep. And the fact that we build our own data centers really helps that. We've been building supercomputers now for quite some time. This is not our first time.

Our first supercomputer was DGX-1 in 2016. That kind of puts it in perspective. We build one every year. This year, we're building several. The fact that we're building them ourselves gives us a tactile sense of who we're working with and who the best partners are. That's one of the reasons we do it. Now, NIMs: there are two ways to onboard into enterprise. There's the most impactful way, and then there's the other way. Okay. They're both important. I'll start with the other. The other way is that we're going to create these NIMs. We're going to put them on our website, and we're going to go through GSIs and a lot of solution providers. They're going to help companies turn these NIMs into applications. That's going to have a whole go-to-market. Okay.

That go-to-market includes large GSIs and smaller specialized GSIs and so on and so forth. Okay. We have lots of partnerships in that area. The other area that I think is really quite exciting, and where I think the big action is going to happen, is the $1 trillion of enterprise companies in the world. They create tools today. In the future, they're going to offer you tools plus co-pilots. Remember, the single most pervasive tool in the world is Office. There are now co-pilots for Office. There's another class of tools that is super important to NVIDIA: Synopsys, Cadence, ANSYS. We would like to have co-pilots for all of them. Notice we're building co-pilots for our own tools. We call one of them ChipNeMo. ChipNeMo is super smart. ChipNeMo now understands NVIDIA lingo, NVIDIA chip-talk, and it knows how to program NVIDIA programs.

Every engineer that we hire, the first thing we're going to tell them is, "Here's ChipNeMo. And then there's the bathroom. And then there's the cafeteria." In that order. They'll be productive right away. While they're eating lunch, ChipNeMo could be doing some stuff. That just gives you an example. But we have co-pilots that are being built on top of our own tools all over the place. Most companies probably can't do this, and we can teach the GSIs to do this. But in the area of these tools, Cadence and others, they're going to build their own co-pilots. And they will rent them out, hire them out as engineers. I think they're sitting on a goldmine. SAP is going to do that. ServiceNow is going to do that. And they're very specialized co-pilots.

They understand specialized languages, like, in the case of SAP, ABAP, isn't that right? Which is a language that only an SAP lover would love. As you know, ABAP is a very important language for the world's ERP systems. Every company runs on it. We use ABAP. So now they have to go create a ChatABAP. And that ChatABAP will be just like the ChipNeMo or ChatUSD that we created for Omniverse. So Siemens will do that. Rockwell will do that, and so on and so forth. Does that make sense? That, I think, is another way you get to enterprise. ServiceNow is going to do that. They have lots and lots of co-pilots that they're building. That's how they can create another industry on top of their current industry. It's almost like an AI workforce industry. Yeah.

I'm super excited about the partnerships we have with all of them. I'm so excited for them. Every time I see them, I tell them, "Anirudh, you're sitting on a goldmine. Sassine, you're sitting on a goldmine." I'm so excited for them.

Tim Arcuri
Managing Director, UBS

Hi, it's Tim Arcuri at UBS. I had a question also about the TAM, more greenfield versus brownfield, because up until now, H100 has been pretty much all greenfield. People weren't taking A100s and ripping them out and replacing them with H100s. Could B100 be the first time where you see some brownfield upgrades, where we go in and rip out A100s and replace them with B100s? So maybe, if the $1 trillion TAM goes to $2 trillion and you have a four-year replacement cycle, you're talking about $500 billion. But much of that growth comes from upgrading the existing installed base. I wonder if you can comment on that.

Jensen Huang
Founder and CEO, NVIDIA

Yeah. Really good question. Today, we are upgrading the slowest computers in the data center, which would be the CPUs. And so that's what should happen. And then eventually, you'll get around to the Amperes, and then you'll get around to the Hoppers. I do believe that in five, six, seven, eight years, pick your year out there, I'm not picking one, I'm just saying in the outer years, you're going to start seeing replacement cycles of our own infrastructure, obviously. Yeah. But I wouldn't think that that's the best utilization of capital at the moment. Amperes are super productive, as you know.

Brett Simpson
Partner and Co-Founder, Arete Research

Yeah. Hi, Jensen. It's Brett Simpson here at Arete Research. Thanks for hosting a great event these last couple of days. My question was on inference. You put up some good performance numbers for the B100 in terms of how inference compares with H100, and I wanted to get your perspective on that. What's the message you're giving to customers on cost of ownership around this new platform? And how do you think it's going to compare with ASICs or other inference platforms in the industry? Thank you.

Jensen Huang
Founder and CEO, NVIDIA

I think for large language models, Blackwell, with the new transformer engine and NVLink, is going to be very, very, very hard to overcome. The reason for that is that the dimensionality of the problem is so large. There's TensorRT-LLM, this exploration tool, this optimization compiler that I talked about. The architecture underneath, the Tensor Cores, is programmable. And NVLink allows you to connect a whole bunch of GPUs working in tandem with very, very low overhead, basically no overhead. Okay. So as a result, 64 GPUs are the same as one, programmatically. It's incredible. Now, when you have 64 GPUs working together without that overhead, if you instead have to go over a network like Ethernet, it's over. You can't do it. You just wasted everything. Because they all have to communicate with each other, it's called all-to-all.

Whenever all of them have to communicate with each other, the slowest link is the bottleneck, right? It's no different than having a city on one side of the river and a city on the other side of the river: that bridge, that's it. That defines the throughput. Okay. And that bridge would be Ethernet: NVLink on one side, NVLink on the other side, Ethernet in the middle. It makes no sense. So we had to turn that into NVLink. And now we have all of the GPUs working together, generating tokens one at a time. Remember, you can't just spit the tokens out all at once; the transformer has to generate the tokens one at a time, in sequence. And so this is a very complicated parallel computing problem. Okay.
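
The slowest-link point can be sketched with a little arithmetic. This is an editorial illustration, not NVIDIA's actual figures: the function name and the 900 GB/s and 50 GB/s bandwidths are hypothetical stand-ins for "fast interconnect everywhere" versus "one slow bridge in the middle."

```python
def all_to_all_step_time(payload_gb: float, link_bandwidths_gb_s: list[float]) -> float:
    """One all-to-all exchange finishes only when the slowest link finishes."""
    return payload_gb / min(link_bandwidths_gb_s)

# 63 fast links everywhere vs. 62 fast links plus one slow "bridge" (illustrative numbers):
fast_only = all_to_all_step_time(1.0, [900.0] * 63)                  # bound by the 900 GB/s links
one_slow_bridge = all_to_all_step_time(1.0, [900.0] * 62 + [50.0])   # bound by the 50 GB/s link
print(one_slow_bridge / fast_only)  # a single slow link makes the whole step 18x slower
```

The point of the sketch: speeding up 62 of 63 links buys nothing, which is why one Ethernet hop between two NVLink domains dominates the whole exchange.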

And so I think Blackwell has raised the bar a lot, mountains high, for ASICs or anything else.

CJ Muse
Senior Managing Director, Cantor

Thanks, George. Hello, Jensen. It's C.J. Muse with Cantor. Thank you for hosting this. It's great to see you both. A question on your pricing strategy. Historically, you've talked about "the more you buy, the more you save." But it sounds like initial pricing on Blackwell is coming in at perhaps a lower premium than the productivity that you're offering. So I'm curious, as you think about maybe a razor-and-blades model, selling software and the full system, how that might cause you to evolve your pricing strategy, and how we should think about normalized margins within that construct. Thank you.

Jensen Huang
Founder and CEO, NVIDIA

The pricing that we create always starts from TCO. I appreciate that comment, CJ. We always come from TCO. However, we don't want to set the TCO based on just one narrow segment of customers. When you have only one particular domain of customers, let's say molecular dynamics, only one application, then you set the TCO based on that one application. It could be a medical imaging system. And all of a sudden, the TCO is very, very high, but the market size is quite small. With every single generation that goes by, our market size is growing, isn't that right? And we want to make it so the entire market can afford Blackwell. And so in a way, it's kind of a self-curing problem.

As we solve for the TCO of a much larger market, then some customers get too much value, if you will. But that's okay, because you're making the business simpler, having one basic product, and you're able to support a very, very large market. Now, over time, if the market were to bifurcate, then we can always segment. But we're nowhere near that today. And so I think we have the opportunity to create a product that delivers extraordinary value for many and extremely good value for all. And that's our purpose. Okay. Yeah.

Joe Moore
Managing Director, Morgan Stanley

Hi. Joe Moore from Morgan Stanley. It seems like the most impressive specs that you showed were around GB200, which you just described as a function of having that bigger NVLink domain. Can you contrast what you're doing with GB200 with what you did with GH200 and why you think it could be a much bigger product this time around?

Jensen Huang
Founder and CEO, NVIDIA

Oh, great question. The simple answer is timing: GH200, H100, H200. Before Grace Hopper could really take off significantly, Grace Blackwell was already here. And Grace Hopper had an additional burden that Hopper didn't have. Hopper fit right into where Ampere left off. A100s went to H100s; they're going to go to B100s, and so on and so forth. That particular chassis, that particular use case, is fairly well established, and we'll just keep on moving. The software is built for it. People know how to operate it, so on and so forth. Grace Hopper is a little different. It addressed a new class of applications that we didn't address very well before. I was mentioning some of it earlier.

Multiphysics problems where the CPU and GPUs have to work closely together, very large data sets, so on and so forth, difficult to parallelize, for example, those kinds of problems, Grace Hopper was really good for. Okay. And so we started developing software for that. My recommendation for most customers is at this point, just gear up for Grace Blackwell. And I have given them that recommendation. And so everything that they do with Grace Hopper will be completely architecturally compatible. That's the wonderful thing. And so whatever they have, whatever they buy, is still fantastic. But I would recommend that they put all their energy into Grace Blackwell because it's so much better.

Speaker 14

Jensen, thanks for having us here today. I wanted to ask a question on robotics. It seems like every time we come back to GTC, you sneak something in at the end, and a couple of years later, we go, "Wow, he's been talking about that for a while." I heard this week you guys mentioned that robotics may be getting close to its ChatGPT moment. Can you describe what that means and where you start to see that robotics evolution in our day-to-day lives? That'd be super helpful. Thank you.

Jensen Huang
Founder and CEO, NVIDIA

Okay. Several things. First of all, I appreciate that. I showed Earth-2 two years ago. And two years later, we have this new algorithm that is able to do regional weather prediction at 3 km resolution. The supercomputer you would need to do that conventionally is 25 times, excuse me, 25,000 times larger than the ones currently used to do weather simulations at NOAA, in Europe, and so on and so forth. 3 km is very, very high resolution, if you will, right above your head. Okay. And weather simulation also requires a whole lot of what are called ensembles, because the world is chaotic. You want to simulate a lot of the distribution: sample a lot of different parameters, a lot of different perturbations, and try to figure out what that distribution is, because the middle of that distribution is likely going to be the weather pattern.

Well, if it takes that much energy just to do it one time, they're not going to do it more than one time. But in order to predict where the weather is going to be a week from now, especially extreme weather that can change so dramatically, you're going to need a lot of what they call ensemble members, a lot of samplings. And so we're basically doing the weather simulation 10,000 times. Okay. And we trained an AI to understand physics, so it's physically plausible and it can't hallucinate; it has to understand the laws of physics. And so I showed it two years ago, and today we've connected it into the most trusted source of weather in the world, The Weather Company. Okay. And so we're going to help people do regional weather all over the world.
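
The ensemble idea can be sketched in a few lines. This is a toy, editorial illustration: `toy_forecast`, its relax-toward-20-degrees dynamics, and the perturbation scale are all invented stand-ins for a real physics or AI weather model; only the ensemble structure (many perturbed runs, take the middle of the distribution) reflects what's described above.

```python
import random

def toy_forecast(initial_temp_c: float, perturbation: float) -> float:
    """Stand-in for one forecast run from a slightly perturbed initial state."""
    t = initial_temp_c + perturbation
    for _ in range(5):
        t += 0.1 * (20.0 - t)  # toy dynamics: relax toward 20 degrees C
    return t

def ensemble_forecast(initial_temp_c: float, members: int = 10_000) -> float:
    """Run many perturbed members and report the middle of the distribution."""
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    outcomes = sorted(toy_forecast(initial_temp_c, rng.gauss(0.0, 0.5))
                      for _ in range(members))
    return outcomes[members // 2]  # median of the ensemble
```

A real system runs each member on a full simulation (or an AI surrogate), which is why cheap members matter: the forecast is the distribution, not any single run.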

If you're a shipping company, you need to know weather conditions. If you're an insurance company, you need to know weather conditions. If you're in the Southeast Asia region, with so many hurricanes and typhoons and things like that, you need some of this technology. So we're going to help people adapt it for their region and their use case. Well, I did that a couple of years ago. The ChatGPT moment kind of works like this. Take a step back and ask yourself, "What happened with ChatGPT?" The technology is insanely great. Okay. It's really incredible. But there are several things that happened. One, it learned from a whole lot of human examples. We wrote the words, right? They were our words. So it learned from our human examples, and it generalized. So it's not just repeating our words back.

It can understand the context, and it can generate original form. It understood the context, meaning that it adapted itself to the current circumstance, the context. And the third thing is it could now generate original tokens. Now, I'm going to take everything back into tokens. Forget words; just tokens now. Use all the same words that I just used, but replace "words" with "tokens." If I could just figure out how to communicate with this computer what each token means, if I can just tokenize this, just as when you do speech recognition, you tokenize my sound, my voice. Just as when we reconstructed proteins, we tokenized amino acids. You can tokenize almost everything you can digitize; it's a simple way of representing each chunk of the data. Okay. So once you can tokenize it, then you can learn it.
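
The "tokenize anything you can digitize" idea reduces to a simple shape: map each distinct chunk of a stream to an integer id. This minimal sketch is editorial; real tokenizers for text, speech, amino acids, or motion are far more sophisticated, and the function and variable names here are illustrative.

```python
def tokenize(chunks: list[str]) -> tuple[list[int], dict[str, int]]:
    """Assign each distinct chunk the next free integer id; return ids and vocab."""
    vocab: dict[str, int] = {}
    ids = []
    for chunk in chunks:
        if chunk not in vocab:
            vocab[chunk] = len(vocab)  # first time seen: next free id
        ids.append(vocab[chunk])
    return ids, vocab

# The same machinery works on words or amino-acid letters alike:
word_ids, _ = tokenize("the cat sat on the mat".split())  # [0, 1, 2, 3, 0, 4]
protein_ids, _ = tokenize(list("MKTK"))                   # [0, 1, 2, 1]
```

Once a domain is reduced to token ids like this, the same sequence-learning machinery applies regardless of whether the tokens came from words, sound, proteins, or motion, which is the generalization being described.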

We call it learning the embeddings of it, the meanings of it. And so if I can tokenize motion, the world, articulation, kinematics, and I can learn it, generalize it, and then generate, I just did the ChatGPT moment. How is it any different? The computer doesn't know. Now, of course, the problem space is a lot more complicated because these are physical things. So you need this thing called alignment. And what was the great invention of ChatGPT? Reinforcement learning from human feedback: alignment, isn't that right? It would try something, and you say, "No, that's not as good as this." It would try something else, and you say, "Nope, that's not as good as this." Human feedback, reinforcement learning. And it takes that reinforcement and improves itself. And so what is Omniverse for?

Well, if it's in a robot, then how would you do feedback? And what is feedback about? It's physical feedback, physics feedback. It generated a movement to go pick up a cup, but it tipped the cup over. It needs reinforcement learning to know when to stop. Does that make sense? And so that feedback system is not human; that feedback system is physics. And that physics simulation feedback is called Omniverse. And so Omniverse is reinforcement learning, physical feedback, which grounds the AI to the physical world, just as reinforcement learning, human feedback, grounds the AI to human values. Are you guys following me? I just described two completely different domains using exactly the same concepts. And so what I've done is I've generalized general AI. And by generalizing it, I can reapply it somewhere else.

We made this observation some time ago, and we started preparing for this. Now you're going to find that Isaac Sim, which is a gem on top of Omniverse, is going to be super, super successful for just about anybody who's doing these robotic systems. We've created the operating system for robots. I'm sure there's a corporate answer for all the questions you guys ask, but unfortunately, I only know how to answer it the one geek way.

Atif Malik
Managing Director, Citigroup

Hi. I'm Atif Malik from Citigroup. I have a question for Colette. Colette, in your slides, you talked about availability for the Blackwell platform later this year. Can you be more specific? Is that the October quarter or the January quarter? And then, on supply chain readiness for the new products: is the packaging, particularly CoWoS-L on the B200, ready? And how are you getting your supply chain ready for the new products?

Colette Kress
CFO, NVIDIA

Yeah. Let me start with the second part of your question, about supply chain readiness. That's something we've been working on for well over a year, getting ready for these new products coming to market. We feel so privileged to have the partners that work with us in developing out our supply chain. We've continued to work on resiliency and redundancy. But also, you're right, we're moving into new areas: new areas of CoWoS, new areas of memory, and just the sheer volume of components and the complexity of what we're building. So that's well on its way, and it will be there when we're ready to launch our products.

There is also a part of our supply chain, as we talked about earlier today, involving the partners that will help us with liquid cooling, and the additional partners that will be ready to build out the full data center. This work is a very important part of easing the planning and the processing to put in all of our different Blackwell configurations. Going back to the first part of your question, which is when we think we're going to come to market: later this year, late this year, you will start to see our products come to market. Many of the customers we have already spoken with have talked about the designs, talked about the specs, and provided us their demand desires.

That has been very helpful for us to begin our supply chain work and to plan our volumes and what we're going to do. It's very true, though, that at the onset of the very first product coming to market, there might be constraints until we can meet some of the demand that's put in front of us. Hope that answers your question.

Jensen Huang
Founder and CEO, NVIDIA

Yeah. Yeah, that's right. And just remember that Hopper and Blackwell are used for people's operations. And people need to operate today. And the demand is so great for Hopper. Most of our customers have known about Blackwell now for some time, just so you know. Okay. So they've known about Blackwell. They've known about the schedule. They've known about the capabilities for some time. As soon as possible, we try to let people know so they can plan their data centers. And notice the Hopper demand doesn't change. The reason for that is they have operations they have to serve. They have customers today. And they have to run the business today, not next year. Okay.

Pierre Ferragu
Managing Partner, New Street Research

Pierre Ferragu, New Street Research. So, a geeky question on Blackwell, the two dies, and the 10 TB/s between the two dies. Can you tell us how you achieve that, how much work you've put in over the years to be able to achieve that technically from a manufacturing standpoint, and then how you see the future in your roadmap looking further out? Do you think we're going to see more and more dies coming together into a single package? So that's one side of my question, which is more on the chip and the architecture. And the other side is, you must be seeing all these models that are, as Sam Altman said, behind the veil of ignorance. So can you tell us what you see, and how you see the next generation of models influencing your architecture?

What's the direction of travel for GPU architecture for data center AI?

Jensen Huang
Founder and CEO, NVIDIA

Yeah. I'll start with the second one. This is one of the great things about being the platform where all AI research is done: we get the benefit of seeing everything that's coming down the pike. Of course, all next-generation models are intended to push current-generation systems to their limits. So large context windows, for example, insanely large context windows; state-space vectors; synthetic data generation, essentially models talking to themselves; reinforcement learning, essentially the AlphaGo of large language models; tree search. These models are going to have to learn how to reason and do multipath planning. So instead of one shot, it's a little bit like us thinking: we have to work through our plan. That planning system, that reasoning system, these multi-step reasoning systems could be quite abstract, and the path could be quite long, just like playing Go.

But the constraints are much, much more difficult to describe. And so this whole area of research is super, super exciting. The type of systems that we're going to see in the next several years, a couple, two, three years, is unimaginable compared to today, for the reasons I described. There is some concern about the amount of internet data that's available for training these models, but that concern is misplaced. 10 trillion tokens is great. But don't forget synthetic data generation, models talking to each other, reinforcement learning. With the amount of data you're going to be generating, it's going to take two computers training each other. Today, we have one computer training on data. Tomorrow, it's going to be two computers, right? Don't forget, remember AlphaGo? It was multiple systems playing against each other, okay, so that we could do that as quickly as possible.

And so some really exciting, groundbreaking work is around the corner. All right. The one thing we're certain of is that our GPUs want to be even bigger. The SerDes of our company are world-class. NVIDIA's SerDes are absolutely the world's best. The data rate and the energy consumed, the picojoules per bit, in our company are unbelievably good. That is the reason why we're able to do NVLink. Remember, NVLink came about because we could not make a chip big enough, and so we connected eight of them together. This was in 2016. We're on NVLink Gen 5. The rest of the world doesn't even have NVLink Gen 1 yet. NVLink Gen 5 allows us to connect 576 chips together. They are together, as far as I'm concerned. The data center is so big.
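The picojoules-per-bit point can be made concrete with a back-of-envelope sketch. None of the figures below are published NVIDIA specs; the bandwidths and energies are hypothetical, chosen only to show why energy-per-bit dominates link power at these data rates:

```python
# Back-of-envelope: electrical link power from energy-per-bit.
# All numbers are illustrative assumptions, not NVIDIA specifications.

def link_power_watts(bandwidth_bytes_per_s: float, pj_per_bit: float) -> float:
    """Power = (bits per second) * (joules per bit)."""
    bits_per_s = bandwidth_bytes_per_s * 8
    return bits_per_s * pj_per_bit * 1e-12  # picojoules -> joules

# A hypothetical 10 TB/s die-to-die link at an assumed 1 pJ/bit:
print(link_power_watts(10e12, 1.0))   # 80.0 (watts)

# The same bandwidth at an assumed 10 pJ/bit, e.g. a longer-reach link:
print(link_power_watts(10e12, 10.0))  # 800.0 (watts)
```

At 10 TB/s, a 10x difference in energy per bit is the difference between a link budget of tens of watts and one of close to a kilowatt, which is the sense in which, without very low energy per bit, the package "would be nothing but that link."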

Does it have to be this close together? No, not at all. And so it's okay to split them up 576 ways, and the SerDes are so low-energy anyway. Now, we could make the chips even closer together. The reason why we want that is that the software cannot tell the difference. When you break up chips, the algorithm should be: build the largest chip that lithography can make, and then put multiple of them together with whatever technology is available to do so. But you start by building the largest chip ever. Otherwise, why wouldn't we have done multi-chip back in the old days? We just kept pushing monolithic as far as we could, and the reason for that is that the data rate on chip and the energy on chip allow the programming model to be as uniform as possible.

You don't have these things called, speaking of geeking out, NUMA, non-uniform memory access, right? So you don't have NUMA behavior. You don't have weird cache behavior. You don't have memory-locality behavior, which causes programs to work differently depending on the nodes, the systems, they run on. We want our software to run exactly the same wherever it is. And so you start with the biggest chip possible. That's the first Blackwell die. We connect the two of them together. The technology, 10 TB/s, is insane. Nobody's ever seen a 10 TB/s link before. That's 10 terabytes per second. And it obviously consumes very little power; otherwise, the power budget would be nothing but that link. And so you had to solve that, number one. The second thing you had to solve was the question that came up before: CoWoS.

It's the largest CoWoS in the world, because the first-generation CoWoS was already the largest CoWoS in the world, and now the second generation is even larger. The benefit that we have is that we're not surprised this time. The volume ramp happened fairly sharply last time, but this time we've had plenty of visibility. And so Colette's absolutely right. We've worked with the supply chain, worked with TSMC very closely. We are geared up for an exciting ramp.

Aaron Rakers
Technology Analyst, Wells Fargo

This will be the last question then.

Jensen Huang
Founder and CEO, NVIDIA

Bummer.

Aaron Rakers
Technology Analyst, Wells Fargo

Wow. Thank you.

Jensen Huang
Founder and CEO, NVIDIA

Come on.

Aaron Rakers
Technology Analyst, Wells Fargo

Aaron Rakers at Wells Fargo. I really appreciate all this detail. I'm actually going to dovetail off this last comment because today you started the conversation by talking a little bit about Ethernet and how Ethernet with Ultra.

Jensen Huang
Founder and CEO, NVIDIA

I love Ethernet.

Aaron Rakers
Technology Analyst, Wells Fargo

Yeah. So I want to understand a little bit about NVLink, 576 GPUs now interconnected together, this idea of the fabric architecture. Where does that play relative to the evolution of Ethernet, your Spectrum-X product, this move to 800-gig? I'm just trying to understand the interplay between those, and whether or not you see NVLink competing with Ethernet in those environments.

Jensen Huang
Founder and CEO, NVIDIA

No. First, the algorithm is actually very simple. First, build the largest die you possibly can, so big that if you added one more transistor, it would literally fall on the ground. That's algorithm number one. And look at the chips that we build: they're literally the largest; they're at the reticle limit. Number two, if possible, connect two of them together. You're not going to connect four of them together. That's not going to happen. But if you can, connect two of them together. And that's the Blackwell invention. We now know how to build dies that big. But beyond that, you're going to have all kinds of weird NUMA effects and locality effects, so you might as well go to NVLink. And so once you get to NVLink, the question is, and of course, we're on Gen 5.

If you don't have NVLink, then you're kind of stuck. Okay? You can't build systems like this. But if you have NVLink, then the next part is: build the NVLink domain as large as you can, modulated by power and cost. And that's the reason why NVLink is direct-connect, direct-drive, not because optical transceivers are out of fashion. Optical? Are you kidding me? We love optical. We need optical. We're going to use tons of optical. But you should build the NVLink domain as large as you can using copper, because you can save a lot of power. You can save a lot of money. You can make it scalable, sufficiently scalable. Now you've got one giant chip, effectively a 576-GPU chip. But that's only 576 GPU chips. That's not enough. And so we're going to have to connect multiple of them.

The next click after that, the best thing you have is InfiniBand. The second best is Ethernet with an augmented computing layer on top of it that we call Spectrum-X, so that we can control the traffic in the system and we don't have these long tails. Remember, as I said, the last one to finish determines the speed of the computer. This is not about average throughput. This is not like all of us individually accessing a hyperscaler, where average throughput is good enough. This is literally about the last one to finish that partial product, to finish that tensor. Everybody else is waiting on them. I don't know who in this room is going to be last, but we're going to hope that that person doesn't hold us up, right?

And so we're going to make sure that the last one isn't late; we push everything to the middle. We only want one answer, and it all shows up at the right time. Okay? And so that's the second best. And then you scale that out as much as you can, and that's going to need optics and so on and so forth. Yep. There's a place for all of it. There's a place for all of it. I think if anybody's concerned about optics, don't be. I think the demand for optics is very, very high. Demand for repeaters is very, very high. We didn't change anything about that. All we did was make computers larger. We made GPUs larger. Can we take one more question? This is so much fun.
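The "last one to finish determines the speed" point is a tail-latency effect that a tiny sketch can make concrete. In a synchronous step (an all-reduce, say), every worker waits for the slowest one, so the step time is the max of the per-worker times, not the mean. The latency numbers below are made up purely for illustration:

```python
# Sketch: in a synchronous collective step, the straggler gates everyone.
# Latency numbers are hypothetical, for illustration only.

def step_time(worker_times_ms):
    """Step time is the maximum per-worker time, not the average."""
    return max(worker_times_ms)

# 1,023 workers finish in 10 ms; a single straggler takes 50 ms.
times = [10.0] * 1023 + [50.0]

mean_ms = sum(times) / len(times)
print(f"mean worker time: {mean_ms:.2f} ms")           # ~10.04 ms
print(f"actual step time: {step_time(times):.2f} ms")  # 50.00 ms
```

One worker out of 1,024 barely moves the mean, yet it slows the whole step by 5x, which is why traffic control that shortens the tail (rather than improving the average) is what matters here.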

Speaker 15

One last question from the buy side, Jensen. You've talked a lot about... oh, I'm sorry.

Jensen Huang
Founder and CEO, NVIDIA

There is. Oh, there is. Hey, Will.

Speaker 15

Yeah. Hey.

Jensen Huang
Founder and CEO, NVIDIA

I will.

Speaker 15

Sovereign AI. Is there a way to sort of understand what you're going to do for the United Arab Emirates? That would be one question. And I guess my second question is: I'm going to go home, I'm going to see my 91-year-old mother. How can I explain to a 91-year-old what accelerated computing is? I guess I've got a good answer to the first question. I'll figure out the second one. Thanks.

Jensen Huang
Founder and CEO, NVIDIA

Okay. Yep. I don't know what you were going to say on the second one, but on the second one, I would say: use the right tool for the right job. Yeah. And right now, with general-purpose computing, you're using the same tool for every single job. Literally, what you have is a screwdriver, and you're using it from the moment you wake up to the moment you go to bed. And so you start by brushing your teeth with a screwdriver. It probably works. I haven't tried it, but it probably works. And so you just use that one tool the whole day. Now, of course, because you're going to use that one tool the whole day, over time, humans have gotten pretty smart, and so we made that tool general-purpose. And so now the screwdriver has brushes on it. It's got hair on it.

It becomes useful for all kinds of stuff. You could also use it to clean the bathroom and all that kind of stuff. So, one tool. Was that the answer you were going to give? All right. We created basically two tools. We said the CPU is incredibly good at sequential things, and what it's not good at is parallel things. Now, the weird thing is this. For most applications, let's say Excel, the parallel part is not very much. That's the reason why CPUs are really the best processor for Excel, for your web browser, except for the graphics, which we came along and accelerated later. Most web browsers are largely single-threaded. Okay? JavaScript is largely single-threaded. For many applications, personal computing is largely single-threaded, and a CPU is really quite ideal.

And then all of a sudden, this new application came along, computer graphics, video games, where literally 1% of the code is 99% of the runtime. Do you guys understand what I'm saying? 1% of the code is 99% of the runtime. And the reason for that is that it's computing the pixels one at a time. So 1% of the code is 99% of the runtime. And we said, "Ha-ha. Look at that. How interesting. Why don't we go create something that's insanely good at that 1% of the code, meaning it's bad at the other 99% of the code?" And we just go create applications, or find applications, where that 1% of the code is 99% of the runtime: molecular dynamics, medical imaging, seismic processing, artificial intelligence. Make sense?

That's why accelerated computing works for data processing and so on and so forth, wherever 1% of the code is 99% of the runtime. And that's the reason why we get such a great speedup. All right. Your...
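The 1%-of-code / 99%-of-runtime observation is essentially Amdahl's law: the overall speedup from accelerating the hot fraction of the runtime is capped by whatever remains unaccelerated. A quick sketch, with hypothetical accelerator speedups:

```python
# Amdahl's law: accelerating the fraction of runtime that is "hot".
# The speedup factors below are hypothetical, for illustration only.

def overall_speedup(hot_fraction: float, hot_speedup: float) -> float:
    """Overall speedup when hot_fraction of runtime runs hot_speedup times faster."""
    return 1.0 / ((1.0 - hot_fraction) + hot_fraction / hot_speedup)

# 99% of runtime accelerated 100x (an assumed GPU speedup on the hot loop):
print(overall_speedup(0.99, 100.0))  # ~50x overall

# Even with an effectively infinite accelerator, the remaining 1% of
# runtime caps the overall gain at 100x:
print(overall_speedup(0.99, 1e12))   # ~100x overall
```

This is why finding workloads where the hot 1% of code dominates the runtime is the whole game: the smaller the unaccelerated remainder, the higher the ceiling on the end-to-end speedup.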

Colette Kress
CFO, NVIDIA

Sovereign AI.

Jensen Huang
Founder and CEO, NVIDIA

Sovereign AI. Every country has its own natural resource, and that natural resource is its intelligence. It's in their language. India has their own languages, many of them, lots of different dialects. They have their own language, their sensibility, their culture, their history. It belongs to them. And a lot of it is in their national archives and is digitized; it's not actually on the internet. It belongs to them. They ought to take that and go create their own sovereign AI. And they believe the same. Sweden is the same way. Japan is going to do the same. You name it.

Countries all over the world realize that this is their natural resource, and they shouldn't just let it be used by anybody who then imports that natural resource back to them in an automated way, with them paying somebody else for it. Don't let your data go out for free and import AI. They now realize it ought to be the other way around: they should keep their own data and then export AI. And so export the AI of Korea, export the AI of Malaysia, export the AI of, you name it, Middle East countries. And so we will have export control limitations on our products. In most of these areas, the answer is that it's not export-controlled. And if there is any export control, we can still work with the U.S. government and make sure that the export is going to be fine.

But number one, we just make sure that we're compliant with export controls. And in some countries, we have to offer degraded products, or, I didn't say that right, lower-specification products. But anyway, number one, just be compliant with export controls, and help countries around the world to be able to do this. It's a very big market. Yeah. It's a very big market. There are going to be AIs that are trained and continuously refined for just about every culture in the world. Yep. Thank you. Thank you very, very much. Colette and I appreciate all of your support and interest in the company. And this is really quite an extraordinary time.

It is not usual that we get to live through a time like this, where the single most important instrument of society is being reinvented after 60 years, where a new way of doing software has emerged, and you know that software is one of the most important technologies humanity has ever created, and we're at the beginning of a new industrial revolution. And so the next 10 years, you definitely don't want to miss. All right? Thank you very much.
