Let's get started. Good morning, everyone. Thank you so much for joining us on day two of the BofA Securities Global Technology Conference. I'm Vivek Arya. I cover semiconductors and semiconductor equipment here at BofA. I'm absolutely delighted and honored to have Ian Buck, the head of accelerated computing at NVIDIA, join us for this keynote. I think most of you are probably familiar with Ian, but if not, Ian heads all the hardware and software product lines, third-party enablement, and marketing activities for GPU computing at NVIDIA. He joined the company in 2004, the same year I joined Merrill Lynch; I guess that's the only thing we have in common. He created CUDA, which remains the established leading platform for accelerated parallel computing. Before joining NVIDIA, he was the development lead on Brook, a forerunner of general-purpose computing on GPUs.
We are absolutely thrilled to have Ian with us. Before I get into the Q&A, I was asked to read a brief statement. As a reminder, this presentation contains forward-looking statements, and investors are advised to read NVIDIA's reports filed with the SEC for information related to risks and uncertainties facing the business. With that, a very warm welcome to you, Ian. This is, I think, our third keynote session together. Really appreciate you joining us.
Yeah, we're running on AI time. A year ago feels like a lifetime.
Lifetime.
One of the most challenging parts of my job often is to try to predict the future, but AI is always surprising us.
That's right. Bigger and better. Ian, let's just start with the big news that kind of rocked at least Wall Street early this year, which was the DeepSeek moment.
Right.
How much of that news was a surprise to you, right? Because you have followed the industry for a long time. What does it really mean for investors who are looking at that as some big seminal game-changing moment? What are the positive and negative implications of that DeepSeek moment from your perspective?
There have been a couple of inflection points in AI, for sure. You can go back to the original Google cat moment, when AI recognized cats. You can go through the ImageNet moment and the ResNet moment. In 2022, we had the ChatGPT moment, which I'm sure the investor community all noticed as well. In January, we had the DeepSeek moment. DeepSeek itself wasn't a surprise. The company, DeepSeek, and its parent, High-Flyer, have been around for a while, and if you look at the history of the papers they've been publishing, it is amazing work. They're actually some of the best CUDA developers out there in terms of optimizing all the way down to the hardware.
If you read the DeepSeek R1 paper and the V3 paper it was based on, the amount of optimization they've done for GPUs, for NVLink, for GPUDirect RDMA, for sending data from the GPU over PCIe to the NIC and over NVLink, to build a training and inferencing platform and technology, is truly amazing. The moment that really activated it, though, was reasoning. It was the first open, world-class reasoning model, and it was truly open. They explained how they built it, how they trained it to that level of intelligence, and the optimizations they made to the execution of the training and inference stack. There are some amazing graphs in that paper. It was basically a barn-door moment for reasoning models and AI.
Today, I think the world would agree, you can't really publish or celebrate a new model without it being a reasoning model. Reasoning wasn't new; OpenAI had been publishing papers about using reasoning, and o3, o4, and Gemini are all reasoning models. DeepSeek really made it ubiquitous, open, democratized. The implications and the impact were not understood when it launched. First off, by being open, anyone could run it anywhere. Today, DeepSeek R1 is priced at about a dollar per million tokens, where a traditional LLM like Llama 70B might be $0.60 per million tokens. It's a big model: 671 billion parameters, with about 37 billion active parameters, across 61 layers, with 256 routed experts plus a shared expert per MoE layer.
That is a level of complexity and technology you'd only expect from folks like Gemini or OpenAI, and here you had a truly world-class open model. Running at that level of complexity is really hard. What makes reasoning so useful is what happens with the output tokens: you let the model think, you teach the model to think, and really think out loud. If you've ever used DeepSeek R1, it's quite amusing to watch it think. It is literally talking out loud, asking itself questions. It has trained itself to come up with an answer by thinking out loud. It doesn't give you that answer right away; you can see it check the answer. It has taught itself to make sure it's right by double-checking its math.
It still does not give you the answer; it checks it a second time. That is very intentional. They trained the model to think for as long as it can until it comes up with an answer, check it once, check it twice, and then give you the answer. As a result, we are seeing an explosion in the number of tokens generated. You ask Llama a question, you get an answer back of about 100 words. That is it. You pay $0.60 per million tokens for those 100 words, call it 200-some-odd tokens. DeepSeek reasons for about 1,000 words, then it gives you that 100-word answer, and it is right. And all those tokens you are paying for, by the way, are valued at $1 per million.
In general, DeepSeek has made every model a reasoning model, and inference demand as a result has exploded. The opportunity for multi-GPU, multi-node inference is everywhere. It actually came at a great time for GB200, because of all those GPUs connected with NVLink and Blackwell. You're seeing that now. With the increase in value, even a free open model like DeepSeek R1 at $1 per million tokens generates about 13 times more tokens. That's something like 20x more total market opportunity for inferencing because of reasoning. Actually, they just announced a new rev of DeepSeek R1. On the AIME math benchmark, they were getting about 69 or 70% accuracy. That's kind of like a C minus: at 70%, you're getting two out of three questions right. It's not that great.
The new one, the updated R1, same model, better weights, same cost, is now about 89% accurate. They went to kind of a B plus, which is basically nine out of 10 questions right versus two out of three. The way they did that: they taught the model to think longer. They roughly doubled the number of tokens it generates, how much thinking out loud it does. As these models get smarter, that drives more output tokens, more thinking, and more opportunity for token revenue.
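To make the arithmetic above concrete, here is a minimal back-of-envelope sketch. The word counts, tokenization rate, and prices are the rough figures quoted in this conversation, not measured values, and they land in the same ballpark as the 13x token and 20x opportunity numbers:

```python
# Rough token economics of reasoning vs. one-shot LLMs, using the
# approximate figures quoted above (all numbers are illustrative).
words_llm = 100                       # one-shot answer length, in words
words_reasoning = 1000 + 100          # ~1,000 words of thinking + the answer
tokens_per_word = 1.3                 # assumed average tokenization rate

tokens_llm = words_llm * tokens_per_word
tokens_reasoning = words_reasoning * tokens_per_word

price_llm = 0.60                      # $ per million tokens (Llama-70B-class)
price_reasoning = 1.00                # $ per million tokens (DeepSeek-R1-class)

token_multiple = tokens_reasoning / tokens_llm
revenue_multiple = (tokens_reasoning * price_reasoning) / (tokens_llm * price_llm)
print(f"~{token_multiple:.0f}x tokens per query, ~{revenue_multiple:.0f}x revenue per query")
# ~11x tokens, ~18x revenue: the same ballpark as the 13x / 20x figures above.
```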
Do you think anything that DeepSeek is doing, or what's happening in China more broadly, serves as a proxy for, let's call it, CapEx-constrained computing? There is a lot more effort being made to make these things more efficient because they may not have access to compute. Do you think they are able to bend the cost curve in a way that has implications for how much spending needs to happen in this industry?
No. Actually, the opposite. They just took what everyone was doing and talked about it in an academic paper. Computing has always been constrained: access to compute, amount of compute, dollars of compute, capital expenditure on compute. The AI race is about, regardless of how much compute you have, how efficiently you're using it, how intelligently you're using it, and how much value you bring. Everybody wanted Hopper in the ChatGPT moment. That wasn't unique to DeepSeek; it was around the world. It's just, do you have the engineering talent to capitalize on it, to invent, to code your CUDA, know your InfiniBand, know your NVLink, optimize your transformer layer? One of the big innovations DeepSeek made was a technique called MLA, multi-head latent attention, which is a low-rank method for approximating the weights and compressing the KV cache of the transformer layer. It wasn't a new idea.
It had actually been deployed in image generation, all those fun "draw me a picture of a teddy bear swimming Olympic laps" prompts. They were using this MLA technique there, and it compressed the bejesus out of the transformer layer. It made it a lot cheaper by approximating, and they were able to apply it to DeepSeek V3 and R1. That was the first time it had been publicly talked about. Trust me, these methods are being deployed and optimized everywhere; it's just that not everyone shares. DeepSeek themselves are doing the world a favor by sharing some of the state-of-the-art research they're doing. But it's happening everywhere. It was happening back on Hopper. It was happening even back in the A100 days as well.
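For readers who want the gist of the compression idea, here is a minimal, simplified sketch of latent KV attention in the spirit of MLA, as described in the DeepSeek papers. The dimensions and module names are illustrative, and details like RoPE handling are omitted; this is not DeepSeek's actual implementation:

```python
# Toy latent-KV attention: cache a small shared latent per token instead of
# full per-head keys and values, and up-project at attention time.
import torch
import torch.nn as nn

class LatentKVAttention(nn.Module):
    def __init__(self, d_model=1024, n_heads=8, d_latent=128):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)  # what gets cached
        self.k_up = nn.Linear(d_latent, d_model)     # reconstruct keys
        self.v_up = nn.Linear(d_latent, d_model)     # reconstruct values
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, latent_cache=None):
        b, t, _ = x.shape
        latent = self.kv_down(x)                     # (b, t, d_latent)
        if latent_cache is not None:                 # append to cached latents
            latent = torch.cat([latent_cache, latent], dim=1)
        s = latent.shape[1]
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(latent).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        att = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        y = (att @ v).transpose(1, 2).reshape(b, t, self.n_heads * self.d_head)
        return self.out(y), latent                   # cache stays d_latent wide
```

With these toy dimensions, the cache holds 128 floats per token instead of 2 x 8 x 128 = 2,048 for standard multi-head KV, a 16x reduction; that shrinking of the KV cache is the "compression" being described above.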
Got it. You talk with a lot of cloud customers, and many of them are developing their own frontier models. Are you seeing any kind of saturation or diminishing returns in the benefits from increasing the size of these models? There was this public story about Meta's large language model, where they are supposedly not getting enough ROI on it. Do you see any saturation in the effectiveness of these models? Because what this community cares about, at the end of the day, is CapEx. Is there anything happening from a Western large language model perspective that gives you pause on how long and how big Western AI CapEx can be?
I would not get too hung up on the Behemoth question. Behemoth is an open model, and there is competition in the open space. It is hard to launch a model if it is not world-class; it reflects on your brand, and it gets compared to all the models that are out there. What I am seeing right now is the drive toward reasoning models first. They just add so much more value; they are able to think and solve a problem. That comes down to two things: how much knowledge they have, which is the size of the model, and how good they are at thinking, at using that knowledge to come up with an answer to a question. Traditional LLMs simply regurgitated what they knew. Traditional Llama 70B, 70 billion parameters, was trained on the corpus of the internet.
When you ask it a question, it is really just trying to reconsolidate the information it knows and answer your question, but it can't really think. What DeepSeek and the other models are doing right now is taking the corpus of the internet and using that information to think and answer a question. What I'm seeing is that the more they know, the quicker they can think, the more accurate the answers they come up with, or the cheaper those answers are. There is a flywheel of taking all the knowledge they have and baking it into the model repeatedly. The more questions get asked, the more data there is, and the more answers they can bake back into the model itself.
You and I don't need to work out that 50 plus 50 is 100; we just know it. A first grader needs to actually do the math, carry the 1, and make it happen. Once they've done that, it's part of their inherent knowledge. Think about ChatGPT. Think about Grok. Think about Meta AI. Every time someone asks a question, they are expanding the corpus of knowledge. They think about that answer, and now that answer gets baked into the model itself, and the models are constantly training and retraining and retraining. They are both inferring, making money or adding value for customers, and also getting smarter. Their intelligence, how much they know, is strictly a function of the size of their model.
That's why, when we were talking last year, a 100-billion-plus-parameter model was a rarity. Now 100 billion is sort of table stakes, going to 600 billion. Obviously, we have models out there in the trillions, but they're not open. That's because they're adding value; there's a benefit to a model being smarter, answering the question quicker, or answering even more valuable questions. The tricks that are happening are the tricks in executing the model. The MoE experts, which is a hard thing to do, actually picking throughout the whole model which parts of that knowledge to pull from and compute on versus skip, is where a lot of the innovation is happening. There's a bit of a race right now between model size and active parameters. Traditional LLMs are not MoE.
They just compute on every piece of knowledge they know. You and I both know that's not very efficient: processing all the knowledge you have relative to the answer you need. That's what experts are for in inference. They split the model up into little pieces, and throughout the whole thinking path, they try to prune and only pull in the right parts. DeepSeek made public what a lot of researchers were doing, which is having the experts in every layer of the stack. The models are getting bigger; it's a race between that and the active parameters needed to answer a question. You're only seeing a small glimpse in the public papers of what the true behind-the-scenes, world-class work has actually been able to do.
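As a concrete illustration of the experts idea, here is a minimal top-k mixture-of-experts layer. It is a toy sketch of the general technique, not DeepSeek's routing (which adds a shared expert, load balancing, and much more); all dimensions and names are illustrative:

```python
# Toy top-k MoE layer: only a few experts' weights are activated per token,
# which is why "active parameters" can be far smaller than total parameters.
import torch
import torch.nn as nn

class TinyMoELayer(nn.Module):
    def __init__(self, d_model=512, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x):                        # x: (tokens, d_model)
        scores = self.router(x)                  # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = torch.softmax(weights, dim=-1) # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):           # run only the selected experts
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

layer = TinyMoELayer()
print(layer(torch.randn(10, 512)).shape)         # torch.Size([10, 512])
```

With 8 experts and top_k=2, each token runs through only a quarter of the expert weights; that gap between total and active parameters is exactly the trade-off being described above.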
A year from now, how large will the models we're talking about be?
We're already using trillion-parameter models today; you just don't know it. The active parameter count is highly variable. Every technique and idea you can use to trim how much compute you use, as you said in your previous question, is being applied, researched, figured out. The other way of optimizing for compute is distillation. You take the trillion-parameter model, and if you fine-tune, if you limit the use case or the application to a vertical or a narrow workspace, you can reduce it down to a 70 or even 7 billion parameter model. There's lots of that. Quick, small models, like the ones for text suggestions: when you type on your phone, it's expanding the sentence for you.
That's a very small model, which can be finely tuned to you, personalized to you and what you may be doing at that moment. We see an explosion of vertical models. If you search for Llama on Hugging Face right now, you're going to find bazillions of distilled models. By the way, all those distilled models also need to be computed on, and they're constantly being regenerated. One of the big consumers of GPUs is distillation: taking a big model, running inference on it, and creating smaller models. They start from a really highly intelligent one and distill down. I think we're all getting to trillion-parameter models now.
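A minimal sketch of the distillation loop being described, under the usual teacher-student formulation (soft-label KL matching); the tiny models, dimensions, and temperature here are illustrative stand-ins:

```python
# Knowledge distillation in miniature: run inference on a large "teacher"
# and train a small "student" to match the teacher's output distribution.
import torch
import torch.nn as nn
import torch.nn.functional as F

def distill_step(teacher, student, optimizer, batch, T=2.0):
    with torch.no_grad():                      # teacher is inference-only
        t_logits = teacher(batch)
    s_logits = student(batch)
    # KL divergence between temperature-softened teacher and student outputs.
    loss = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                    F.softmax(t_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: a larger teacher distilled into a much smaller student.
teacher = nn.Sequential(nn.Linear(16, 256), nn.ReLU(), nn.Linear(256, 10))
student = nn.Linear(16, 10)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
print(distill_step(teacher, student, opt, torch.randn(32, 16)))
```

Note that the teacher's forward passes are pure inference, which is why distillation at scale is itself a big consumer of GPU cycles, as described above.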
There's talk of when we get to 10 trillion parameters, how many active parameters that implies, and what that model actually looks like in terms of the optimization stack. It's pretty funky.
The next topic, Ian, I'd love to get your perspective on is NVIDIA's competitiveness as the world moves to more inference. In training, I think there is recognition that NVIDIA has done an outstanding job. As we go to inference, there is fragmentation of workloads, optimization, et cetera. One of your GPU competitors has added a lot more high-bandwidth memory, and they are saying that is better for inference. There is a whole bunch of startups promising lower cost per token, et cetera. How do you view NVIDIA's competitiveness when it comes to the inference market? And maybe compare it against a lot of the ASIC players that are out there.
It's a good question. NVIDIA thrives at things that are hard. We just do. We're an engineering and technology company, and I've got a boss who's passionate about solving the hard problems and letting other people make money and innovate on top of what we provide as a platform. I want to update my bio: I'm just a platform guy, constantly building technology platforms to help other people make money. Inference is really hard. It is wickedly hard. Training is hard for different reasons: running 100,000 GPUs, or going to million-GPU distributed training clusters, and keeping that thing going at scale is a data-center-scale, reliability, networking, one-giant-GPU problem. Inference is a myriad of optimizations.
You start with numerical precision: 32-bit floating point, 16-bit floating point, 8-bit floating point, 4-bit floating point. If I can use the opportunity: Blackwell has 20 petaFLOPS of FP4 per GPU. That's a lot. The fastest supercomputer in the world is measured in exaflops, and an exaflop is only 1,000 petaFLOPS. We got that in FP4. Making four bits work and come up with the right answer, when you only have four 0s and 1s, 16 possible values, is not a lot of numbers. Getting a mathematically, numerically accurate answer using only that requires expertise in numerical and quantization primitives that are extremely complicated.
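To show the basic mechanic, here is a minimal sketch of low-bit block quantization. Real FP4 inference (for example, the microscaling formats Blackwell supports) uses a floating-point grid, hardware scale factors, and careful calibration; this integer toy only illustrates why a per-block scale makes four bits usable at all:

```python
# Minimal symmetric block quantization: store low-bit codes plus one
# scale per block, and dequantize on the fly (illustrative only).
import numpy as np

def quantize_block(x, bits=4):
    qmax = 2 ** (bits - 1) - 1                        # 7 for signed 4-bit
    scale = max(float(np.abs(x).max()) / qmax, 1e-8)  # per-block scale
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize_block(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
x = rng.standard_normal(32).astype(np.float32)        # one 32-value block
q, s = quantize_block(x)
err = float(np.abs(x - dequantize_block(q, s)).mean())
print(f"mean abs error at 4 bits: {err:.4f}")
```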
Go up from there: now you have to distribute the model. The model does not fit on a single GPU, a single piece of silicon, I don't care who you are. In order to get performance, you have to have multiple chips running in parallel within the node. If you're going to run the high-value models, you're actually going to have to run multi-node and connect them all together. You've seen, and we've shared, how complex a GB200 NVL72 system is. On top of that, you have workload diversity. An AI factory is not going to run just one model all day long. It's easy to benchmark one model, easy to optimize for one model, and certainly easy to build for it; if you want to run just one thing, you can tune your architecture for that. AI factories are going to run every kind of model, and the models are going to change. You're buying a billion-dollar AI factory.
You're going to need to capitalize that expenditure over five years. You'd damn well better make sure that whatever you buy now, you're going to be able to run, capitalize, and create value with for five years. For perspective, go back five years: we were launching the first A100, and I think I was still talking about ResNet. That's why it's a really important and strategic investment for companies to make sure they're building an AI factory that can do all of those optimizations, all those techniques, and run all those models today, next year, and the year after that, all the way out to 2030. That's why the platform is so critical, and why NVIDIA works with every single AI company to make sure our platform is constantly innovating.
We invent some of that technology, but the vast majority of it actually comes from all of those companies, like OpenAI, like Meta, like the Grok model at xAI, as well as the entire academic community. Amazing innovations come from there. Also DeepSeek. FlashAttention came from a student who is now a professor at Princeton; right there, he doubled transformer performance because he figured out a way to run it more efficiently, more accurately, and at less cost. The inference market is about running every model across all those AI factories, now and in the future. It's a fascinating business model, where data centers are bought with billions of dollars of five-year CapEx, and you end up charging dollars per hour or dollars per million tokens at the end of it.
If, let's say, you were the head of AWS, how would you go about making the decision between ASICs or GPUs for your AI factory?
You should ask Matt that question. He's a good guy. I worked with him.
They talk a lot about Trainium, so.
I know. They should, right? I mean, building silicon is hard. You're talking to somebody who's been involved with it for 20 years; it's hard and getting even more complicated. It's no small feat to achieve even what they've achieved, and I'm impressed by what they've been able to do. Anyone who has survived it, done multiple generations, and stuck with it; that requires almost founder-level CEO commitment to make it happen. Every hyperscaler is building their own silicon. They're both our customers and also looking at alternatives, as they rightly should: their own silicon and other opportunities out there. Each of them has to find what they need to optimize for, what they need to go serve, and what they're going to do for their business.
I can't speak for Matt's business or exactly where he's going to apply all of that. Likewise with TPU: they all have an internal workload and an external opportunity. They're all very passionate about making sure they provide, in a timely way, the latest NVIDIA GPUs and the customers and workloads we bring to their clouds. Our business with AWS, and with everyone, is extremely healthy and continues to grow. AWS actually had one of the first launches of the B200 HGX. We talk a lot about NVL72, but the B200 HGX platform, which is just eight GPUs connected with NVLink, the same architecture ChatGPT ran on with Hopper, we also do with Blackwell. It's a fantastic inference platform. It runs all the same Hopper workloads, all on x86. It carried over and immediately provided a 3x boost for inferencing.
Everyone who is on Hopper, using H100 or H200 HGX, as soon as they get on B300, immediately gets a 3x boost. You see that in the Artificial Analysis benchmarks and everything else in terms of performance. AWS is an excellent partner. How they go and apply that, and where they see their opportunity: everyone has to define the niche or the area where they're going to add value and how they're going to engage the community. It's one thing to win on a benchmark or do a certain workload; it's a whole other game to activate an ecosystem and developers and get your platform into the market. Not all of them need to do that, and certainly some have chosen to work on certain opportunities. The undeniable part of it is that we're constantly making things faster. We are lowering costs.
We are making things more profitable, as per the DeepSeek and B200 example, and we are doing that annually. Each of them has to choose where they're going to provide value or differentiate.
If I ask the question in a different way: today, if I look at $100 spent on AI, $10 to $15 of that is going into ASICs. If we go out the next three, four, five years, what makes that $10-$15 go to $20-$25? What do you think would have to change in the industry to make it tilt more toward ASICs and away from merchant silicon?
I think that looks at the problem the wrong way, because your performance is actually your profitability, your revenue, your gross margin. You can look at the cost reduction of a component, but generally, when we look at it, we look at it in these terms: there's a billion-dollar AI factory. How many tokens is it going to output compared to the previous generation, and how much more value will those tokens have? Not just at the same dollars per token on the same model; if you can deliver 3x more tokens per second, you would pay more for that. With a reasoning model, you get your answer faster, or you can reason more within a certain amount of time. You actually pay a premium for that.
Asking what 50 plus 50 is and being told to go away for an hour and come back, versus getting it right away: the instant answer is more valuable. And the dollars spent on chips in a data center are actually a pretty small part of it. If you look at the silicon cost, or even the price being paid for the chips, versus everything that goes around the chips, that surrounding system is an increasingly important part of the value. Because of the value of reasoning and these large models, inference, and certainly training, is not a single-GPU-chip business anymore. It's about connecting all those chips together with high-speed signaling and, as a result, liquid cooling, to fit them all in one small space so they can all talk to each other at those speeds.
The farther you spread them apart, the farther the signals have to travel. That is why liquid cooling brings it all together. The complexity and the value that brings is driving up the content per system. It's not because we want to spend that much more money or run that fast; it's because the value of bringing it all together drives up the revenue side. We will always look at the previous generation, at what the opportunity is, and at what others are able to achieve on the basket of workloads that we know is valuable now and that we do our best to predict will be valuable in a year or two years' time.
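As a rough illustration of this revenue-versus-CapEx framing, here is a back-of-envelope sketch. Every number below is an assumption for illustration, not an NVIDIA or customer figure:

```python
# Back-of-envelope AI factory economics (all inputs are illustrative).
capex_usd = 1.0e9                 # assumed $1B factory, capitalized over 5 years
years = 5
tokens_per_sec = 50e6             # assumed aggregate factory throughput
price_per_m_tokens = 1.00         # assumed $ per million tokens served

seconds = years * 365 * 24 * 3600
million_tokens = tokens_per_sec * seconds / 1e6
revenue = million_tokens * price_per_m_tokens
print(f"~${revenue / 1e9:.1f}B token revenue over {years} years "
      f"vs ${capex_usd / 1e9:.0f}B CapEx")

# A generation that delivers 3x tokens/sec at similar CapEx scales the
# revenue line 3x, which is why tokens/sec per dollar dominates the decision.
```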
The good news is NVIDIA is coming out with new GPUs every year now, new architectures every year, and also optimizing the data center design every year. That makes my job a little easier. I used to have to predict on a three-year horizon; now I can think about now and the near future, and if we see another opportunity or get something a little bit wrong, we can just keep fixing it. In terms of how ASICs or alternatives play, I think it comes down to what niche, vertical, workload, or use case they want to optimize for. NVIDIA's goal is not to run every AI model everywhere.
Certainly, the silicon inside a Ring doorbell should do what a Ring doorbell needs, and the same goes for a hockey puck on your kitchen counter or what's inside your phone. Where I focus is the AI factory for inference and the training clusters at scale, and increasingly those two things are melding together. We also provide it as a platform with all my cloud providers, so that all the startups, all the innovators, the next OpenAIs, and every enterprise can get access to the technology and capitalize on the opportunity: the revenue the tokens bring to them, with the token-serving companies making money on top.
It is really important to look at the overall end-to-end value that inference brings: revenue versus the cost of compute. The cost is going up in percentages, while the benefit and revenue are going up in X factors. We are seeing that. Only by delivering that kind of percentage-cost-for-X-factor-benefit trade do you get the growth trajectory that NVIDIA has provided and will hopefully continue to provide. When we look at our value props, our pricing, our models, we are always looking at the net through the chain: is everybody adding value? Is everybody able to capitalize on it and continue to scale up and grow? If you look at it over time, it is percentages in, X factors out, to big X factors.
At GTC, you often see the big X factors in there. That whole model actually gets played out in that world.
Maybe one or two last things. The new sovereign AI opportunity: how incremental is it? Is it just a lot of the Western companies deciding to spend overseas, or is it truly incremental? The original build-out of the internet was pretty concentrated. Now, as we start to see all these new AI factories open up, is this truly incremental demand?
Yeah, it definitely is. When you go and talk to governments or nations, a lot of it actually comes out of supercomputing; my other job is HPC, and I've been doing supercomputing since, well, it's where this whole thing started from. Those same people are now at the center of attention in every country, because computing is important for their nations. We just did, I believe, a 10,000-Blackwell-GPU AI factory in Taiwan. It's for Taiwanese industry, and it's owned by Taiwan. It's there to help apply AI to manufacturing, whether it be silicon or automotive, or to city and civil uses, as a resource for the country. We're seeing it in Japan, a country rich with data, with unique industries, a unique population and demographics, and facing significant change in how to grow. They're building their own AI.
They're using that data and building their own. They see AI as a national need, a computing need, in order to apply their data, apply AI, apply computing to their industries. By the government stepping in, by the nation stepping in, they can consolidate that as a national resource rather than waiting for every single company or industry to build its own; they can pool those resources. They're a good partner with NVIDIA. We're seeing the same happening in Germany, and it has happened already in the U.K. And they know how to build them, because most of those countries already understand why supercomputing is important; now it's been elevated with AI. The HPC and supercomputing side of the business has exploded as a result, and they know how to execute.
It is a really exciting opportunity. Every nation sees the opportunity to be a player on the stage and apply that. It starts with keeping their data, keeping their computing local, and also prioritizing it.
How large do you think it can be over time?
It's a good question. Today, we are seeing about 100 AI factories being built and assembled right now across the world.
AI factories of what size? Like a billion-ish? How much is an AI factory?
Stewart and the other teams can speak to it, but we track it as a data center build with either Blackwell or Hopper that is specifically designed for serving tokens for industry. That is a number that is just going to continue to grow over time. Actually, next week is GTC Paris and also ISC, the International Supercomputing Conference, two events at the same time. You will hear a lot about AI factories and sovereign AI and those activities.
The European Commission actually announced big projects earlier this year.
Europe gets it. They absolutely get the fact that they can, and have the capability to, deploy. The U.S. as well. Last week we launched at NERSC, over in Berkeley across the bay: 9,000 Vera Rubin GPUs. Actually, our first supercomputer announcement with our next-generation Rubin architecture was made with the Secretary of Energy, and Jensen participated in the announcement. That will be deployed next year: 9,000 Vera Rubins. The mission at NERSC is open science, and also serving industry. The supercomputer is actually named Doudna, after Dr. Doudna, who invented, I guess discovered, CRISPR. She was there; a wonderful woman, brilliantly intelligent, and an example of why computing is important for healthcare and pharma discovery.
One of the purposes of the supercomputer is to figure out how to apply traditional simulation and AI together to advance scientific discovery and the needs of the nation.
Got it. Maybe one last question. What do you think will create a constraint on this growth? Is it access to power? Is it customers may not be able to adopt this kind of annual cadence of products? Is it just that CapEx demands are going up? What do you worry about the most as you look over the horizon?
There is a diversification happening. Of course, the business is expanding, and the number of players in the data center world is expanding. Certainly, power: how many megawatts do you have, how many gigawatts do you have? We track that very closely with all of our CSP partners, but also, increasingly, with all of the NVIDIA cloud partners and GPU data center partners. You've obviously heard of CoreWeave, but there's Lambda, there's Nebius; there are many, many players now. The template of how to secure a data center, secure GPUs for that data center, and align with customers is becoming well understood. On top of that, the software and infrastructure necessary to operate and run, not even just a cloud, but a GPU factory, an AI factory, a token factory, is starting to become fine-tuned, executable, and operationalized.
There are multiple things coming together to help accelerate the growth. Certainly, the hyperscalers are not going to do it all alone, and they're investing a ton; you can see how many megawatts and how many data centers. Microsoft just talked about the fact that they are deploying more new capacity this year alone than all the capacity they had three years ago. There is an up-and-to-the-right curve. They shared their next generation: hundreds of thousands of Blackwell GPUs at one site they're building. They talked about it at their Build keynote; look at Scott Guthrie's keynote, it was great to see them talk about it. But there's also a diversification happening in terms of where everybody can get their compute.
Certainly, as more enterprises need it, as more startups need it, they're going to the public clouds for sure, but they're also looking at all the regional clouds and what they can do with that data center capacity. The growth required is being tracked in gigawatts of compute being put online, not just by CSPs but by the world, by all the players. The deployment software and stacks are getting standardized, commoditized, understood, which speeds up how fast they can deploy and, as a result, diversify. You hear about the big, big ones, obviously, but that is only a portion of the business. There's a very long tail, a sizable part of the business, that is distributed around the world, which is exciting because it's more people able to contribute, deliver their compute, and make it available.
I think the only other limiter right now is the speed at which people are coming up with new high-value models and bringing them to the enterprise. The enterprise, and that's all the Fortune 500s, and their ability to take an AI model and have it add value to their business, whether that's straight-up lifting ChatGPT and putting it into a help desk or the top of a search bar, or applying it to ad revenue, to a better-connected feed, to inserting the right ad or the right product placement, and making it profitable for them. That is certainly happening. The limiter there is just how many models and how many different techniques can be deployed in all those different use cases. It's also really hard to track.
I feel bad for you guys trying to figure that out. But if you look at the activity around AI for the enterprise, that is the demand generation we're seeing across all the consumption of our GPUs.
Got it. I know we are out of time. I did want to ask just one last question. What is NVIDIA's ability to monetize software? And where are you in that journey?
Sure. I'm going to pass on the public statements on software monetization because I don't have those numbers off the top of my head, and I don't want to misstate anything; I'd point you to what we've said in the past. NVIDIA is an open company. My job is to make sure our computing platform is available everywhere and to provide that compute, whether it be in the cloud directly, going all the way down to CUDA, all the way up to running PyTorch or running a model off Hugging Face. For the enterprises, there are companies that want to work directly with NVIDIA. We have the opportunity to monetize by working directly with them on specific models and making them available. It's not to supplant the community, but to provide direct engagement.
That comes in the form of providing a supported Nemotron model, which is a model that NVIDIA generates; it's trained, actually, by my team, to provide that extra value directly to them. The other opportunity is in the data center software itself. A lot of our partners are looking for help with the infrastructure. We've talked about Lepton before, software to support the clouds. It's one thing to stand up a data center full of GPUs; it's another thing to operate it as a data center and be able to serve and host and schedule and execute. That's another case where we can provide value. In general, with our software stack, all of our libraries, all of CUDA-X, and all of the inferencing software like Dynamo and everything else, customers want to be able to engage directly with NVIDIA.
We also offer that as enterprise support, so they can have a direct relationship with NVIDIA. As our software footprint expands and customers want to engage directly with us, we can monetize directly or provide a service they want to pay for. They want that engagement. Of course, as that value extends to the broader enterprise, you'll continue to see that number increase.
I can go on for another hour, but we are out of time.
Sure.
Thank you so much, Ian. Really appreciate your insights. Thanks, everyone, for joining us.