Good morning, everyone. As we quiet the back room, I have a very important job. As a reminder, the content of this presentation may contain forward-looking statements, and investors are advised to read our reports filed with the SEC for information related to risks and uncertainties facing our business. With that, I will turn it over to Jensen and Colette.
All right. Good morning, everybody. I hope you enjoyed the presentation yesterday. It went a little bit longer, but I think it was an absolutely great summary for us. We're gonna take this time to focus on your needs and some of the additional questions you have. We're gonna start with a couple of slides, and then we'll open it up for questions. With that, I'm gonna turn it over to Jensen.
As I was saying yesterday, there were three inflection points in recent AI. The first one was generative AI, the second was reasoning, and we're at the third inflection point now, and each one builds on the others. There's a lot of technical reasons why each one of them built on the others, but here we are with the third inflection point, which is agentic systems. Agentic systems are able to operate autonomously; that's why they call them agentic, because they have agency. You can give them goals, and instead of just answering questions, they can now perform tasks. Tasks could be anything. Of course, one of the most popular applications of agentic systems is writing software.
You know, engineers in your company, I'm sure, and engineers in my company for sure are using agentic systems all day long. It used to be that when an engineer came to work, they gave you a laptop. Now when you come to work, they give you a laptop and tokens. Token budget is now a real thing. Every engineer is gonna have a token budget. You know, the idea that you would hire a $300,000 engineer and they spend no tokens in doing their job, you gotta ask the question, what are they doing?
It is very, very clear now that every engineer will have a lot of tokens that they will have to consume, and those tokens have to be produced. Now, I just said something a second ago. If you just connect the dots: we used to be, when an engineer, a software programmer, comes to work, we'd give them a laptop. That's a tool. Today, we give them a laptop and tokens. Those tokens have to be manufactured. A computer used to be just a tool. A computer of the future is manufacturing equipment. These computers, as you see, are no different than ASML's manufacturing equipment. They're producing something that is sold. It's no different than a dynamo a long time ago that produced electricity.
These are manufacturing systems, and the energy efficiency of it, the production efficiency of it, matters enormously because it drives your revenues, okay? The third inflection point is here. As you know, OpenClaw dropped. Many of these things, when they first drop, these open source projects, they seem like toys. You take a step back and just analyze what OpenClaw is on first principles, and I explained that yesterday. OpenClaw, on first principles, is really a computer, the operating system of an AI computer, a personal AI computer. It has all of the properties of a computing system. It has all the properties of an operating system of this new computer. You know, it manages resources, it schedules, it does IO, and, you know, it networks.
It has all of the properties of a fundamental computer, okay? You could see the red line. That's actually not the y-axis; the red line is its growth. That's just the extraordinary thing. Every company in the world will now need to have it. What is your OpenClaw strategy? Every single software company, every single company needs to have an OpenClaw strategy. Just as we all had our Linux strategy, just as we all had to have an internet strategy, just as we all had to have a mobile cloud strategy. Now, the question is, what's your OpenClaw strategy? Okay? This is a very big deal. I wanted to answer the questions about what I said here a little bit more.
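To make the operating-system analogy concrete, here is a minimal sketch of an agentic loop in Python. This is illustrative only, not OpenClaw's actual code; the model interface, tool set, and message format are all assumptions.

```python
# A minimal sketch of an agentic system on first principles: goal in, tool
# calls out. Illustrative only -- not OpenClaw's actual code. The model
# interface, tool names, and message format here are all hypothetical.

def run_agent(goal, llm_complete, tools, max_steps=20):
    """Pursue a goal autonomously: plan, call a tool, observe, repeat."""
    history = [{"role": "user", "content": goal}]
    for _ in range(max_steps):                 # scheduling: a bounded loop
        action = llm_complete(history)         # hypothetical model call
        if action["type"] == "finish":         # goal reached; return result
            return action["result"]
        result = tools[action["tool"]](*action["args"])  # IO: dispatch a tool
        history.append({"role": "tool", "content": str(result)})
    return "step budget exhausted"

# Example resources the "operating system" would manage: files, shell, network.
tools = {
    "read_file": lambda path: open(path).read(),
    "write_file": lambda path, text: open(path, "w").write(text),
}
```

The point of the sketch is that the pieces he lists, resource management, scheduling, and IO, are exactly what such a loop does.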
First of all, a year ago, I said that we had strong visibility of our Blackwell and Rubin shipments of $500 billion through 2026. I was standing in 2025, right? GTC 2025 was around, what was it? Was it March? April?
October.
It was October?
October.
Okay. October. I was standing there. You sure? It was October?
I'm sure.
GTC DC or GTC. I said it twice, though. The first time I said it was GTC here, right?
I think you've been saying it twice.
Yeah.
I don't think all the way back.
I see. Yeah. Okay. Anyways. Anyhow, in 2025, one of those months, I said that we have strong visibility of Blackwell plus Rubin demand, purchase orders and demand, okay? Very firm demand of $500 billion. There were a lot of questions from many of you: so where are we now? You wanted an update on where we are now, and I thought I'd give you guys an update. Where are we standing right now, and what month are we in? Just for the record. March. Here we are in March. The end of 2027, as you know, is many more months away. I just wanna first let you guys know that.
However, because we're building infrastructure and factories, and the lead times for everyone are long, they wanna make sure they give us purchase orders and firm demand as early as they can to secure their supply. Okay? We have strong confidence and visibility of $1 trillion+. You know, it's not a floating point number, you guys. Okay? It is also not 94 digits of accuracy, okay? We're not counting cents. You can keep your cents. However, we have strong visibility of $1 trillion+ of Blackwell plus Rubin. The reason why it's only Blackwell plus Rubin and not all of the other things that we sell is because I'm referencing it against last year, when I was only talking about Blackwell and Rubin. Does that make sense?
Last year we didn't have Groq. Last year, we weren't selling standalone CPUs. Last year, we didn't have many of the things that we have to sell now, and it wouldn't have made sense for me to include those today, because we didn't have those things in last year's number. Does that make sense? Somebody nod, then I can continue. Okay? A couple of things. It's only Blackwell and Rubin. It's not Feynman. It's not, you know, Rubin Ultra. It's not Vera standalone. It's not Groq. Blackwell plus Rubin, we have high confidence, strong visibility, demand, forecast, purchase orders of $1 trillion+. We oftentimes close business that we ship in the same quarter, and we expect to close and ship more business between now and the end of 2027.
We expect to close, book, and ship more business on top of this between now and 2027. The reason for that is because we expect to be coming to work between now and the end of 2027. Now, unlike other businesses, because we build complete systems of this quality, we can actually win, book, and ship new business in the same quarter. Of course, you can't do that if you have to build an ASIC, obviously. If you don't see it now, you're not shipping it by the end of 2027. That's not true for us. We build inventory. We have a pipeline of supply.
We have to take care of customers who come out of the blue because they're desperate for more compute. Does that make sense? When they're desperate for more compute, and all of a sudden, at the last minute, they say, "You know, goodness gracious, I could use more," I would like to be able to say, and we are always in a position to say, "We'd be more than happy to help you." We're also working on new customers, new markets, new regions that we haven't put in here yet because we still have, well, about 21 months to go. Okay? I want you guys to understand what that $1 trillion is. It's, by definition, going to keep growing. By definition, because of what I compared it against, it will keep growing, and it'll end up larger than that.
A couple of things that I wanted to say, also. Last year was a really good year because 2025 was our year of inference, and I think we helped everybody understand that the price of the computer and the cost of the token are only marginally related. Remember, people are buying these computers to produce tokens. The effectiveness of the production of those tokens matters greatly. They're not reselling the computer. If you bought an expensive computer and just resold it, then yes, it's expensive. But you bought a computer, and it's expensive because the technology's incredible, but it produces tokens at such incredible rates.
You would have simultaneously purchased the most expensive computer and produced the lowest cost tokens. Does that make sense? This is what we do every day. This is our job. It is the reason why we deliver the value that we deliver. The value discrepancy that we deliver here, between the two numbers that I just described, is how we're able to secure our gross margins. We have to deliver, and we consistently deliver, so much more value, which is tokens per second, which is tokens per second per watt. We deliver so much more value every single generation that customers would prefer to buy our next generation product at a higher price than our current generation product at a lower price. They prefer instantaneously to convert. The moment that Vera Rubin comes, it is smarter to install Vera Rubins than to continue to buy Grace Blackwells.
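To make that concrete, here is a back-of-envelope sketch with invented numbers; it ignores power and other operating costs. A system that costs more but produces tokens much faster still yields cheaper tokens.

```python
# Back-of-envelope token economics. All numbers are invented for
# illustration (and power/opex are ignored): a pricier system that
# produces tokens much faster still yields cheaper tokens.
def cost_per_million_tokens(system_price, lifetime_years, tokens_per_sec):
    lifetime_seconds = lifetime_years * 365 * 24 * 3600
    return system_price / (tokens_per_sec * lifetime_seconds) * 1e6

current_gen = cost_per_million_tokens(3_000_000, 4, 100_000)  # hypothetical rack
next_gen = cost_per_million_tokens(4_500_000, 4, 500_000)     # 1.5x price, 5x speed
print(f"${current_gen:.3f} vs ${next_gen:.3f} per million tokens")
# ~$0.238 vs ~$0.071: the most expensive computer makes the lowest cost tokens.
```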
Are you guys following me? Somebody nod. Okay. The value is better even though the price is higher. I'm comparing these two systems because these are the two de facto systems in the world. Until you can beat these two systems, there's no point buying something else. These two systems are incredibly hard to beat because Moore's Law doesn't give you 35x. Moore's Law alone won't do it. Building a faster chip won't do it. You're gonna have to build lots of faster chips working together. 2025 was the year of inference, and I think we demonstrated our inference leadership. Training to post-training to now inference. Then some of the other things that we did last year that were really great: we expanded the reach.
We expanded the number of AIs that now support our platform. Last year, 2025, we added Anthropic to our platform, which is net new. We added Meta AI, which is net new. We're still working with Meta on all of the other stuff, but Meta AI is a net new entity, and they have net new computing requirements. We can all acknowledge that last year, open source software, open source models really took off, to the point where API inference service providers now see that open models, in aggregate, represent approximately the second most popular category of AI models. Meaning the first one, of course, is OpenAI, measured in total number of tokens generated. In aggregate, open models represent number two. As you know, NVIDIA is the best platform for open models in the world. We are the standard for open models everywhere. Number one, OpenAI.
Number two, all the open models. Number three, Anthropic. Number four, xAI. Just take your list, keep working down. I think NVIDIA's coverage of models last year increased substantially, which explains our accelerating growth at a very large scale. We are already a very large company, as you know, and we're now accelerating. Our rate of growth is actually accelerating. Anyways, that's how I think about it. Oh, one last point. We love our hyperscaler partners, and we work very, very closely with them, but it's important to understand that our relationship with hyperscalers is not just us selling to them. We attract customers for them. Having CUDA in their cloud brings all of the CUDA developers, all the AI natives, all the large companies that we work with.
Whenever we accelerate those large or small companies, we bring them, we train them, we have them hosted in the world's CSPs. We are one of the best sales forces of the world's CSPs. It is the reason why if you go down to the show floor, they have all of the largest booths. AWS has the largest booth here. Google Cloud has the largest booth here. Azure has the largest booth here. Oracle, giant booth here. CoreWeave, big booth here. Does that make sense? Because we bring customers to them. Why are they here? To sell to my developers. All of our developers only know how to program one thing. They only know how to program CUDA, and they only use CUDA-X libraries.
When we win and when we help those developers integrate NVIDIA, they land on one of our CSP partners. We are one of the CSPs' best sales forces. All right. However, we're also seeing tremendous customer diversity outside of the CSPs. Regional clouds, industrial, enterprise on-prem. Dell and Lenovo and HP are all growing so fast, and all the ODMs are growing so fast. A lot of that business goes towards the right-hand side of that chart, the 40%. Most people see our business in the left 60%. For the right 40%, without NVIDIA's full stack, without the fact that we can build you the entire AI factory and the fact that all of the world's open platforms run on top of NVIDIA, you have no hope of addressing it.
The net of this chart is this: a big part of that 60% is NVIDIA developers landing in the cloud, and 100% of the 40% is impossible without full stack, without end-to-end. Was I successful in communicating that? It's important to understand our business. We aggregate that whole thing into what is called accelerated computing, and it's probably a disservice to you. In the future, we're gonna separate it out a little differently, and it's probably gonna look like this chart. You'll see something like hyperscalers or something like that in 60% of it. Even when you see that, remember, a lot of those customers, we brought to the clouds.
On the right-hand side, that 40% is completely impossible if you just build a chip, because they don't buy chips. They buy platforms. Three messages all in one slide, which probably made your brain blow up, and therefore, I did it again. Was that helpful? You know what I should have done? I should have made three panels or three slides. It would've been a seven-hour keynote. But it would've been worth it. Okay. That's it. Thank you. Questions?
We're opening it up for questions now.
Hi, it's Ben Reitzes, Melius Research. Thanks for having us here at this event. It's amazing access that you guys provide. Congrats to you and the team for that. This is great. Jensen, last night, when we took a picture. By the way, you all can still like that picture. I need to beat last year's record.
What, what picture?
Well, we took a quick picture, and I posted it. I'm trying to beat last year's likes.
Oh, okay. All right.
Yeah. Anyway.
Was I in some vulnerable position or anything?
Let's put it this way. The camera added 10 lbs to me, but not to you. I don't know how that works.
Thank you.
You look great. So, I promised I'd ask you an inference question, and this is related. This is great. Like, I think a lot of people here get this. I think the main pushback we get is whether the juice is worth the squeeze, and will the hyperscalers have upside to their revenues for API and cloud that justifies all the spend, and what is Jensen seeing? Because, you know, I have estimates for the hyperscalers, and I've said there's upside to the revenues. For now, the CapEx is 20% above their cloud/API revenue, and I'm wondering what you're seeing. You've said in the past that there's this massive upside to these cash flows from your customers, particularly hyperscalers and those that are serving Anthropic and OpenAI. When do we adjust those higher?
I know this is a tough question for you because you've got to guide for five other companies. If we see that upside, I think your stock will behave a lot better, because then we'll realize this build can keep going. When is this inflection? I mean, we're seeing the inflection, but when is it, you know? What is the upside to their revenues, and how do we feel better about it?
I wish those companies were public, and the reason for that is because then you'll see what I see. No company in history has ever grown the way these startup, non-public companies are growing, increasing revenues by $1 billion or $2 billion a week. That's what they're experiencing right now. Now, remember, I just said a week. The entire IT software industry is, call it, $2 trillion. That $2 trillion industry, I don't believe, is gonna be disrupted. I think it's going to be transformed.
I believe that $2 trillion IT industry is going to integrate a combination of OpenAI, Anthropic, and open models and connect it with an open source software called OpenClaw, which we turned into an enterprise-ready version called NemoClaw, and you have instantly an agent. 1.5 million people downloaded OpenClaw and built themselves an agent. It's one line of code. Then you tell the agent to finish building itself. Oh, you don't know this thing? Go learn it. It goes off and learns it, you know. In the future, those agents will be integrated into the IT industry. This IT industry is $2 trillion of software licenses today. It's probably going to be, let me just pick a random number, $8 trillion that also resells an enormous amount of tokens. 100% of the world's IT industry will become resellers of OpenAI and Anthropic. Are you guys following me? No?
Take your estimates up for OpenAI and Anthropic.
I believe that Anthropic and OpenAI, and of course all of the IT companies, will also modify and customize their own software, their own models, with open models. That's what Nemotron's for, and that's what Nemo's for, and we've created all the tools, and that's why we're working with all of them. They're all going to create agents that integrate these three components. I believe they're gonna grow incredibly. The time is gonna come, and it's gonna come soon. The reason for that is you can see it in Anthropic's numbers, you can see it in OpenAI's numbers. They're growing an entire IT company in a month. The revenues of these AI companies, their AI, will be used by enterprises directly, but it's also going to be resold through IT companies, integrated into IT companies. Does that make sense?
Yep.
Because just think of it: that AI is just software. Their software is gonna be offered directly to enterprises, but it's also going to be integrated and become domain-specific and specialized, governed, secured, easily provisioned, connected to their systems of record, so on and so forth. That agentic system will be rented to customers, but they still have to consume tokens through factories. If it comes down through OpenAI, that's terrific. If it comes down through Anthropic, that's terrific. If it comes down through open models, that's terrific. They all have to have tokens generated. The net-net is: IT companies of the past licensed software. IT companies of the future will rent tokens, will generate tokens. Are you guys following me? Their business models will change. The companies will become bigger. Their gross margins will change.
Their gross margin profile will change because they have COGS in their business model now. They offer much, much more value. This is exciting for them. Super exciting for them.
Okay, great. Passing this $8 trillion microphone.
Thank you. Good morning. CJ Muse, Cantor Fitzgerald. Thank you for hosting this event. Really appreciate it.
Yes, CJ.
Wanted to, I guess, maybe follow up on Ben's question and think about the evolution of this 60/40 chart. You know, you talked about NemoClaw, and then you announced yesterday the Vera Rubin DGX AI factory reference design, essentially providing the blueprint for your non-hyperscale customers to compete with the hyperscalers. I'm curious, you know, as you put it all together, and you see, you know, a massive spike in token generation, how you're expecting, you know, pretty much this chart to evolve over time, and how we should be thinking about the different players inside there, as to their relative kinda growth vectors.
I think that this chart grows on both sides, at approximately similar rates, until the physical AI inflection happens in a few years. Let's say the physical AI inflection happens; then the industrial side has to be done on-prem, it has to be done at the edge, it has to be done in location, it has to be done in the factory. All of a sudden, that 40% is likely to grow, and I think ultimately that 40% becomes larger. The reason for that is because the world's industries that are related to physical AI are much, much larger than the industries related to digital AI. You know, something like $50, $60, $70 trillion of the world's industries requires physical AI.
Because the world is happening not in our laptops; the world happens out where the world is. There's a lot of, you know, atom-related businesses that simply can't be taken care of without physical AI. I believe, and I hope, that 40% actually becomes 70%. Both of them are gonna be incredibly large because the world is gonna produce tokens every single day, continuously. It will not stop. You know, right now as we speak, all of our laptops, well, hopefully most of your laptops, are kinda sitting idle. In the future, the computer's gonna be running 24/7 creating tokens because your agents are off doing work.
Somebody, you know, I was reading one of the Reddit posts. Somebody's Claw consumed 50 million tokens in a day. Now, that sounds like a lot, but that's only $50. If you had an agent doing productive work, $50, that's not bad. You know, you could have somebody who makes a few thousand dollars a day have a whole bunch of agents spending $50 a day and becoming a lot more productive. This is going to be the norm. We have them at NVIDIA right now as we speak, and I'm hoping the person that I'm paying a couple thousand dollars a day is spending more than $50 a day on tokens. You know, are you nuts? I want you to be managing an entire fleet of agents doing your work.
You know, I'm really hoping that somebody who makes $2,000 a day is spending $1,000 a day on tokens. What I just said makes sense. It's gonna happen, and it's already happening in software companies all over the world.
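The arithmetic behind these examples, with the per-token price inferred from the 50-million-tokens-for-$50 figure (an assumption, not a quoted rate):

```python
# The arithmetic implied by the examples above. The $1-per-million-token
# price is an assumption read off "50 million tokens ... only $50".
tokens_per_day = 50_000_000
price_per_million = 1.00
print(tokens_per_day / 1e6 * price_per_million)      # $50.0 per agent per day

engineer_pay_per_day = 2_000
hoped_for_token_spend = 1_000
print(hoped_for_token_spend / engineer_pay_per_day)  # 0.5: tokens at half of salary
```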
Hi, guys. Stacy Rasgon from Bernstein. Thanks for taking my question. I have a quick clarification to ask Colette, and then, Jensen, I have a question for you. Colette, just to clarify, I know you've talked about Rubin ramping in the second half. Groq sounds like it's launching in Q3. So am I correct in thinking that Rubin should launch with Groq? Because I don't think Groq goes standalone. Then, Jensen, I want to ask a longer-term question. You know, I really liked the chart you put up the other day that showed sort of the extension of the spectrum of inference, which, I mean, drove the value from Groq. You used to talk about how GPUs were the only way to go.
We now see architectures like Groq are needed to sort of take advantage as that spectrum of inference widens and low latency becomes more important. I guess I wanted to get from you: how do you see that spectrum evolving from here? Does your platform now have all the pieces that you need as we go forward over the next several years, and, you know, hopefully longer than that? What are the new types of inference workloads that you see coming? And do you have all the pieces you need to take advantage of that, or is that something else that we still need to be keeping our eyes on as that grows?
Okay.
First, Stacy, thanks for the question regarding Groq and the LPU. We did communicate that it would also be starting in the second half of this year, and we'll see what that looks like once we get closer to the second half of the year. It is in this current year.
You did say Groq shipping in Q3, I think yesterday. Yeah?
Correct.
Okay.
That's what we're expecting. However, Vera Rubin is gonna ship before Groq.
Will ship before? Okay.
Yeah, yeah. The reason for that is because we're already in production on Vera Rubin. Systems are already going through the lines, you know. At the moment, that's the situation, right? It's okay. It's just fine. Vera Rubin is extremely hard to beat, even for Groq. Even adding Groq to Vera Rubin, it is very tough to beat Vera Rubin. I'm gonna answer your question in a second.
Okay.
It turns out in computing, you have two types of architectures: one that's extremely high throughput, one that's extremely low latency. It's not completely true, but it's close to true. In fact, a CPU is a low latency computer. Notice the size of the cache on board, the SRAM. Groq is an extreme version of that, a hyper extreme version of that, where the SRAM occupies basically nearly the whole chip, and the scheduling is done completely statically. Meaning the compiler figures out where the data and the compute are, and it makes them meet just in time. The whole Groq system is like one giant synchronous machine. As a result, it is deterministic, and it's extremely low latency. It is not easy to program. It is not flexible.
It's not general purpose, but it is what it is. What we've done is we've taken Vera Rubin, which occupies, as I described yesterday, about three quarters of that space. Vera Rubin is the right answer. We don't know how to make that better. If we knew how to make that better, we would've made that better. NVLink 72, and the Vera Rubin Ultra NVLink 144, and Kyber NVLink 1152 are gonna keep expanding the aperture of that left-hand side, where high throughput matters tremendously. We're gonna add Groq, fuse it with Vera Rubin, fuse it with our GPUs, and use Groq to process the very last stage of autoregressive models, which is used for language models. That last stage is extremely bandwidth intensive. We gang up a whole bunch of SRAMs, like thousands of Groq chips, okay? It's eight to one.
For that last 25% of the power and, you know, that last 25% of the use case, because your data center has all kinds of different use cases, it's not just one, right? We're all using ChatGPT. We're all using it in different ways. We all have different tiers of pricing, and so we're in different bands in that graph. Are you guys following me, Stacy? I showed the zero tier, the free tier, you know, good, better, best, extreme versions. For free, good, better, Vera Rubin is untouchable. We can't think of anything that comes close.
For, you know, best and extreme, adding Groq to that, you could increase your throughput on the best tier, and you could extend the extreme tier even further. Now, the extreme version introduces a new tier. Because of where you are on the throughput curve, your volume is so low, you can't afford to make that demand too high. You have to set the price quite high. Does that make sense? However, there's a new class of customers: very, very expensive software engineers. They already cost so much money that if I added $100 a day of inference cost, token cost, for them, I'd be more than happy to do it. If I added even $1,000 at crunch time, more than happy to do it. Does that make sense?
I'm simply describing what's happening to a market that is, if you will, maturing. In the beginning of the market, nobody knew exactly. The technology wasn't mature, and people didn't know exactly how to use it. 100% of the early inference customers were free tier. As the technology started to reach o1 and o3, all of a sudden, the paid tier skyrocketed because people were now able to use it for something useful. Then all of a sudden, when agents came, for example, Claude Code, right? Codex? Those tokens are a lot more expensive than free tier, and they're a lot more expensive than $20 a month. On top of that segment, we just added two more segments. Did you see that?
This is no different than the iPhone: in the beginning, there was only one version, and now there are a whole lot of versions. No different than the car industry, no different than any industry. As the market expands, the segments expand. I showed a factory that is able to produce tokens of different segments and different tiers, from very, very smart, incredibly fast, to high throughput, free tier. I described an AI factory architecture that allows you to address the whole thing to maximize, ultimately, the total revenues of the factory. We let you decide how you wanna mix and match. My estimate is it's probably about 25% today for, call it, a handful of companies.
You have to be one of the companies that generates a lot of tokens to make it worthwhile. There's a whole bunch of what they call inference service providers, ISPs, API service providers. I think they could also benefit from this, okay? Because they would like to have different segmentation of token generation. Call it a group of 10 customers, and 25% of those 10 customers' workload represents a big part of that pie. We can increase our total revenues with Groq by 2x on that 25%. 2x on 25%. Does that make sense?
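A quick sanity check on the "2x on 25%" claim, with normalized, illustrative numbers:

```python
# "2x on 25%": if Groq doubles the revenue earned on the quarter of the
# workload it addresses, total factory revenue rises by 25%. Illustrative.
base = 1.00          # total factory revenue today, normalized
groq_share = 0.25    # fraction of workload Groq addresses
uplift = 2.0         # revenue multiplier on that fraction
total = (1 - groq_share) * base + groq_share * base * uplift
print(total)         # 1.25 -> +25% total revenue
```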
I mean, as you continue, like, with new versions of Groq, with new generations, so what does that do? Are you pushing that out even farther, or are you lowering the cost and increasing the demand? Like, I'm just trying to get some feeling.
We're always doing two things. We're pushing the throughput at every tier up, and we're always pushing the smartness of the AI out. You see the Pareto? I'm always pushing it up. I actually showed you guys the transition from Hopper to Blackwell to Vera Rubin. I'm always pushing it up, and I'm always pushing it out. Whenever I push up, the production volume of your factory goes up at every price point. At the same price point, the volume goes up, okay? When I push it out, you can introduce new tiers of AI, new tiers of tokens.
Mm-hmm.
Therefore, you get new price points. Today, you know, the price point is, call it, $6 per million tokens. That's kinda where the world is. We would really like to be, and I know they would all love to be, at $50 per million tokens, but with super large models, super fast. Could you imagine a 10 trillion parameter model running at 500 tokens per second? Our engineers will pay big money for that, and I would let my engineers pay big money for that. So that world wants to come. Then the next tier will come again, because the models will get bigger, they'll think more, they'll use more tools and things like that. You know, it's just like back in the old days.
I don't know how many of you knew NVIDIA in the beginning, but we had one product, the RIVA 128. $299. That was it. One product. You know? Those good old days. Today, we have the 5090, the 5080, two different SKUs. The 5070, three different SKUs. Are you guys following me? All of these SKUs exist because the market got larger, and it started to segment, and people wanted different things. The market is doing exactly the same thing with tokens. It's getting larger and larger, and different segments want different things. We need to help the customers, we need to help our model makers produce, manufacture different segments of tokens. I know they look like numbers, but, you know, they're different AIs. Make sense?
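For a sense of scale, here is what that hoped-for tier implies for a single heavy user, using only the hypothetical numbers from the answer above:

```python
# What a $50-per-million-token tier implies for one heavy user. All numbers
# are hypothetical, taken from the examples in the answer above.
tokens_per_sec = 500                           # a 10T-parameter model, served fast
seconds_used = 8 * 3600                        # one working day of continuous use
daily_tokens = tokens_per_sec * seconds_used
print(daily_tokens, daily_tokens / 1e6 * 50)   # 14,400,000 tokens -> $720.0/day
```

Roughly $720 a day, which is consistent with the earlier hope that a $2,000-a-day engineer spends on the order of $1,000 a day on tokens.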
Got it. It does. Thank you.
Yeah. Incredible. We're gonna increase the throughput, and we're gonna increase the pricing simultaneously. That's the benefit of Vera Rubin, and we've done that every single time. We did that with Blackwell. We did that with Vera Rubin. We're gonna do that with Vera Rubin with Groq. We're gonna do that with Vera Rubin Ultra. We're just gonna keep pushing that envelope. Ultimately, the simplest way to see it is that Pareto chart, because the factory has a lot of different workloads and different customers. That Pareto chart, we wanna push the Pareto frontier out. Up and out. Constantly up and out. The computer science necessary to do that: insane. The hardest problem of all.
Thank you.
Thank you.
Hi. Vivek Arya from Bank of America Securities. Thanks, Jensen. Thanks, Colette, for hosting us and for a very informative event. I wanted to ask actually two related questions. One is, in this $1 trillion, Jensen, that you showed, you have other products also that you spoke about yesterday, right? The Vera CPU, right? Other CPUs. You know, you have Groq. You know, you have a storage solution, right? CPO, right, I assume. How much of that is incremental, right? Is it a small number? Is it medium? Like, how much more is that addressable market that is not captured in this trillion, assuming it is incremental to it? Then I wanted to double-click on Groq again, Jensen. I think you mentioned that it will take up 25% of the inference. That's a pretty big statement.
Is it cannibalizing something? What is kinda the value capture from Groq over time? A lot of people ask us, you know, does it cannibalize high bandwidth memory demand? I don't think it does, but I would love to hear your view on how to place Groq in the right part of the value-capture spectrum.
Okay.
Thank you.
We're the only company in the world today that can optimize an architecture, one AI factory, across three memories. Of course, HBM memory, but we're also the first to use LPDDR5, which is extremely high bandwidth and very low power. That changes the equation for CPUs. The third is SRAM. We can now utilize all three memory types to create the perfect architecture, and we are. Okay, that's number one. We used to offer just NVL72 Grace Blackwell. That was our rack. We had one rack. We now have five racks, as you know. The reason why is because. Can you go to the next slide? Thank you. Oops, that was the previous one.
Still one.
Yeah.
This one?
No.
Back?
Back. There you go. Is that the one? Yeah. You see that? This is what NVL72 did. It ran that. Are you guys following me? It ran all these large language models. This is what it was designed to do. All of our inference stack ran that. Remember what an agentic system is. It runs this. This is what Claude Code now does. This is what Codex now does. It runs all of this. It has memory that goes into the KV cache. That's on the MGX system. This memory has grown so much that it needs to be accelerated. It's just too much. All of our working memory, every time we use it, the more we use it, the harder the problems we solve. This is structured and unstructured data.
This is where I started the keynote, with cuDF and cuVS. The stuff that nobody ever talks about, which is incredible value in the future, because this agent is way faster than a human, and it's gonna bang on that way harder and faster. Does that make sense? Then tool use. Web browser. A web browser runs on a CPU. You need a CPU to give the agent access to tools. Then it spawns off sub-agents, and who knows what those could be. One of the sub-agents could be cuOpt, which is GPU accelerated. Another sub-agent could be Omniverse, GPU accelerated. We need those kinds of GPUs in the data center. That's the way to think about what Vera Rubin is.
Vera Rubin as a system expanded tremendously because we went from processing that, which is still 90% of the workload, to processing all of this. Are you guys following me? This is AI. This is where ChatGPT started, but this is where it is now. Can someone nod?
Yes.
You guys get it? Okay. Give me a thumbs up. All right. Thank you. Because otherwise I'll do it again. This is why, you know, sometimes our keynotes run long: I look in the audience, and there's some person sitting in front of me who looks lost. I just. I'm gonna have to do this again. I'll leave nobody behind. You know? This is an agent. What just happened? In our data center, that data center doesn't wanna be a cobbled-up Frankenstein. It wants to use elegant power delivery and cooling systems. We took all of the computers that are here, and we put them into the MGX rack.
We designed the world's perfect processor for each one of these things and just racked them up. Does that make sense? If you're gonna put storage, which is right up there and here, if you're gonna put that into east-west, which is in the same aisle as the compute, you better make it so it's not a Frankenstein outfit. You can't have liquid-cooled NVLink 72 racks and then air-cooled, you know? You can't have 300 kW here and then 50 kW here. It makes no sense. We took the whole thing and harmonized all of it in one single rack architecture. If you wanna build a cluster to run that, you just connect them all up. It's incredible.
Same power delivery, same cooling system, all 100% liquid-cooled, all completely optimized for the workload, all fully accelerated. Now, your question: in order to run this agent and be able to offer all the things that we were just talking to Stacy about, you would increase your CapEx. You would increase your compute spend, the GPU compute spend, by 25%. You add Groq to 25% of the workload, and you buy 8x as many chips, which is approximately the same price as the NVLink 72 racks, okay? That 25% is multiplied by two, and that nets out to 25% more, okay? Your compute spend goes up by 25%. That's the first one.
That's not in the $1 trillion. If 100% of that $1 trillion now adds Groq, then it'll be $1.25 trillion, okay? We also have storage, which is a lot. Because, as you know, there's just a lot of storage in the world. It is the second largest compute spend. The third will be CPUs for tool use. I'm not expecting CPUs to be that much, because CPUs just don't add up to much, okay? You could say CPUs are another 5%, okay?
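Putting those rough percentages together (planning numbers from the talk, not guidance):

```python
# Reasoning through the incremental spend with the percentages above.
# Rough planning numbers from the talk, not guidance.
gpu_spend = 1.00     # normalized NVLink 72 GPU compute spend
groq_add = 0.25      # Groq on 25% of the workload at ~rack-equivalent price
cpu_add = 0.05       # CPUs for tool use, "another 5%"
print(gpu_spend + groq_add + cpu_add)   # 1.30 before storage; storage takes
                                        # the all-in uplift toward ~50%
```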
If you were to say, all in, the difference between the Grace Blackwell rack opportunity, however big that was, and the Vera Rubin rack opportunity added another, you know, 50%, I think that's probably not far off. Did I just kind of reason through it for you? Everybody got that? Okay. That's the fundamental difference between the Grace Blackwell go-to-market and the Vera Rubin go-to-market. Because what we were solving for in the Grace Blackwell world was inference. We wanted to be the inference king, you know? Who doesn't, right? That's what we were solving for. With Vera Rubin, we're solving for this. That's why, that's why I said OpenClaw is completely transformational. Finally, we have one piece of software that runs across this whole thing.
One open source software. It is the operating system of this chart. It's incredible. Now every company in the world can go build this.
Answer this question again, right?
Joe Moore from Morgan Stanley. You're generating $1 billion every couple of days, which seems pretty good. Can you talk about the uses of that cash to build strategic advantage in your business? You're making investments in ecosystem partners, you've got purchase commitments on components, you're also returning cash to shareholders. How do you balance those priorities?
Well, the priorities go, number one, it has to fund our growth. Our supply chain, we work very closely with, and we're in a great place with our supply chain today for a good reason. It's because we work very long-term with them. We help them plan their business. We award business to 'em to support their growth. We even prepay, and sometimes we'll even fund their capacity with them. We're preparing for $1 trillion, you know, over the next, you know. I just have to be very clear: a trillion plus through December 25, 2027. I think we probably shut it down at 4:00 P.M. that day, Pacific Standard Time. There are a lot of caveats in there, just to make sure.
Anyways, the plus. That's number one. Number two, we invest in our ecosystem, because as you know, the CUDA developers and the growth of the AI natives at this stage are really important. After that, we're still gonna generate quite an amount of free cash flow. Well, I'll let Colette answer it. I mean, she. We have a good plan, so go ahead.
Yeah. With the strong growth that we have, the $1 trillion going forward, that gives us, of course, a very good position in terms of free cash flows. Jensen talked about some of the uses upfront: making sure that our suppliers and everything that we need is lined up in order, and that may take some prepays. The second thing is our investments. We are still working through the commitments that we made over the last year, which we need to complete in the first half of this year. Once we move forward and complete those, we do have an opportunity for stock repurchases and focusing on returning capital to our shareholders. It is still a very important part of the work that we are gonna do.
We had a good year last year, and I think we're gonna have another great year in terms of what we can do in returning capital to shareholders.
Oh, I thought you weren't gonna tell him.
Do you wanna give him a certain amount?
It's up to you.
Okay. Where we stand right now, probably not taking into account the plus sign, we will probably be at 50% for stock repurchases and dividends together as a percentage of our free cash flow. That's where we're starting out, and as you can see, the plus sign is real, and that does give us an additional opportunity to do even more. On the timing of it, again, remember we're working through what we have to do here in the first half of the year with some of our existing commitments, but stay tuned.
Hey, it's Timothy Arcuri at UBS. Thanks. Let me preface this by saying that this is not what I think, but this is what I hear from a lot of, you know, folks out there. There's some concern that you're capturing too much of the value of the ecosystem and that you can't sustain these margins over time. How do you respond to those concerns? I know you see stuff online about having to invest in the ecosystem and people sort of spin that in a negative way. Can you just talk about how you can sustain your margins?
First of all, almost everything I told you guys yesterday is a new perspective. It is not illogical that everybody has to understand tokenomics. It is not illogical that the world needs to learn what a computer has become. If we continue to deliver X factors of tokens per second per watt every year, if we continue to deliver X factors of ASP increase for them because we introduce new token segments, customers will be more than delighted to continue to do work with us. It is also true, and I've said it before, and the math is absolutely clear: every CEO of every cloud service provider, I would challenge them all to go and create that chart for themselves, and I'll help them. You pick your favorite other configuration.
Third-party chips, build your own chips, you put it into that model faithfully, and then you can decide: would you like to have higher revenues or lower? Would you like to have higher ASPs or lower? Would you like higher margins or lower? Because that's all it means. Look, TSMC's wafers are the highest priced in the world, but they're the best value in the world, and I gladly pay for it. ASML systems are the most expensive in the world. They're worth it. There's no question about it. The question is simply: do you want to make more money, or do you wanna buy the lowest cost equipment? That's the difference.
Now, what I just said is a new concept, and I think we can all acknowledge that. I just treated a computer system the way I treat TSMC's chip factory, the way I treat ASML manufacturing equipment, and that's not the way people thought about it in the past. If I have two CPUs, one of them is 256 cores, the other one's 256 cores, tell me which one is the better one. Well, the cheaper one's the better one, because I'm renting it by the core anyways. But that's not the way tokens are created. You don't rent by the core. You monetize by the tokens per second. And so it's a different economy. Does that make sense? You're not renting cores, you're not renting nodes, you're producing tokens, which is the reason why everything changed.
It was necessary, you know, to make sure that everybody understands the economics of the new world. Anybody who says that simply does not understand the business. That's all. They're trying to sell the lowest cost equipment. My equipment costs 30% less. What does that mean to your factory? That's really the question, you know? Anybody who says my chips are 50% cheaper, put that in the context of the factory, and that person is actually demonstrating to you that they don't understand AI. Somebody goes, "I'm 30% cheaper." You don't understand AI. "I'm 40% cheaper." You don't understand AI. "My chips are cheaper." You don't understand AI. You guys all know who I'm talking about. I'm not talking about anybody. I was just saying. It's a theoretical comment.
Hi, Joshua Buchalter from TD Cowen. Thank you for spending the morning with us. I know there are a lot of customers and partners after your time, so we appreciate it. I wanted to ask a question. You know, you said a few times, I think yesterday, that you expect to be short capacity into 2027. Can you elaborate on where you're seeing those shortages? And on that note, you know, you've described yourself as the chief revenue destroyer, and Satya's made some comments about not wanting to over-index on one generation, knowing that there's another one coming very soon. Is that behavior unique to Microsoft, and are these constraints sort of protecting you?
By the way, Satya would also tell you who told him that. Exactly. I told Satya, "Buy what you need this year, because next year there'll be something better."
I guess my question on that is: are the TSMC constraints, or the capacity constraints, sort of keeping your other customers from doing that? Or do you see them holding a similar mindset as Satya?
No, I think that. You know, I don't want you guys to thinly slice and dice our choice of words. Is the world supply constrained? At some level, yes, right? Can we all agree? Saying the opposite would be weird. You know? Is the world constrained on cars? Well, what if I tripled the demand? Yeah. Everything is somewhat constrained. It just depends. Because we're building at such a large scale, our life is just not simplistic. It's not so simplistic as to say, "Oh, if I just solve this one problem, that's it. Life is good." We are working multiple dimensions across multiple suppliers and making sure that things are in harmony. We don't have too much. We don't have too little.
We can meet our demand, plus. The reason why we wanna meet our demand plus is because there's always new demand coming for the next 21 months. I got a whole bunch of new demand that's coming, and so I gotta prepare for that. There are all kinds of parameters, and it's not simple. If I told you that we are supply constrained on this one item, then I know what you guys are gonna do. You know? I think the system is harmonious. Nothing is too much. Nothing's too little. We don't have too much power. We don't have too little power. We don't have too many construction workers. We don't have too few construction workers. We don't have too many plumbers. We don't have too few plumbers.
We don't have too many cables. We don't have too few cables. We don't have too many optics. We don't have too few optics. Are you guys following me? It's just kinda right there. We're working on it every day. The $1 trillion+, we can meet.
Perfect. Aaron Rakers with Wells Fargo. Thanks for doing this as well.
Thank you.
I'm surprised we got to this point without this question being asked, you know, and it's more technical. You know, there's a lot of discussions.
You know what? We're kinda like the Fed now. Did he say near or almost? What did he mean by that? Well, we gotta go through all of his previous transcripts. When did he use that word, and what, you know? Here's what I know. Demand is accelerating at a very large scale. We'll be able to support it with supply.
Perfect.
Yes.
Perfect. I was gonna ask about architecture.
Oh, phew.
I've gotten a lot of questions about yesterday's presentation, where CPO starts, where copper ends.
Oh, dear.
You outlined NVL576.
Yeah.
There was an NVL1152 on the slide. I'm curious what your current thought process is around offering both.
Yeah.
How does that evolve as we scale to Rubin Ultra, Feynman? Just curious of your thoughts. Thank you.
Okay. Please treat my partners properly. They're all doing great. Okay? I'm not saying anything here that suggests anything negative about their businesses. I'm gonna go the other way. All of their businesses are gonna grow because of us. We're gonna grow copper. We're gonna grow optics tremendously. Now, did I say something that is completely logical? The answer is yes. Let me tell you why. We should scale with copper as far as we can, as long as we can. But you know, at a meter plus or minus, that's kind of, you know, the limit of copper. Okay? You've seen us go from NVL72 to now Rubin Ultra NVL144, right? Where the back plane was designed to be able to support that. Okay?
That's kinda approximately, you know. We're gonna keep working on our series, and if we could extend it from 144 to 288, we'll be more than happy to do so, because you should use copper for as long as you can. Copper is just easier to manufacture. It's more reliable. We've been manufacturing it for a long time. Humanity's been using it for a long time. Did I say anything that's illogical to anybody? It makes sense to everybody. You should breathe air for as long as you can, until you're out of it. You know? After that, we'll breathe, like, compressed liquid air, you know. But until then, how about air? Okay? It's free. We've been using it for a long time. It's safe, all right.
We should scale up with copper as long as we can. As you know, we also took Ethernet to a structured cable back plane. That's an incremental growth opportunity. Isn't that right? I just said it yesterday. We took the back plane of Ethernet, and we turned it into these spines, because these structured cables are really easy. You know, now that we've mastered how to use it and manufacture it, and it's a real artistry, we can now create these things. It's easy to maintain, it's easy to ship, it's easy to wire up. You know, you make no mistakes, right? It's fantastic. However, simultaneously, we wanna scale up beyond 72 to 144, right? To 1,152, and maybe even further than that someday.
There's a limit to how far copper can go. You could see we're 100% copper now. The next generation, Ultra, will have two options: copper, or copper plus CPO. Okay? That's one year from now. Two years from now, at NVL1152, it's all CPO. There's a limit to how far I can take copper, so there's a transition. However, even when NVLink is CPO and Spectrum-X is CPO, we will still have copper for the Ethernet scale up on our racks. We will still have copper for our storage. Does that make sense? Because we have five different racks.
The amount of copper we use will continue to be high, because even though scale up will go to CPO in two, three years, the total consumption of copper connectors is gonna continue to grow, because our demand and our total capacity continue to grow with all these different other racks. Was I clear? Okay. Thank you. Gotta select the words just perfectly.
Toshiya Hari at Goldman Sachs. Thanks for taking the question. You know, you previously talked about the spectrum of token costs, and it was very helpful to hear that 25% of that is in the high tier. How do you see the market evolving over time in terms of the growth rates of the low or free tier versus the high tier? And in a market that's been sort of predicated on big decreases in token costs over time, how do you see that trending? Does that start to slow or potentially flatten out, and why?
Token cost is gonna keep on coming down. Can we go to the next slide, Colette? Token cost is gonna keep on coming down, you know, every single year. This is just Grace Blackwell, and then Rubin token costs will come down again, and Rubin Ultra token costs will come down again. Okay? Meanwhile, the token smartness, the smartness per token, is gonna keep on going up as well, as we extend that curve to the right, okay? The x-axis. Meanwhile, we're gonna increase the throughput. This is everything that has to be. Nobody cares about tokens per second. You always have to divide it by watts. The reason for that is because your data center is only so big. If your data center is a gigawatt, you're not gonna have two.
If it's 200 MW, you're not gonna have three. Does that make sense? You always have to normalize it; otherwise, you can compare nothing across architectures. Moore's Law was always divided by something, okay? You have to take tokens per second per watt. Anybody who shows you anything else just doesn't understand AI, okay? They're trying to deceive you somehow. All right, that's the reason why SemiAnalysis did it right. They did it right. Everything was divided by watts, okay? We're gonna keep on increasing throughput. This is the price of a token; whatever that ASP is, we increase its throughput. Whatever the ASP is, we increase the throughput. Does that make sense? Here, whatever that segment is, we reduce the cost.
Whatever that segment is, we reduce the cost. This is kinda like, this thing here is essentially your product segment, and that's the throughput, the volume production, and that's the cost of it. These are the two curves. That's why these two curves are so important. Now, I combined those two curves. You can combine those two curves if you like, but it makes your head blow up. This curve is essentially the Pareto. In fact, most of the world today is simply right here. This is the Hopper world. You see that? Hopper is kinda right here. Blackwell extended it and added a couple of segments. This is really valuable, and people love that, because the ASP difference between here and here could be 5x, 10x. Make sense?
Larger model and faster. Okay? These are really valuable. Now, how do I see the demand curve changing? Yesterday I used 25% here, 25% here, 25% here, and 25% there. That's all I did. A supplier's, a manufacturer's distribution across different product segments just kinda depends. Do you guys see what I'm saying? It kinda depends. You know, Ferrari is kind of all high-end, nothing in the free tier, you know. Then somebody else, right? It just depends on the brand. I think it's gonna be the same here, guys. If your business is search, you're gonna be largely free tier, because nobody pays for search. If you're code generation, if you're agentic code, you're gonna be a lot here.
If you're an enterprise worker product, and the average salary of that person is, let's pick a number, say $50,000 or $70,000, you might be here. If your customer is that person, you want your product priced somewhere here. Does that make sense? It depends on your customer and the work that you do for them. It depends on the customer, the work you do for them, and the competition. Those three things matter. It's just exactly like products. AI tokens are products, a new commodity, and they'll be marketed as such. Different suppliers, different brands, different target markets are gonna have different shapes. I simply chose an equal distribution yesterday. Make sense?
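A small sketch of the per-watt normalization point from earlier in this answer, with invented numbers:

```python
# Why tokens per second must be normalized per watt: the factory's power,
# not its chip count, is the fixed resource. Numbers invented for illustration.
site_power_watts = 1e9                            # a 1 GW AI factory
for name, tps_per_watt in [("arch A", 0.5), ("arch B", 0.7)]:
    print(name, tps_per_watt * site_power_watts)  # factory-level tokens/sec
# At fixed power, only tokens/sec/watt changes factory output, and so revenue.
```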
Well, yeah. Just, which segment do you see growing faster in the future?
They're all gonna grow really fast at the moment. At the moment, it just doesn't matter. They're all gonna grow so fast. They're all growing exponentially at the moment, every one of them. We're at the beginning, right? You know, the growth rate is divided by a very small number.
Hi, I'm Mark Lipacis, Evercore ISI. Thanks a lot for doing the Q&A. Always love the insights. Jensen, our field work is telling us that AI engineers are getting excited about State Space Models because they address memory requirements. In your keynote, you showed Nemotron-3 benchmarking as one of the top models, and I believe that's a hybrid Mixture of Experts State Space Model. I'm wondering, you know-
Impressive.
Thank you, Jensen.
Impressive.
In the past, new AI workloads have led to the adoption of different AI models.
That was my Darth Vader imitation. So impressive. Young Jedi.
The question is: is agentic AI creating new demand, a need for a new kind of AI model? Is that what you're doing with Nemotron and the hybrid? What does state space get you for Nemotron-3 that pure Mixture of Experts didn't, and what are the implications for the competitive environment for NVIDIA if there's a transition to a new kind of AI model?
We run all AI models, whether it's full Transformer, discrete tokens, continuous, diffusion, state space, hybrid. Our architecture's beauty is that it does it all. For example, Groq can't do diffusion models. But we can do everything. Does that make sense? And no, I'm not picking on Groq just to pick on Groq. It belongs to me now, so I can say these things. You know, every architecture has its place. The reason why NVIDIA is so versatile, and the reason why it's used so freely everywhere, is because irrespective of what innovation your research scientists come up with tomorrow, I promise you it's gonna run great on CUDA. I just promise you that.
The reason for that is because we know we have all of the necessary computing elements to do all of it, okay? Nemotron-3 was designed so that you can deal with extremely long context. In time, you're gonna have conversations with your AI, hopefully for as long as you shall live. The question is, how do you deal with context? How do you deal with the relevant conversational memory? On the one hand, if you memorized everything and we talk about something over time, which version of that memory do you pull back? When you have too much memory, over time it can become garbled, and, you know, maybe a reset is helpful. These are research areas.
Long-horizon memory is really a research area. But the hybrid architecture, I think, is going to be a very major thing, because it allows you to deal with extremely long context without having to suffer the quadratic explosion in computation. That's the reason why we invented it, and we put it out in open source. We'd love for everybody to use it. Yeah. It's intended to advance AI, not to compete with anybody. We don't need to. We just wanna advance AI. Impressive.
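To see why the hybrid matters for long context, here's a minimal Python sketch contrasting the two scaling behaviors; the toy recurrence and its constants are assumptions for illustration, not Nemotron-3's actual layers.

```python
# Minimal sketch of the scaling difference the hybrid exploits.
# Full self-attention touches every pair of positions (O(n^2) work),
# while a state-space-style layer carries a fixed-size state through
# one linear pass (O(n)). The scalar dynamics below are toy assumptions;
# real SSM layers, and Nemotron-3's hybrid, are far more elaborate.
import numpy as np

n, d = 4096, 64                      # context length, hidden width
x = np.random.randn(n, d)

# Attention-style cost: an n x n score matrix, quadratic in context length.
scores = x @ x.T                     # shape (4096, 4096): ~16.8M entries

# State-space-style cost: one pass with a fixed-size state, linear in length.
decay, gain = 0.9, 0.1               # assumed toy constants
state = np.zeros(d)
outputs = np.empty_like(x)
for t in range(n):                   # n steps, each O(d)
    state = decay * state + gain * x[t]   # history compressed into fixed state
    outputs[t] = state

print(scores.shape, outputs.shape)   # (4096, 4096) (4096, 64)
```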
Thank you, Jensen. I'm trying to understand how concentrated your downstream, like the AI market is and is going to be. You have this chart showing 60% is hyperscalers. I'm kind of thinking the other 40%, the majority of that is Tier two clouds, and a lot of them are actually reselling or renting their capacity to hyperscalers or to the frontier labs. If you take hyperscalers plus frontier labs, it might be like 80% of people actually using the infrastructure that is being deployed. That's an element of concentration. These models, like the Anthropic models, the OpenAI models, et cetera, seems to be like a very small handful that are really at the frontier. Do you think that's the right description of the situation today? How do you see that evolving?
Maybe, what does that mean in terms of the right to make money in the value chain, and for the development and further acceleration of AI?
Okay. I would slice it into three dimensions, okay? As you were talking, I simplified it as much as I can into a cube, into three dimensions. The first dimension is: what is the end model being run? I said earlier, OpenAI is the largest. The second largest by category is basically all the open models; in aggregate, they're solidly number two, by far. Number three would be Anthropic, and so on and so forth. Okay? The tail is actually, you know, fairly long. Okay? If you look at the world of model consumption, even just language, that's the way to think about it. We run all of them. We're in all of them. That's one dimension.
Within that dimension of models, you also have to add the physical AI models, which is robotics. All the robots you saw, they're running vision-language-action models. Those models are different from just language models. For example, the control of motors is continuous. It's not da, da, da. It's not like characters, not like words. It's continuous. Physics is continuous. Biology has geometry; it obeys geometry because chemicals obey geometry. Okay. There's a lot of different types of models. The point being, you have to first think about the different types of models being run, and that's helpful to how you think about the right business.
The second dimension is, depending on the way the companies are structured and their intentions or interests, they are either companies that wanna build their own chips, which we have to compete with, or companies that wanna host NVIDIA customers in their cloud, and obviously CUDA only runs on NVIDIA. Or are they companies, like CSPs, where they need us, where they can't just buy chips, they really have to buy systems? They're really infrastructure customers. Or are they companies that wanna build on-prem, where my distribution channel goes through Dell and HP and Lenovo, because it has to integrate a whole bunch of other enterprise computing components? Dell and HP don't build their own chips.
Or are they at the edge? Maybe they're radio networks, maybe they're robotic systems or self-driving cars or satellites, and so on and so forth. Does it make sense? Now you gotta decide where the computing is being done, okay? Those are kinda the several dimensions you could think about. When you're done subdividing all of that, you come back to the chart that I showed you, 60%-40%. Within that split, the 40%, basically, they need computing power. It doesn't matter what models they run. It could be OpenAI models, it could be Anthropic models. The fact that NVIDIA supports confidential computing makes it possible for OpenAI to run on the right side at all.
We make it possible for Anthropic to run on the right side at all, because we have confidential computing. That side, they want entire platforms. They want confidential computing. They want computers in different parts of the world, not just in the cloud. Even in the cloud, we compete for some part, but we also bring customers to the other part. For some part of that 60% CSP chart, we have to compete. Our job is just to deliver that chart better than anybody else in the world. We're doing very, very well. We're actually increasing our position day in and day out. Then the other part, we bring customers to them. They're just grateful. Make sense?
I took all of that dimensionality and compressed it into basically two slices of a pie. That compression, I think, you can test against one question: do they design their own chips? Does NVIDIA compete with them on chips? Okay, there you go. That's interesting. Then you gotta figure out, you know, where are we in our position, and what's our opportunity, and so on and so forth. I don't think OCI will design their own chips. I don't think it's sensible for them to do it. Obviously, CoreWeave's not gonna design their own chips. Then, you know, where do we compete, and where do we bring the cloud service providers customers? Their cloud revenues, a big part of them, obviously OCI, nearly 100% of that is because of NVIDIA, right, with OpenAI.
We'll take our last question.
Hi, it's Timm Schulze-Melander from Rothschild & Co Redburn. Maybe just a question around how you run the company, Jensen, and looking ahead. This 12-month flywheel is a critical part of your competitive advantage. When I look at headcount, it actually seems to be growing relatively slowly, and yet the undertaking you are taking on going forward is growing much more rapidly than that. You know, how do you manage that or prepare for that going forward, and how do you manage the risks that could pose to your business?
As you know, I have 60 people on my direct team. The reason why we need 60 people is because the company's architecture was designed to deliver on this architecture, on the products. The organization, the architecture of a company, should reflect the products it builds. Every company should not have the same business org, you know. I look across and I say, "Oh, look, they have a business unit here, they have a business unit there, they have a business unit there, and yet they wanna build what we wanna build." You know, what you build as a company matters. For example, not because I've seen it but because I've read about it, the way you build a Ferrari and the way you build a Ford is very different.
In one case, you move the car; in the other case, you move the people, okay? The car stays stationary. It depends on what you wanna create. The architecture should reflect it. If you look across my management team, every aspect of the technology necessary to build Vera Rubin's entire factory is right there, 100%. Everybody is represented. All of the expertise is sitting at the table making decisions together. The second thing is we have the discipline to develop the entire software stack. You can't build what we build on a yearly basis if you can't bring it up. Are you guys following me? It's very logical. How do you test it if you can't bring it up?
If you're cobbling together new technology from everybody else, how do you bring it up once a year? It's just not even practical. It's not possible. We align all of our chips to the platforms. All seven chips, they only have one tape-out schedule. I don't cobble together everybody's tape-out schedules and figure out when the system comes. The system comes when the system needs to come, and everybody aligns to it. The software stack, we completely own every piece. The storage, that's the reason why we developed it. Networking, of course. Even the factory operating system, which we call Dynamo. We created everything so that we could deliver every single benchmark, test everything to the limit, test for reliability.
The reason why NVIDIA built Nemotron is so that we could do pre-training, post-training, and now inference. We own all of the software so that we can bring up all of the systems on an annual basis, which basically means you're bringing up all the time. If you don't own everything, you have no shot. 0% chance. People are talking about their new GPU, but where's their scale-up fabric coming from? And how is that gonna work? I just gave you two examples. That whole agentic system we were talking about earlier, that's the future computer. That's really what we build. The company's organization, the company's mission, the company's capabilities are all aligned to me delivering the promise that I just delivered to the marketplace. That's why we're able to keep doing it.
A PowerPoint slide is not gonna deliver that system. A PowerPoint slide with two bar charts is not gonna convince somebody to give you $50 billion. It doesn't make any sense. To engineer it all into existence inside the data center, by the time that you bring it up, we're already two clicks down the road. This is the pace that we put the whole industry on, and it is frankly extremely hard. You know, we could do it, but that's because of all the things that I just described. You also know that every one of our systems is CUDA compatible. On day one, I've got yesterday's software that runs perfectly on this one. I own all the scale-up switch, I own all the scale-out switch, I own all the software. Do I not?
On day one, I take yesterday's software and I put it on the new system. If it doesn't work, what's the point? Once we get everything brought up, because we own the whole software stack, then we can take it to the limit. On top of CUDA compatibility, we have this thing called DOCA. DOCA compatibility, we own all the compilers, we own the whole software stack. Really, really important. You can't outsource that to other people, you know, somebody else building it on your behalf. How do you bring up a system? They're not gonna bring up your system for you. They're not gonna qualify it for you, you know? That's it. Can we take one more question? Is that okay? Can you guys tolerate one more question? I'm enjoying this so much.
Somebody's gonna ask me a question where I have to choose the precise word. Hair or a hair. Did he say hair or a hair? That's materially different.
Thank you. Thank you for extending the session and squeezing me in. Jensen, I just wanna clarify one thing.
Oh, here it comes. Oh, dear. I change my mind. Everybody have a good GTC.
Quick clarification. Does the $1 trillion+ include Rubin Ultra or not? My question is,
No, I gotta stop you right there. No. Thank you. No, no. Absolutely not. And yeah. Absolutely not.
Okay. My question is, we talked a lot about inferencing, you know, at this event. I was hoping that you could spend a couple of minutes on training. How do you see the compute intensity growing? What will drive it, in your view, over the next few years? Is it still larger and larger models, or is there something else on the horizon that you see? I guess if you take a three-to-five-year view, what's your view on the training-versus-inferencing mix in terms of compute demand? Thank you.
Training went from pre-training to post-training. Pre-training is basically memorization. Memorization and generalization. The more you memorize and generalize, the better foundation you have. Once you have that foundation, that's why it's called pre-training. It's kinda like, you know, AI kindergarten, okay? Well, more than kindergarten; AI high school. Now you have the pre-training. You have the basic vocabulary and grammar and a lot of hidden reasoning capability, so that when I teach you new skills, you'll actually understand them. Now when I tell you to go solve a math problem or write code, or try to write code, you actually understand what I meant. If you don't even understand what I meant, how can you possibly even attempt to do it? Pre-training does that.
Post-training teaches you all kinds of skills, okay? Reinforcement learning with executable grounding, reinforcement learning with verifiable feedback; there are a whole bunch of techniques for batch-oriented reinforcement learning, you know, tool use. I mean, the list goes on and on, okay? Structured APIs, unstructured tool use. I mean, there's just a whole lot of domains. That part, the computing intensity, I'm gonna guess is probably one million times more than pre-training. You know, I'm probably off by a factor of, you know, 1.2, you know? But it's a lot. The reason for that is because there are a lot of skills to go learn, and for all these skills, the rollout is really long. The models have to get larger and larger.
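As one concrete reading of "verifiable feedback", here's a minimal Python sketch in which the reward comes from actually executing a candidate program against unit tests rather than from a learned reward model; the `add` task and its tests are hypothetical, and the sampler and policy update are omitted.

```python
# Minimal sketch of reinforcement learning with verifiable feedback:
# the reward comes from executing the model's candidate code against
# unit tests, not from a learned reward model. The sampler and policy
# update are omitted; the function name `add` and the test cases are
# hypothetical, chosen only for the example.

def verifiable_reward(candidate_src: str) -> float:
    """Execute the candidate and grade it: 1.0 only if every test passes."""
    namespace = {}
    try:
        exec(candidate_src, namespace)   # ground the rollout by running it
        fn = namespace["add"]
        tests = [((1, 2), 3), ((-1, 1), 0), ((0, 0), 0)]
        return 1.0 if all(fn(*args) == want for args, want in tests) else 0.0
    except Exception:
        return 0.0                       # crashes and wrong names earn zero

good = "def add(a, b):\n    return a + b"
bad  = "def add(a, b):\n    return a - b"
print(verifiable_reward(good), verifiable_reward(bad))  # 1.0 0.0
```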
When you get good at these skills, you take all of that synthetic data, and some of it you push back into pre-training next time. Yesterday's pre-training started all from internet data. Today's pre-training is mostly internet data. In a couple of generations, pre-training will be mostly synthetic data. Meanwhile, you're adding multimodality to it. Meanwhile, you're adding motion to it, along with rolling out physical actions. The reason for that is because there's a lot of common sense, cognitively logic-related, where if you're able to interact in the physical world, you can deal with that concept a lot more easily, even in the abstract world, okay? Because you actually have grounded experience in the physical world.
Notice the amount of computation I just described; you know, we're talking a million, a billion times the amount of computing necessary for training in the future, and then after that, continuous learning. Almost everybody's model will lastly be fine-tuned so that it can also memorize and generalize per person. In the future, basically, where inference starts and ends and where training starts and ends will become blurrier and blurrier. It's kinda like asking, when are you learning, and when are you applying your wisdom? Well, in most people's cases, it's continuous now. I think that kinda gives you the three phases of it. With respect to inference versus training, let me tell you my hope. My hope is that 99% of the world's compute goes towards inference.
The reason for that is because inference is where we translate tokens generated into economics. Nobody pays you for learning. Nobody pays you for training; you pay for training. I want the world to be able to use these tokens for valuable outcomes, impactful outcomes: for healthcare, for manufacturing, for financial services, for engineering, you name it. Isn't that right? That's our hope, that 99%. You know, if our dreams come true, 100% of future tokens are going towards economic benefit while the AI models are learning. There's a really good reason why NVIDIA went all in on inference last year. The reason for that is because we see this future where inference and training and pre-training and learning and all that is just one big continuum.
You know, go back and read the stories people wrote two years ago: NVIDIA is really good at training, inference is easy, any company could do that, and therefore... Do you guys remember that? Inference is super hard. Look at this chart. It's super hard, and it's getting way harder. Inference is thinking, it's working, it's doing things. How could that be easy? I thought my life was easy pre-high school, not post-high school, you know? Pre-high school, easy. After that? Super hard. I think people just got it all completely backwards. They just wanted to make up stories that rationalized, you know, their opportunity, which is fine. But you had to reason about it from first principles.
You know, I take a long time answering questions for you guys instead of giving short, highly curated, super well-selected, precisely adjusted verbs and nouns. The reason for that is because I want you guys to learn how to reason through these things, so that when you see it yourself, you go, "Nah, that's not making sense," or, "That makes sense." Because you're analysts. You need to be able to understand these things. Okay. All right, guys. Thank you very much. Thanks for coming to GTC.