New Year?
New Year.
Oh. It's all right. You could talk. How do we get this going? It's almost like we're doing this for the very first time.
You guys have questions?
I think we have our friends there with microphones.
Yeah. First question over here.
All right.
Hi. Thank you. It's Atif Malik from Citigroup. Thank you for the great presentation. Jensen, you had a slide on the number of tokens, 10x more tokens on Rubin versus Blackwell. The question I have is, historically, you have shown a slide of Blackwell performance versus TPUs on training. Is there any kind of simulation that can put the performance of Rubin on inference versus TPUs?
It's hard to say, because the only thing that's available is MLPerf. And we subject ourselves to a fair amount of it. I had to spit out my candy. I came without a voice. I don't know where I left it. It didn't show up today. As you know, we subject ourselves to a whole lot of benchmarking because NVIDIA is everywhere. And we're easy to benchmark. But nobody can really benchmark a TPU unless you're the TPU people. And so we don't have anything to benchmark. If you guys have anything to benchmark, we're happy to take a look at it. And I think you'll find that it compares very nicely. MLPerf is an indication. And MLPerf is quite rigorous. MLPerf is largely governed by Google. And it's very rigorous. It's so rigorous that almost nobody finishes the test. We're the only company that has ever finished the test every time.
We finish the test every time. We subject ourselves to submission every time. So, you know, you're allowed to take the test. And if you don't like your own answer after you see other people's answers, you can withdraw your submission. It's the only test in the world that has this kind of civility. And we're fine with it. And so we're usually the one that submits first. And so everybody sees our answers. And then they decide whether they want to submit or not. And so my sense is that a lot of people have taken the test. Not all of them can finish it. But some of them have. I just don't think that they submit it, maybe for ulterior reasons. And so that tells you something.
If we're the only one that submitted the test and everybody else is empty, I think that just tells you what the answer is. It's not like nobody showed up to the Olympics. They all showed up. They just decided, "Oh, we'll just let you run by yourself." But one of the best benchmarks is actually SemiAnalysis. It's a living, breathing benchmark that's there. I like it because it's living and breathing. It's continuously updating, as you guys know. And so I like that fairly well. And I hope that they submit themselves to that. Because DeepSeek and Kimi and Qwen, these are state-of-the-art reasoning models. They're based on MoE. They're very hard. It's not for the faint of heart. Almost anybody can run them to completion. I mean, you'll get a token out of it.
But to do it at the rates that we're talking about takes superhuman capabilities at that point. And this is where NVIDIA's extreme co-design capability comes in, where we're designing across the GPU, the CPU, the NICs, the NVLink switches, the CX NICs. I mean, all of that is working together. The amount of software that has to come together for a rack that I was showing today, that pod that I was showing today, just if you think about it for a second, aside from the one that you saw there today, I don't think anybody's ever seen one really built from the ground up with that kind of capability. And so NVIDIA wrote every line of code and designed every chip, created all the systems, optimized all the algorithms. We contributed everything back to open source so everybody else can take advantage of it as well.
That tells you something about the level of leadership that we have. DeepSeek's another example. I think today somebody just published, is it Signal65 or something like that? They just analyzed our performance. It shows about a 10-to-1 improvement from Hopper to Blackwell and a 10-to-1 reduction in cost. Yet between Hopper and Blackwell, the transistor count only went up about two times, right? So that tells you why it's now so essential to do co-design at the level that we're talking about. Because if you can't change everything across the whole thing, how are you going to overcome Moore's Law? If your architecture is basically the same and you're just making a faster XPU, whatever PU you like to build, if you're just building one chip, who cares? Amdahl's Law gets in the way. You're not getting that many transistors anyhow.
And so I think we've fairly clearly revealed that in this new world, if you want to keep up with the rate of model size growth, 10x growth, and token growth, and cost decline, and getting to the new frontier, and you want to go fight every single one of these battles, unless you're able to keep up with the type of systems we're talking about, I think you're going to have a very hard time.
Yeah. Hi, Jensen. Hi, Colette. Vivek Arya from Bank of America Securities. Thanks for the informative keynote. Just wanted to clarify, Jensen, this Vera Rubin in full production, what does this mean? How is it different than what you thought before? Are you able to now ship it and recognize revenue faster? So just wanted to clarify that. And then my more strategic question is about your Groq licensing announcement. What does it mean in the near and longer term? Are we now getting to a place where NVIDIA thinks that you will need more specialized, say, ASIC-like chips for certain kinds of inference? What does it mean for your roadmap going forward? Should we expect to see a lot more of this kind of specialization, and what are the implications?
OK. Yeah. I don't mean to be pedantic, but NVIDIA's chips are ASICs. As you know, I was the youngest employee at the first ASIC company the world ever created, called LSI Logic. Wilf Corrigan. I think I was employee 150 or something like that. And I was just a kid out of school. And I was interested in ASICs because it was really about system designers using design tools to design their own chips. That's really what that meant. NVIDIA, in a large way, is just a systems company. It's much more natural for us to stand in front of a rack like the Vera Rubin pod than it is for us to stand in front of a chip. And so we're very much a systems company. And we use design tools and TSMC to build our version of ASICs. So they're ASICs.
There's no question in my mind that it will be. I think the probability of keeping up with Vera Rubin is very low for the industry if you're building one chip at a time. I don't want to say impossible, but it's something close to impossible. It's not a one-chip wonder thing. And you're not going to build a chip and connect them in a torus, point to point, just because it's easier to build the interconnect when one chip connects to another chip, which connects to another chip. There are no switches to design. But in the case of MoE, you need all-to-all. Every single layer is an all-to-all layer. And so in our case, we literally just send information all-to-all across the GPUs. The GPUs just send information to each other through the switch. It's one hop.
In everybody else's case, you've got to pass the token from one chip to another chip to another chip. Depending on the size of your pod, that could be nine hops. It could be five hops. And if you're doing this repeatedly as a fundamental part of the processing, it really adds up. Amdahl's Law gets in the way. And so I don't think it's that. In the case of Vera Rubin, in the case of Grace Blackwell, to build a high-throughput, high-performing AI factory, I'm fairly confident in the strategy that we have. Now, the question is, in the case of Groq, I think they came to the conclusion that there's just no nook and cranny to fit into. And so they were quite interested in being part of our company. And I really like the team.
The reason for that is because, even though the mainstream part of AI is likely to remain the type we build, remember, the model builders are building for an architecture that their model will run on, right? And whose architecture is the most pervasive in the world? Ours. By definition, the model shape kind of wants to fit the processor. It's a bit of a chicken and egg, egg and chicken. It's a feedback loop. It's a positive, virtuous cycle now. It's very likely that the vast majority of AI is going to run that way. I really like the work that they did with extreme low latency. Low latency and high throughput are enemies of each other, just fundamentally. NVIDIA was built for extremely high throughput.
The question is, is there a place in AI in the future where maybe the response time is literally instantaneous? And you're willing to pay something for it. You're willing to pay something for it. Let's say it's connected to my glasses. It's not a use case today. And let's say it's not a normal use case. And literally, it's like right there in your head, but it's in the cloud. Are you guys following me? And so the latency has to be super low. It can't afford a few hundred milliseconds up and a few hundred milliseconds down, where you say something, the AI thinks about it for a second or two and then responds. It's really hard to have this interactive, very comfortable sense of a persistent AI. And so maybe there's a place where we might, if you will, create something unique.
And so maybe a combination of something like that. But I'm just shooting the stuff with you right now. I can't tell you what I'm going to build. But it probably is quite unique and quite cool. But it won't affect our core business. I'm hoping that it expands something new, opens something new.
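Stepping back to the hop-count comparison from a moment ago, here is a minimal, illustrative sketch of why it matters. The pod size, per-hop latency, and MoE layer count below are assumptions for illustration, not NVIDIA specifications; the point is only that a switch-based all-to-all pays one hop per exchange, while a point-to-point ring pays a distance that grows with pod size and repeats at every MoE layer.

```python
# Illustrative only: rough hop-count comparison for a MoE all-to-all exchange.
# All numbers below (pod size, per-hop latency, layer count) are assumptions.

def avg_hops_switch(num_gpus: int) -> float:
    """Every GPU reaches every other GPU through a single switch hop."""
    return 1.0

def avg_hops_ring(num_gpus: int) -> float:
    """Point-to-point bidirectional ring: average shortest-path distance ~ n/4."""
    return num_gpus / 4.0

def all_to_all_latency_us(avg_hops: float, per_hop_us: float, moe_layers: int) -> float:
    """Forwarding latency paid at every all-to-all MoE layer, summed over the model."""
    return avg_hops * per_hop_us * moe_layers

if __name__ == "__main__":
    num_gpus, per_hop_us, moe_layers = 72, 1.0, 60  # assumed values
    for name, hops in [("switch", avg_hops_switch(num_gpus)),
                       ("ring", avg_hops_ring(num_gpus))]:
        total = all_to_all_latency_us(hops, per_hop_us, moe_layers)
        print(f"{name:>6}: avg {hops:4.1f} hops per exchange, ~{total:.0f} us of forwarding per token pass")
```

Run it and the ring pays roughly 18 hops per exchange at this assumed pod size, versus one hop through the switch; repeated at every layer, that is the kind of overhead that Amdahl's Law turns into a hard ceiling.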
So let me answer the other part of your question, regarding Vera Rubin. Jensen mentioned on stage that we're in full production. If you recall, most recently, we said the chips had taped out. So we're showing you the progress that we're making. But we're still planning on the second half of this year in terms of bringing that to market.
Yeah, cycle time is nine months plus, I would say.
Jensen and Colette, thank you for the tremendous presentation. You guys are on the cusp of transforming so many tremendous industries, the data center market, which you're attacking, autonomous. You mentioned a lot about that, robotics. Could you help us understand the scope and timing for some of these key markets that you're addressing? Specifically, I would love to understand the revenue model, how you're thinking about it for Alpamayo, and then also timing of robotics. Thank you.
We started early. The first person I assigned to work with me on autonomous vehicles was eight years ago. Three years after that, we were able to demonstrate that we could do this, and Ola, the CEO of Mercedes-Benz, partnered with us to build this into their fleet. It took another five years to architect their entire fleet, to make it industrially safe, and to scale it at the level of a passenger vehicle, which is quite sensitive about safety. The safety technology we implemented into the Mercedes-Benz, and the driving capabilities, took this long, and now we're in production. Meanwhile, we take all of that technology and we share it with everybody, and so you might have seen on the list, BYD is a great partner of ours, Geely and Xiaomi and Stellantis, and just about every robotaxi company is working with us.
Nuro, I should have said it. Nuro, I think, is announcing a robotaxi at the show. And so whether it's our data center systems that are used to train the models, so in the case of Tesla, they use our data center systems to train their models. And we've got a bunch of open source software, as I mentioned, all of our infrastructure stack. Some of it's related to simulation. Some of it's related to synthetic data generation and world foundation models. And we make all of that available to the AV industry. So whether it's Tesla or BYD or Xiaomi or you name it, Li Auto and XPeng, all of these companies are using NVIDIA. Almost everybody who has a self-driving anything has NVIDIA in the data center. And that's billions of dollars. And it's just getting started. Some of them use us in the car.
And so those are all the dimensions on the car side. And then, of course, some companies, in the case of Mercedes and several others, are going to deploy the entire stack. We make it so that we can build the whole thing. But we're delighted when the ecosystem thrives and somebody else delivers the whole stack instead of us. Or they actually build a chip, but we're actually in the data center, and they're actually using our software stack. I'm actually OK with all of that. It doesn't matter. We want a thriving ecosystem. And I'm so confident that we're going to be at the center of it no matter what. And so I think autonomous vehicles at this point, whether it's Waabi building a self-driving truck or Aurora, Chris's company, or robotaxis and the like, it's going to be quite a large, significant business over the next 10 years.
So how long did it take us? It was eight years getting here. But now it's already a multi-billion-dollar business. I venture to say somewhere between $5 and $10 billion. By the end of 2030, by the end of the decade, it's going to be a very large business. It has to be at this point. That's how long it takes. But we're not wired that way. The way that we're wired, we ask the question, number one, is this insanely hard to do? And if it's not insanely hard to do, then why do we do it at all? We ask ourselves, is it hard to do? Is it something that we are just uniquely fit at doing? And I think in this particular case, we're fairly singular in our ability to support basically the rest of the ecosystem.
There are two very good companies, Tesla and Waymo, who are excellent at doing it for themselves. And we do it basically for everybody else. And I think we're quite unique in our ability to do that. And then the last part is we like things that take a long time. I love being part of an industry where, at the beginning, nobody gives you any credit for whether this is going to be a growth business or not. Which means, if you take a look at almost everything that I do, I literally play it out for you guys in plain sight. And I know 99% of you just go, that's zero. That's zero. That's zero. That's zero. And I love that. I love that. I love the fact that NVIDIA is powering just about every quantum computer in the world.
And everybody goes zero, zero, zero. And I love the fact that people call us, I mean, somebody actually described us as a gorilla in the health care industry. How could we be a gorilla in health care? But that's because we're working with just about everybody, the robotics companies, the imaging companies, the AI companies, the drug discovery companies. And so eight years ago, that was zero. And so I'm very comfortable with that. And I like these things where it takes a long time. But when you finally get there, it's very likely you'll be quite alone.
Hey, thanks a lot. It's Ben Reitzes, Melius Research. Great to be here. Jensen, the first part of my question is really serious. What's going on with the jackets? You had a really shiny one in the keynote. And now there's a dull one. And I was kind of going to buy the prior model. Is the quarter going really well? Are we in a multi-jacket business model now? Or what's going on?
There's no question I'm in a multi-jacket life. That's because business is going pretty well, Ben. Thanks for asking. And I think that's a perfect tee-up: demand is really strong. Demand is really strong. The reason why I wore that one is, as you know, we're in Vegas. We can't take ourselves too seriously. However, I always feel a little insecure walking up like that because there's somebody who's a shareholder, probably, and they didn't know that I'm in Vegas. And they pull this thing out. And here's the CEO of NVIDIA wearing this glitzy jacket. I always wonder about that one person. But I said Vegas. I was very clear. We're in Vegas. What happens in Vegas? And that's why I can't wear that jacket now.
All right. Well, now I have to ask a serious question. You showed that great slide. It was the second-to-last slide with the 1/10 and the better token economics. I wanted to ask you about Anthropic. They could have gotten all the Trainiums they want. They could have gotten all the TPUs they want. And now they're going to use a lot of your compute. With that slide, did they see Rubin coming and feel like they needed to get going? And is there anybody else taking certain workloads that maybe were geared for TPUs or Trainiums or something else that are going towards Rubin, because they've got to get part of that token economics?
If I could have rewound time, I would have made some different choices with Anthropic, because in the beginning, they really wanted to work with us. But they also needed funding. And at the time, we just didn't have the resources to provide funding to another startup at the scale that they needed. But there were two other companies that did. They were already quite large. Amazon and Google were already quite large. And so they were very supportive of Anthropic in the beginning. And we couldn't afford it. And so I think that that's kind of my excuse, I guess. But where Anthropic is now, nobody is generating more high-quality tokens, because it's delivering for one of the most important use cases in enterprise, which is coding. And Anthropic's Claude is really, really good.
And as you know, one of the challenges for Claude is that their token generation rate is too slow. And for software engineers, you really want to iterate. And so you need the token generation rate to be supremely fast because you're iterating. It generates an answer. You might not like it. It generates another answer. And so you're iterating with this AI assistant. The iteration process requires fast token generation. And I think we could add a lot of value here. And so I'm really happy that we're working together. NVIDIA is now the only platform in the world that runs every model. We run xAI. We run, of course, OpenAI, Gemini. We run Anthropic. And we literally run every single open source model in the world, whether it's physical AI or cognitive AI.
And so the ability for us to be the "every AI" company, I think, is a really great position, because when you're trying to build something, you have no idea. If you're an enterprise company, or you're building your own cloud, let's say it's a sovereign cloud, or you're an AI lab, you really have no clue where the journey is going to take you. And so you want a platform that runs everything and where the ecosystem is really rich. And so with the Anthropic announcement last year, that was really the last one, the only one that didn't run on NVIDIA. And so I'm very, very happy about that.
Thanks.
Thank you for doing this Q&A session. Aaron Rakers at Wells Fargo. Appreciate the conversation. There are a lot of dynamics going on around the supply chain, be it DRAM pricing, be it supply availability. I'm curious if you could talk about that a little bit. And I guess the second part of that is there's been recent discussion around power. I'd be curious how you see the power envelope playing out. Is that a limiting factor to what you see as far as this build-out? Any updated thoughts around that? Thank you.
Our supply chain goes upstream and downstream, and our advantage is that, because our scale was already so large and because we were growing so fast at such a large scale, we were preparing our partners for this large ramp quite some time ago. All of you have been talking to me about supply chains for, what, two years now? Because of the scale of the supply chain that we have and the growth, not just the rate of our growth, but the scale and rate of our growth. Every quarter, we're growing the size of an entire company. Isn't that right? And that's just the delta, an entire giant company. We're not talking about a startup. We're talking about growing by a publicly traded chip company every quarter.
And so all of the supply chain stuff that we did with MGX, the rack level, the reason why we're so thoughtful about how to improve all of that and standardize the components and not waste the ecosystem, not waste the supply chain for our partners, and all the investments that we've made in them. And many of them, we supported with prepay so that they could build out their capacity. Not tens of billions. I mean, we're talking hundreds of billions of dollars of stuff where we're helping the supply chain get primed up. And so I think we're in a good position because we've had such a long relationship with them. And remember, we're just about the only chip company in the world that buys DRAM. If you take a second, take a step back: we're the only chip company in the world that buys DRAM.
People have asked us, why do we buy DRAM? Because as it turns out, turning that DRAM into CoWoS, into a supercomputer, is supremely hard. And getting that supply chain really plumbed up gave us a huge advantage. Now that things are in a tough spot, we feel fortunate to have the scale. And then, speaking of power, look at the number of partners we have upstream, the number of system makers. We're buying memory. And they're buying memory. We're buying multi-layer ceramic capacitors. And they're buying MLCCs. And they're buying PCB. And we're buying PCB. And we're getting everybody all around us. Now look downstream. So the diversity and the scale of our supply chain backwards upstream is a huge advantage. We're the only chip company that buys tens of billions of dollars of DRAM directly from all the DRAM makers.
We buy from every HBM partner. Isn't that right? We certified and qualified every one of them and got them all prepped. Now go downstream. We're the only company in the world that works with every cloud in the world. Isn't that right? The outlets for our technology are incredible in every country, big ones and small ones and startups and sovereign nations and government-funded supercomputers and labs, and we're working with every single one of them. And so we have partners that are so diverse and broad that our outlet downstream is really good. Not to mention, as you know, we realized the importance of knowing the downstream supply chain that's going to affect our growth, which is why we invested in, partnered with, and supported land, power, and shell companies.
People asked us why we're doing that. It's to get ready for today, because we have to see our supply chain all the way back from the equipment makers, from AMAT to ASML, who are partners, all the way down to setting up a supercomputer and generating the first token. If you just look at that entire path, you will see partners, customers, and NVIDIA-invested companies up and down that thing, because we're thinking about it constantly, getting the world ready to remodernize $10 trillion of the last decade's IT investment. Isn't that what we're trying to do? You've just got to ask: what are you trying to do? Say it out loud. What does it all mean? It's not a PowerPoint slide. This is a giant endeavor, what we're trying to do. And it's happening for first-principles reasons.
I mean, are we building these things? And will there be consumption? We fundamentally changed the computer. You can't use that last one. You got to use this one. And every customer knows it. Everybody in the ecosystem knows it.
Thank you. Stacy Rasgon with Bernstein. I have two questions, maybe one for Colette and one for you, Jensen. Colette, and maybe it builds on the supply chain points that you just made, Jensen. Around China, so clearly now there may be an opening to sell parts there that were not sort of being contemplated before. Is the supply chain right now robust enough to support, I don't know, what we might call a meaningful ramp without impacting the rest of the business and the supply you've already secured for the rest of the business? And Jensen, if I could just very quickly touch on you made some comments on DGX Cloud early in your talk. And you talked about that was not a market you were trying to enter. But I mean, you clearly were trying to enter it at one point. What's changed?
It feels like something has changed there. Was that customer pushback? Or was it just the opportunity set or what?
Let me take your supply question regarding H200 and what we may do for China. We have plenty of supply for all of our different countries, particularly in the United States, to meet all of the different demand that we have here. So what we will do for H200 is have supply specifically for China. We're not going to take away from anything we already have with all of our different countries and their orders and demand. So we're still awaiting where we are with the H200, first, with the government. The government has received licenses, and they are working through how they want to process those different licenses. On our side, yes, we definitely do have demand from China.
We just have to make sure we've got alignment across all the different governments on being able to ship that.
Yeah, you should be happy to hear demand is strong.
Demand strong. We got that.
Yeah.
DGX Cloud has always been located inside a CSP. So it was never intended to compete with them. It was intended to do several things. One, prepare them for the new architecture. And because of DGX Cloud, that first mission statement is preparing them. 100% of the world's CSPs were not AI cloud companies, 100%. 100% had no clue about this world. And the first time I met them, 100% rejected us. No, we don't need those kinds of things. And so DGX Cloud was engineered, was created, as a strategy where, OK, in that case, because I need it for my own AI models, I'll work with you. I'll partner with you to build it in your cloud. And then after we're done, you have an exemplar NVIDIA cloud. OK? That's number one. So we can use it as a forcing function, because it's a business transaction both ways.
It could be a forcing function because I need to rent it for my own AI models. So I have a strong need. And instead of setting it up myself, I put it in their cloud. That's number one. Number two, we attract developers there. There are 20,000 AI natives. It probably hasn't gone unnoticed to you guys that this last year there was about $150 billion of investment into AI natives. And one of the things about AI natives is that they have cost. Enterprise companies in the past had very little infrastructure cost. AI-native companies have very high infrastructure cost. And so when we work with all the AI-native companies on DGX Cloud, they become a great customer for the CSP, second. So we're really using it as a customer attractor for them.
We also do the third thing, which is all the models that we created, we land in their clouds so that we can connect our friends like Siemens and Synopsys and Cadence and ServiceNow and Adobe. And whenever they work with us, they're essentially going to land inside one of the CSPs. We're one of the best salespeople for the CSPs. We attract so many customers to the world's CSPs. It's incredible. And that's the reason why, on the one hand, they build TPUs. They want to compete with us. On the other hand, they're so gracious about us being in their cloud, because we bring customers to them. NVIDIA runs every AI, as you guys know. And so given that we can put ourselves into their clouds, do I need to do that anymore with DGX Cloud?
I think increasingly the answer is no, because I think the flywheel has started. However, I do need a whole bunch of capacity myself because, as you know, the world's number two AI is actually open source AI. The number one is OpenAI today. We can all acknowledge that. They generate more tokens than anybody. More services are connected to them than anybody. But the number two, solidly, is open source. And so we have to go make sure that we continue to invest in building these open models. And we have now established that we are a pretty awesome frontier AI model builder. And we contribute tremendously to the ecosystem. And everything we contribute benefits our platform. Everything we contribute benefits the ecosystem and also benefits the verticals that we're going into: self-driving cars, robotics, health care, so on and so forth.
And so I think the flywheel, the equation, the strategic rationale is really solid in all of it. And I don't need to rent one token. Now, how is it possible I can rent from them and re-rent it and make money? It doesn't make any sense anyhow, not to mention I'm actually sitting in their cloud. There's no differentiator. And so it was never intended to be a business.
Got it. That's helpful. Thank you.
It was intended to be a very clever strategy. Turned out to be quite clever.
I'm Jim Schneider, Goldman Sachs. Thanks for the presentation. I was wondering if you maybe talk a little bit about the context memory storage controller you announced today. How important is that across a range of use cases? Did you see that as being a bottleneck to performance of a certain subsegment of customer problems, and should we expect you to sort of continue innovating on that vector similar to what you did in networking in the past?
We're the largest networking company in the world today. I'm expecting us to be the largest storage processor company in the world. That I think is likely, very likely to happen. And it's very likely that we will ship more high-end CPUs than just about anybody else, either. And the reason for that is because, if you look at Rubin, Grace goes into the SmartNIC of every single node. We are now the SmartNIC of AI factories. A lot of CSPs have their own SmartNICs, like Nitro. And they'll continue to have theirs. But outside of that, BlueField is incredibly successful. And BlueField-4 is going to knock it out of the park. And so will the adoption of BlueField-4. And the software layer on top is called DOCA, rhyming with CUDA.
DOCA is adopted all over the place now. For networking, east-west traffic, high-performance networking, we're the largest. For network isolation, north-south networking, I'm fairly certain we're going to be one of the largest. For storage, that is a completely unserved market today. The way that storage works today is SQL. SQL is structured data. A structured database is lightweight. An AI database, the KV cache, is insanely heavyweight. You're not going to hang that off of your north-south network. I mean, that's just a horrible waste of network traffic. You want to put it right into the computing fabric, which is the reason why we introduced this new tier. This is a market that never existed. This market will likely be the largest storage market in the world, basically holding the working memory of the world's AIs. That storage is going to be gigantic.
It needs to be super high performance. So I'm so happy that the amount of inference that people do has now eclipsed the computing capability that the world's infrastructure has. The amount of context memory, the amount of token memory that we process, the KV cache we process, is now just way too high. You're not going to keep up with the old storage systems. When an inflection point in the market happens and you had the vision to see it coming, then this is a brand new market because of that inflection. This is the best way to go into a market. BlueField-4, there's nothing like it, absolutely nothing like it.
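As a rough back-of-envelope of why that working memory is so heavy: the KV cache for a transformer grows with layers, KV heads, head dimension, and context length, per user. The model shape and user counts below are assumptions chosen purely for illustration, not the dimensions of any particular NVIDIA or partner model.

```python
# Back-of-envelope KV-cache sizing. All model dimensions and user counts here
# are assumptions for illustration only.

def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Bytes of key/value cache for one sequence: a K and a V tensor per layer."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value

if __name__ == "__main__":
    # Assumed shape: 80 layers, 8 KV heads, 128-dim heads, FP16 values, 128k context.
    per_user = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128,
                              seq_len=128_000, bytes_per_value=2)
    print(f"~{per_user / 1e9:.0f} GB of KV cache for one 128k-token context")
    print(f"~{10_000 * per_user / 1e12:.0f} TB to keep 10,000 such contexts warm")
```

A SQL row is bytes; a single long context under these assumptions is tens of gigabytes, which is why it wants to sit on the computing fabric rather than hang off the north-south network.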
Great. Thanks. Will Stein from Truist Securities.
Someday nobody's even going to ask me a question about GPUs. But don't worry. I'm going to have to throw it out there.
My question's about the ramping velocity of Vera Rubin relative to the two prior generations. And in particular, as you said, you're in full production today with Vera Rubin. As that starts to get rev-rec in second half, we certainly expect Blackwell will still be going as well. Will Hopper also be still present? Will you have three architectures at once? And regardless of that question, maybe you can talk about the margin impact because it seems like your ramp for Vera Rubin should be much faster given the discussion of the time to manufacture you addressed today in the Q&A or in the presentation.
Yeah, I appreciate that. Vera Rubin's ramp should be fast. The challenge for Vera Rubin is that it is the only computer in history where literally every single chip is new. I don't think you could buy a phone where every single chip was new. Even the high-temperature capacitors were new, never existed before. Even the HBM4 never existed. The LPDDR5 SOCAMM never existed. Are you guys following me? I'm talking about that stuff. I'm not even talking about my chips yet. Literally every single chip in that computer was brand new. The fact that we made it work at all is a miracle. The fact that we made it work perfectly is just incredible. And so that was where the risk of Vera Rubin was, because there were so many new technologies coming together. You've got co-packaged optics coming in on a switch, the largest switch ever made.
You've got all these different things, all this technology, and so we had the wisdom of breaking down the problem, and we were working on Vera Rubin for several years, piecing together the technology, de-risking the important parts of the technology, and de-risking important parts of the supply chain for several years now. Vera Rubin is not a one-year project. Vera Rubin is probably something close to five years, and so we take the most difficult parts and we de-risk every part of it. And sometimes we'll even mix it in with some other technology, and we've actually already shown it. It just wasn't a differentiating part. It wasn't the part that mattered, and we build these pieces. Notice we've been building BlueField-4 for some time, but we de-risked the CPU.
Because you have so much storage, the performance per watt, the energy efficiency of the CPU, has to be insanely great. You're not going to put up that rack of BlueFields and put your favorite x86 in there. That's just not going to happen. It's either not fast enough or it's going to draw too much power. And so we had to build a CPU that had the data rate, where the power was as low as Grace's. And then we built it once the SerDes came along; we couldn't build it that fast several years ago. We had to wait until the SerDes came along. And when everything came together, that's when BlueField-4 was realized. So now we're in the rhythm. BlueField-5 will be easy to do. BlueField-6 will be easy to do.
But de-risking all of the technology components of Vera Rubin was just a massive undertaking. And so now everything's proven to work. Everything is in volume production. I'm just so incredibly proud of the team. I mean, when you just look at all the chips, not just our chips, but other people's chips that had to work with our chips, it's a miracle. But the problem is, if you don't do that, then you literally have 1.5x more transistors, call it 1.7x more transistors. Who cares? Are you guys following me? You're not going to rack up a whole new AI factory for $50 billion for 1.5x. You're just not going to do it. The bar is too high now. You've got to imagine that person. It's not, oh, a 50% better camera, sure, I'll pay for that. It's a 50% better AI factory.
You're not going to pay for that. Does that make sense? It's $50 billion. You're not going to do it for 50%. You'll do it for 10x. You'll write a check for 10x. But you're not going to write a check for 50%, which is the challenge, because, as all of you have heard me say, Moore's Law is over. Moore's Law is completely over. We're not getting 50% every single year; every single year we're getting probably something like 15% to 25% tops out of transistors these days. But AI is going 10x per year and token rate is going 5x per year. There's no way to keep up. And so we have to do something like extreme co-design and really, basically, revolutionize everything.
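The arithmetic behind that gap compounds quickly. A minimal sketch, using only the growth figures quoted above (15% to 25% per year from transistors, roughly 10x per year in AI demand); the four-year horizon is an arbitrary illustration, not guidance.

```python
# Illustrative compounding of the gap described above: per-year transistor gains
# versus per-year AI demand growth. The figures come from the remarks; the horizon
# is an arbitrary choice for illustration.

years = 4
transistor_gain_per_year = 1.25      # optimistic 25% per year from process alone
ai_demand_growth_per_year = 10.0     # model/compute demand growth, per the discussion

transistors = transistor_gain_per_year ** years
demand = ai_demand_growth_per_year ** years

print(f"After {years} years: transistors ~{transistors:.1f}x, demand ~{demand:,.0f}x")
print(f"Gap left for extreme co-design (architecture, NVFP4, networking, software): ~{demand / transistors:,.0f}x")
```

Even with the optimistic 25% figure, process alone gives about 2.4x over four years against roughly 10,000x of demand growth, which is the case being made for co-designing everything from the chip to the rack to the software.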
Thank you. Srini Pajjuri from RBC Capital Markets.
We're going to ramp Vera Rubin pretty fast. This one is easier to ramp than this one. This is literally what all of the world's computer companies are building right now, with enormous workforces, enormous factory floors building this tray. This compute tray is the limiter of the world's AI factories today. This is what everybody's building. Look at the number of connectors. And so we realized the labor content that went into this. And then once you get here, literally, this was two hours. You could just watch them. It takes two hours. The assembly is people, like an operating room, standing around this thing, building it like a car. Now, you're happy to build this like a car because, of course, the economics are better than a car. This is a full car. And so you're happy to build it like a car, but you'd rather build this.
The reliability is going to be better. It's called RAS: reliability, availability, and serviceability. The one thing I didn't say today, I should have remembered: NVLink, this generation, Vera Rubin, is the world's first networking system that's hot swappable. You take part of the network, you pull it out, and you can service the rest of it while it's still running, update the software while it's running. It's craziness. The goal is just to keep that entire AI factory running all the time. Does that make sense? You just paid $50 billion for it. You don't want downtime; you would go insane. Just the amount of technology and innovation, from all the learnings of working with everybody, that went into Vera Rubin, all the things that I didn't say, it's a miracle.
We're going to try to ramp it as hard as we can. Second half, we should sell lots, ship lots.
Thank you. Hi, Jensen. I'm here. I guess my question is more about the longer term. How do you see the frontier model market shaking out? If we look at the history of tech over the last 20, 25 years, there have always been one or two winners. We still have a lot of frontier models and a fairly fragmented market. At least from my usage standpoint, they all seem pretty similar to me. So I'm just curious to hear your thoughts about how you see the market shaking out. Do you think AI is so different that we need so many frontier models going forward? Or if there's going to be a shakeout, what do you think will cause that?
Yeah, good question. For example, if everybody I work with is smarter than I am, they're all smart enough. Are you guys following me? If all the AIs are smarter than we are, then they're good enough, and so that's what you're feeling. You're feeling that they're good enough for general use. But for domain-specific use, they're hardly good enough, and that's where I think you're going to see a fair amount of breakthroughs this year, in vertical agentic systems, because it's not easy to just boil the ocean and make every AI great at chemistry on the one hand, and biology on the other hand, and also drive a car, and do it all effectively and efficiently. I don't think you could reasonably do that. I have a fairly good understanding of the technology. I don't think you're going to reasonably do that.
The technology is related, but optimizing for all these different domains is quite different. And the specialization of the workflow, the flywheel, the data, the training, even the people to evaluate it, is very vertical. And so long before I'm worried about the foundation model industry, I think verticalization is clearly going to happen. Now, is the verticalization going to happen outside the foundation model companies or inside the foundation model companies? That's a different question. However, many of the companies that you're thinking about, remember, already have verticals. Meta already has very deep verticals in digital marketing and AI ad serving. And so they have a lot of verticals that they're very, very good at.
It's not likely that they're going to, or need to, give up that model creation to somebody else, because that vertical is too important to them, and they own the channel. They own all the expertise. You could say the same with Google, and you could say the same with X. Does that make sense? I mean, there are several different companies where their go-to-market is so good because they're already domain experts in those verticals. I think that flywheel is going to continue to sustain itself. In the case of OpenAI, they are the Google of our time. I use all of the AI models, but for some reason I always go back to ChatGPT, the way I kind of always go back to Google. So I think they've become that.
And so that's a great outlet for them, the consumer outlet. In the case of Anthropic, their enterprise capability is really, really good. And so maybe that's their angle. I think that ultimately people, each one of these model makers, have to find an outlet, a channel that they very, very significantly secure. And otherwise, for everybody else, it's going to be open models. I just mentioned five of them that actually are fairly secure. But for everybody else, I think open models is likely to be the answer. And then they're at the frontier. They're not the frontier, but they're at the frontier. Maybe they're months behind. And we'll keep everybody there. And companies like Lilly and companies like ServiceNow, they could take it and create their own version of AI, the ServiceNow AI, the Snowflake AI. But they build it off of open models.
My sense is that that's likely the outcome for the future.
Hey, Jensen, over here, Louis Miscioscia, Daiwa Capital Markets. Thank you. So congratulations on the Mercedes launch after eight years, and on autonomous driving finally hitting now. Can you give us some thoughts about where the industry is for agentic and physical AI? You talked about it last year. You talked about it this year. And there are a lot of examples. But do you think we're going to see critical mass volume in 2026 from these areas, mainly from an inference standpoint and deployments?
If not for agentic AI, there would be no Cursor. Cursor is agentic. If not for agentic AI, there's no Open Evidence. It's an agentic system. Almost all of the best AI native companies that you know are agentic AI companies, but we call them agentic AI today because we're trying to say that it's different than one-shot or models that don't use tools. In the future, that's just called an AI application. There's no question in my mind now this framework of AI systems is likely going to be the basic framework for building applications in the future, so instead of off-the-shelf libraries, you're going to have off-the-shelf models and off-the-shelf agentic systems. You're going to plug them together. You're going to tell them what your goals are, and they're going to try to work together.
Getting applications deployed and working together is going to be just easier and easier and easier. The technology is hard, but using it should be easy. Today, the technology is hard, and using it is hard. The reason for that is because the technology is not good enough. But as you've seen, in the last two years the AI technologies have gotten so good that using ChatGPT for research and for solving a lot of questions has become so much easier. This is going to be the same with AI applications. They're all going to become agentic. Alpamayo is an agentic AV. It's agentic in the sense that it looks at the world and it goes, "I've never seen this circumstance before. What's going on?" It breaks the problem down into things that are fairly routine. It goes, "I know that.
I know that. Based on that, this is what I would recommend we do or not do." It's a reasoning system, and it's using common sense, physical common sense that we taught it. If you ask it to take you somewhere, it's a full robotic system; it's called a VLA, a vision-language-action model. You could just tell the car to take you somewhere. If the car is about to do something and you're not sure, you go, "Why are you doing that?" The car will just talk back to you. It's a full agentic system.
So, Joe Moore from Morgan Stanley. I wonder if you could talk about how to think about sizing the physical AI market. It seems intuitive that this is a very difficult problem to solve because there's simulation. There's a lot of dollars that needs to be spent. But when you did this with large language models, you had cloud CapEx budgets that just sort of shifted towards you in the early stages. For physical AI, do we need to see companies raising money? Is it going to be automotive and industrial companies? Just how do we think about what's going to fund what seems like a pretty big project ahead of us?
Physical AI has the benefit of riding on the shoulders of the large language model. It's called multimodality, number one. It's able to understand vision and language at the same time. It's multimodal. It's aligned. Meaning that, let me see if I can give you an example. In your brain, and I'm pretty certain this is true: C-A-T, those three letters, the sound "meow," and the picture of a cat. To you, they're exactly aligned, in the same place in your brain. They're aligned. Over the years, you've aligned them. It's multimodal alignment. You use this concept called cross-attention, and you're learning two things at one time. Then your vector space becomes the same, or in the same geography. We have the benefit of large language models that were already trained.
And so we take, essentially, Nemotron, and then we take something that is trained very specifically on vision and world models, which uses a lot less data. In combination, we create Cosmos. So Cosmos still took tens of thousands of GPUs over several years, but it's less. But now that we have Cosmos, we give Cosmos away to everybody. That's the point. We're trying to lower the bar. You still have to do your own fine-tuning. You still have to do your own domain adaptation. And so we created Cosmos so that we could lower the bar for everybody to have physical AI. If the bar for physical AI were such that everyone had to do what we're doing here, replicate it three other times, one for digital biology, one for physics, obviously that's unnecessary, and it'll take longer.
In a lot of ways, we carry the burden, not all of it, but probably a third of it, for the world. Now they can take this and run with it.
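A minimal sketch of the cross-attention idea described above, where queries from one modality (text tokens) attend to keys and values from another (image patches), pulling both into a shared embedding space. The shapes, random projections, and dimensions are generic illustrations of the standard technique, not Cosmos or Nemotron internals.

```python
import numpy as np

# Minimal cross-attention sketch: text-token queries attend to image-patch
# keys/values, so the text representation becomes image-informed. Generic
# illustration only; not NVIDIA model code.

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(text_emb, image_emb, d_model=64, seed=0):
    """text_emb: (T, d_model) queries; image_emb: (P, d_model) keys and values."""
    rng = np.random.default_rng(seed)
    w_q = rng.normal(scale=d_model ** -0.5, size=(d_model, d_model))
    w_k = rng.normal(scale=d_model ** -0.5, size=(d_model, d_model))
    w_v = rng.normal(scale=d_model ** -0.5, size=(d_model, d_model))

    q, k, v = text_emb @ w_q, image_emb @ w_k, image_emb @ w_v
    scores = q @ k.T / np.sqrt(d_model)   # (T, P) text-to-patch affinities
    return softmax(scores) @ v            # (T, d_model) image-informed text features

if __name__ == "__main__":
    text = np.random.default_rng(1).normal(size=(3, 64))     # e.g. tokens for "c", "a", "t"
    image = np.random.default_rng(2).normal(size=(16, 64))   # e.g. 16 image patches of a cat
    print(cross_attention(text, image).shape)                 # (3, 64)
```

Trained at scale with a contrastive or generative objective, this is the kind of mechanism that puts "C-A-T," "meow," and the picture of a cat into the same neighborhood of vector space.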
Jensen, thanks a lot for doing this. Colette, thanks a lot for doing this. Ananda Baruah with Loop Capital. A neocloud question for you. Would love to get your view on how, big picture, the neoclouds or the GPU clouds fit into the space structurally. NVIDIA has continued to deepen its partnerships with the neoclouds. At this point, the core customer bases, from the AI labs to the hyperscalers, sovereign, and now even enterprise, are moving closer to embracing the neoclouds. So we'd love to get your take on the role you see them playing, big picture. And then just as a quick Part B, to the extent that enterprise and sovereign adopt neoclouds, does that help NVIDIA sell their enterprise OS? Thanks.
We believed in the existence of this category because the technology is still changing fairly fast. When we started cultivating what people call neoclouds, or at the time, they were just called GPU clouds, because that's all they had in their clouds, we started cultivating them and partnering with them because we realized that the AI technology was moving fairly quickly, and we knew that our technology was going to move fairly quickly. So the market was moving quickly. The technology was moving quickly. And access to land, power, and shell was also not something you could take for granted. And so in this combination of circumstances, we felt that a community of fast-moving, agile regional players would likely be quite successful. We were right. And so nScale is a European regional player. You have Yotta in India. They're a regional player.
And of course, there's CoreWeave that we know about, and Lambda that people know about. And so we have all these different regional players, G42 in the Middle East. And so there are a lot of different regions. My sense is that you're going to find more. And each one of these geographies is going to build up its AI infrastructure. And that AI infrastructure and community is going to have to include everything from researchers and startups, isn't that right, to large companies. And oftentimes, being where your customer is makes a difference. And so that's the reason why we did it. And now it's given us a network of outlets to the marketplace and partners who are constantly seeking out land, power, and shell opportunities, speaking back to the power challenges, but also customers and partners who are trying to build this ecosystem with us.
And so we're quite informed as a result of all that.
Thank you. It's Chris Caso from Wolfe Research. The question is about margins, and really the sustainability of margins at these high levels. It's one of the most frequent questions we get from investors. I know you've answered the question for this year, that they stay in the mid-70s percent. But two questions on that. One is, what's different with the Rubin ramp that allows you to maintain those margins in the early stage of the ramp? Because typically, your margins will come down in the early stages. And then longer term, how does NVIDIA continue to price to value in the face of competition? How do you maintain these high margins?
So, for the ramp question, whether it's plus or minus half a percent is obviously hard to predict, nor is that really your question. In the long term, our margins are directly related to the value that we deliver. And I simplify the world of value creation down to basically three charts. And those three charts are insanely hard to get right. The first one is, how many GPUs does it take for you to train a reasonable-sized model in the time that you need to train it? And so you've got to get to market every year. And you need to iterate on that model several times. And you'd like not to lose the frontier. And so your training capability matters a lot.
And that's not just a flops thing, because the shape of that curve, the shape of that curve that I showed you, is directly related to co-design. It's got networking problems. It's got memory bandwidth problems. It's got NVLink problems. It's got software problems. It's got every problem. The shape of that curve matters. Have you noticed I'm the only CEO that shows you guys the shape of curves instead of just a bar? You know the bar chart? You just pick a point. That's not life. Life is not like a bar. And so life is this curve. And I can't even show you the multidimensional curves that we have. We call them frontier Paretos, Pareto frontiers. And we sweep across a multidimensional space. And that's the simplest version that we show, and it's the one that makes the most sense for delivering value.
And so there's the training side of it. There's the inference side of it. And then there's the throughput side of it. The inference side of it has to do with the cost it takes to generate the tokens. Given the latency and the quality-of-service requirements of the time, people are expecting more and more tokens generated, which means that the time per token has to be much shorter. And yet, you need to deliver it much more cost-effectively. That multidimensional problem was described in that chart. And then the next one is, in the final analysis, if you're going to build your infrastructure this year, you'd better be darn sure that your revenues will go up next year. And you've got only 3 GW or 2 GW or 1 GW. That's it.
So, within your 1 GW, your perf per dollar, your perf per watt, that last chart that I showed, it was the second or third from the end. That chart is insanely hard. That's not a bar, because the workload differs across all these different scenarios. And so your job is to, one, support your research team to get to the frontier. Two, once you have a model, you'd better make money with it, and it has to generate tokens fast enough for a good quality of service. And then three, the entire data center has to generate enough revenue that the CapEx you put down next year is going to cause your company to grow, not shrink. And so these are such complicated issues. And I distilled it into one chart, but there are mounds of simulators behind it.
To the extent that we can continue to do that at the levels that we're doing that, it's easy to go 1.5x. Actually, I take that back. That's not even that easy. If you build a chip that's 1.5x more transistors, you put it into that supercomputer, you don't get 1.5x. You'll get 1.2x because of Amdahl's Law. So you have to do extreme engineering, extreme co-design across all of this and invent things like NVFP4 to bust out of Moore's Law. Are you guys following me? Otherwise, you're like everybody else. So that's the challenge. I think our ability to sustain value, which is deliver the value across the dimensions that I've said, is fairly. I wouldn't say it's singular, but I would say it's fairly close to singular. We're doing it at this scale. It's getting harder every time.
As you know, it obviously gets harder every time. And so those are the dimensions. You take the performance. In the end, it's about that 1 GW data center. That's the simple math. It's $50 billion. Does your $50 billion deliver more revenue than somebody else's? Do you deliver more tokens than somebody else? It's $50 billion, and maybe somebody else is willing to do it for $40 billion, but $20 billion of it is just land, power, and shell. Isn't that right? If the chips were free, it's still $20 billion. If everything was free, it's still $20 billion. Are you guys following me? And so you'd better be sure that whatever you put on top of that $20 billion, you're going to be happy with. And so that's why I think the difference between good margins and poor margins is such a small difference across that $50 billion.
So long as we continue to deliver the throughput, I hope that customers will continue to reward us for that. But that's our basic strategy, and then today, I spoke about the new dimensions of that data center, which is we really want to invest in making sure that the resilience, reliability, availability, and serviceability of the data center is world-class, and that's very, very hard as well.
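A hedged sketch of the arithmetic behind those two points: the Amdahl's Law ceiling on what extra transistors alone can buy, and why the fixed land, power, and shell cost makes throughput per total dollar the thing that matters. The $50 billion and $20 billion figures come from the remarks above; the accelerated fraction, compute budgets, and throughput numbers are illustrative assumptions.

```python
# Illustrative arithmetic only. The $20B land/power/shell figure is from the
# remarks; everything else (accelerated fraction, compute budgets, throughputs)
# is an assumption for the sketch.

def amdahl_speedup(component_gain: float, accelerated_fraction: float = 0.5) -> float:
    """Amdahl's Law: only the accelerated fraction of the workload speeds up."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / component_gain)

def tokens_per_capex_dollar(compute_capex_b: float, fixed_capex_b: float,
                            relative_throughput: float) -> float:
    """Relative token throughput per billion dollars of total capex.
    fixed_capex_b (land, power, shell) is spent no matter whose chips go in."""
    return relative_throughput / (compute_capex_b + fixed_capex_b)

if __name__ == "__main__":
    # 1.5x more transistors alone, with roughly half the workload limited by everything else:
    print(f"1.5x transistors -> ~{amdahl_speedup(1.5):.2f}x at the system level")

    # Two hypothetical 1 GW factories sharing the same $20B of land/power/shell:
    a = tokens_per_capex_dollar(compute_capex_b=30, fixed_capex_b=20, relative_throughput=10.0)
    b = tokens_per_capex_dollar(compute_capex_b=20, fixed_capex_b=20, relative_throughput=1.0)
    print(f"Factory A (co-designed): {a:.3f} throughput units per $B")
    print(f"Factory B (cheaper):     {b:.3f} throughput units per $B")
```

Under these assumptions, 1.5x more transistors yields only about 1.2x at the system level, and the factory that costs $10 billion more up front still delivers several times more tokens per total dollar, because the $20 billion of land, power, and shell is spent either way.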
Thanks. So Jensen, following up on the previous question on the AI models, what do you think might be the game changer for the current competitive landscape? Because right now, it seems we are in an oligopoly where three to five big players dominate the market in different verticals. So is there any game changer in the next several years, and what would that be? And my second question is, with AI tokens growing five times year over year, when would that re-accelerate or slow down? And what would be the catalyst for the change? And my final question: what's the next milestone, for both AI and NVIDIA, that you're looking for, that you are super excited about? Thank you.
If I told you the last one, you're not going to come to the next one. Some of the big breakthroughs that are still right around the corner, how to deal with memory is a big deal, and today, context lengths could be 100,000. It could be a million, but clearly, the right context length is infinite, and so how do you deal with infinite context length where you have to do attention across all of that? So there's some good research that needs to be done there. We're playing with a lot of research in this area ourselves. That's why our model is hybrid. State space models, the SSMs, are highly efficient at compressing context versus transformers, and so we've created a hybrid version, and so Nemotron 3 was very efficient at context processing and token generation. I think there's a lot of research in this direction still.
I think some of it is related to the system architecture, as I mentioned today, BlueField-4, to bring memory closer. It's kind of like your long-term memory doesn't sit in your head. It sits on the network somewhere. We just took your long-term memory and we put it in your brain. We just put BlueField-4 in the same rack. And so we're going to put your long-term memory in your brain. I think it makes a lot of sense. There's a lot of continuous learning type of things, which is you come up to a circumstance you don't know before. You reason about it, and you still don't know how to solve it. Maybe it's a domain. You just don't even have first principle knowledge of it to even break it down. And so you might know that you don't know it.
To know that you don't know something, to go study up, to go do research so that you have some first-principles knowledge of it, and then come back and reason about it. It's almost like the AI: first you prompt it, and it doesn't know the answer at all. But that's okay. Behind your back, it just went choop, choop, did some research, and goes, "I learned it." Now it can reason about your question. Does that make sense? And so continuous learning doesn't have to happen in the foreground. I think there's a fair amount of research opportunity there. And to your second question, what's going to be the next inflection in token generation rate? The 5x came about because of reasoning. I have a feeling that we have a 50x in front of us.
The 50x in front of us is because of agentic systems. Reasoning is going to come with it, along with tool use, planning, and simulation. It's almost like AlphaGo in real time. So I think that's probably around the corner. These agentic systems are likely to cause token rates to go way up. The demand for computing is really, really high right now, as you guys know, because of the three scaling laws. Multimodality and larger models are putting an enormous amount of pressure on the training systems. Long thinking is putting an enormous amount of pressure on the inference systems. Then all of a sudden, agentic systems come along and put an enormous amount of pressure on top of that. And then, because these models are now so useful, the number of AI startups is going through the roof.
If you look at the number of AI startups in 2025 versus the year before, it almost doubled, and so did the amount of money they raised. Most of that money is going to go toward compute, because that's what it takes to be an AI company. So the burden, the pressure on the computing companies, is really high. That's why our customer demand is so high.
Jensen, hi. Ruben Roy from Stifel. You've talked about extreme co-design quite a bit, and the stats are staggering, right? The Spectrum-X run rate, getting to 400 gig SerDes. What does that say about R&D intensity going forward? And are you seeing tangible benefits today from some of the things you're doing with either the EDA companies or Cursor? Is that impacting your R&D? And then I guess the last part of that question is just M&A philosophy, thinking through some of these recent deals, Groq and Enfabrica, and how you think about M&A relative to acquiring technology and accelerating the design process. Thanks.
If you look at all of our investments, you can see us investing at almost every layer of the stack, from land, power, and shell to chips, infrastructure, models, and applications. We either invest in it directly, meaning we're building it ourselves, like, for example, NVIDIA's open models. We're one of the largest model builders in the world; we just don't ever talk about it. And now we're getting so much positive attention in this area that you'll hear us talking about it more. It also explains why NVIDIA's cloud spend is so high: because we're building the world's leading models for open science and all of these open markets. That creates such an incredible flywheel for our architecture. It's one of the best investments we have. So you see us invest across that entire stack. You then see us invest across the industry.
For example, we invest in everything from digital biology to agentic systems, all the way to robotics and autonomous vehicles. So we'll invest across the industries. And then we'll invest up and down the supply chain. So when you see us invest, and I conflate R&D budget with investment because to me it's the same thing, I think it's improper that most people split the two. Investment is investment. What difference does it make whether you're investing internally or externally to enhance your company's market position? Isn't that right? That's our fundamental goal. That's what all of you would like me to do. And so I see this large investment portfolio, in this larger universe, in that simple way. Where we can do something very uniquely, we'll likely do it internally and bulk it up. For example, NVLink is clearly revolutionary.
And we're singularly the only company in the world with it today. The first alternative to it is likely going to be scale-out Ethernet adapted for scale-up, so-called scale-up Ethernet. And it'll work out the same way. But here, it's such an incredible capability, and the rhythm of our company is so fast, that it made sense for us to bring like-minded people into our company. This is where Enfabrica came along. Rochan and the team were always thinking about scale-up; they just described it in different ways. And their market traction was difficult, because it's hard to build a company that does scale-up. The compute and the fabric are highly integrated. The software stack is completely integrated. And so it's hard to disentangle them into different companies. Some of the things that I'm thinking about with Groq are similarly hard to disentangle.
There's a lot of innovation and invention that has to happen with the two teams working as one. So there, again, I think the teams ought to just join us in that particular case. But to the extent that we can invest outside, I prefer that. I prefer to keep NVIDIA as small as possible, as large as necessary. And if you look at our company today, we're what, 40,000 people? We're probably one of the leanest fighting machines on the planet, and yet our ecosystem is really sprawling. That's the mental model of the company of the future. And so I think the level of R&D spend of our company is quite sustainable.
Jensen, hi. Natalia Winkler, UBS. One question: I want to follow up on Groq. I was wondering if there's any software technology from that deal that you could leverage across the NVIDIA portfolio more broadly.
Yeah, no doubt. Because the programming model of the Groq chips is very different from the programming model of our chips, which is the reason why they're extremely low latency and we're extremely high throughput. Some of the things that I'm thinking about doing would rely on reshaping some of that. But we're really just at the beginning of it. We have plenty of time to go do this. And if we succeed in doing it, it'd just be yet another dimension of capability we can bring to the world's AI factories. I think the vast majority would still just be things like Grace Blackwell and Vera Rubin, call it 90%. But maybe in the future, 10% of it is an extreme version of something. It's kind of like building both a mid-engine sports car and an SUV. They're still both Ferraris, right?
And so I think what NVIDIA does is not build GPUs. What NVIDIA does is build AI infrastructure. Are you guys following me? I barely showed you a GPU today. I stood in front of a pod. Our goal is to build AI infrastructure that is insanely amazing, not just for 80% of the world, or 87%, but hopefully for 100% of the world, and even for use cases you didn't know about yet.
Okay. I believe we have time for one last question.
Yeah, hi, Jensen. Thank you for taking my question. Ken, from BMO Capital Markets. It's a question about both margin and technology. You already have the CPX technology, and through the acquisition of Groq, you also have access to SRAM that can be used in inference. Your team also published a paper a month ago about using CPX for prefill to reduce the usage of HBM, because you can use GDDR7 instead of HBM, and we all know that HBM is very expensive. So going forward, with the combination of Groq and your in-house CPX technology, how do you see your usage of HBM? Can it be brought more under control, which could be positive for your margin going forward? Thanks a lot.
Sure, so I can describe the benefits of each one of these things, and then I'll describe the challenge, why it's not so obvious. For example, Rubin CPX does prefill per dollar better than Rubin alone; its prefill per dollar is higher than standard Vera Rubin's. And if I keep everything in SRAM, then of course I don't need HBM. But the problem is that the model I can keep inside that SRAM is something like 100 times smaller. However, for some workloads it could be insanely fast, because SRAM is a lot faster than going out even to HBM. So you can see the benefits in prefill and in decode. The problem is that workloads are changing shape all the time. Sometimes you have MOEs.
Sometimes you have multimodal models. Sometimes you have diffusion models, and sometimes autoregressive models. Sometimes you have SSMs. These models are all slightly different in shape and size, and sometimes they put the pressure on NVLink, sometimes on HBM, sometimes on all three. And so my point is that because the workloads are changing so fast, and because the world is innovating so fast, that's one of the reasons why NVIDIA is just universally the right answer: because we're flexible. Does that make sense? If your workload is changing from morning to night, and depending on what customer you have, we're versatile. We're good at almost everything, and you might be able to take one particular workload and push it to the extreme.
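A minimal sketch of the capacity-versus-speed trade-off he's describing, with made-up numbers: SRAM is far faster but holds a model roughly two orders of magnitude smaller, so whether it wins depends entirely on whether the model fits.

```python
# Illustrative memory-tier trade-off; capacities and bandwidths are assumptions, not product specs.
TIERS = {
    "sram": (0.5, 80.0),   # (capacity in GB, bandwidth in TB/s): tiny but extremely fast
    "hbm":  (288.0, 8.0),  # far more capacity, lower bandwidth
}

def best_tier(model_gb: float) -> str:
    """Pick the fastest tier that can actually hold the model's weights."""
    fitting = {name: bw for name, (cap, bw) in TIERS.items() if model_gb <= cap}
    return max(fitting, key=fitting.get) if fitting else "does not fit on one device"

print(best_tier(0.3))    # 'sram': a small model fits, and SRAM's speed wins
print(best_tier(200.0))  # 'hbm': a frontier-scale model simply does not fit in SRAM
```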
But that 10% of the workload, or 5% of the workload, or 12% of the workload: if it's not being used, then that part of the data center could have been serving the other 90% of the workload, and you've deprived it of that, because you only have 1 GW. The trick is to think through that one data center not as infinite money and infinite space; you have finite power. So you have to utilize that finite power for the overall consumption of the data center, and the more flexible it is, the better. The more unified the architecture, the better. For example, if we update for a new DeepSeek model, all of a sudden every single GPU in the data center goes up in performance.
Update the library for a Kimi or Qwen model, and the whole data center goes up. Does that make sense? But if you have 17 different architectures, one good for this thing, one good for that thing, then as it turns out, the overall TCO is not as good. So that's the challenge. And even when I'm building these things, I know what the challenge is. It's very hard. I'm exploring ways to beat myself all the time, and it's hard. We're exploring all kinds of different chips to try to build a better solution than even the one we have. It is extremely hard. But the exercise is worth it. Isn't that right?
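A toy version of that utilization argument, with an assumed workload mix: a specialized part can be much faster on its slice of the work, but under a fixed power budget, the hours it sits idle are hours a flexible part would have spent serving everything else.

```python
# Toy utilization comparison under a fixed power budget; the mix and speedups are assumptions.
workload_mix = {"prefill_heavy": 0.10, "everything_else": 0.90}  # share of the data center's work

def effective_throughput(speed_by_workload: dict) -> float:
    """Mix-weighted throughput; 0.0 means the hardware cannot serve that workload."""
    return sum(share * speed_by_workload.get(name, 0.0) for name, share in workload_mix.items())

flexible_gpu = {"prefill_heavy": 1.0, "everything_else": 1.0}  # decent at everything
specialized  = {"prefill_heavy": 3.0, "everything_else": 0.0}  # 3x on its niche, idle otherwise

print(effective_throughput(flexible_gpu))  # 1.0
print(effective_throughput(specialized))   # 0.3 -- great on 10% of the work, wasted power on the rest
```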
If I'm constantly trying to come up with a new way to do even better than what I currently have, if I'm doing that myself, then I'm exploring all of the nooks and crannies. I'm trying to disrupt myself all the time. So I think that at the moment, CPX is exactly as you say, but it also reduces the flexibility of the data center. And with that, let's see, what should I tell you guys? One, demand is really high. If I didn't say that earlier, demand is really high. Two, Grace Blackwell, GB300: the transition has been wonderful, and everybody is building it out. Vera Rubin, the next generation, is here. Another giant step up. It couldn't have been possible without co-design; it took all these different chips to make it possible. Second half of this year, we'll start shipping.
And next year will be the Vera Rubin year, and we'll be shipping at scale. In the meantime, I think you should know that the AI community is doing really, really good work. ChatGPT was followed by o1, which was completely revolutionary and triggered reasoning models of all kinds. Open models are now the second largest in the world. That's probably the best way to think about it. And our company has been doing this for some time. We are a giant model builder as well, and we build models and software that we share with the entire community. I think this year, agentic AI and physical AI are really going to hit their stride. So have a great year. I wish I had my voice for you, but I don't know where I left it. I will find it.
All right, you guys. Happy New Year.