CEO Keynote COMPUTEX 2024

Jun 2, 2024

Slides

Speaker 2

Please welcome to the stage, NVIDIA founder and CEO, Jensen Huang.

Jensen Huang

CEO, NVIDIA

[Foreign Language] I am very happy to be back. Thank you, NTU, for letting us use your stadium. The last time I was here, I received a degree from NTU. I gave the run, don't walk speech. Today, we have a lot to cover, so I cannot walk, I must run. We have a lot to cover. I have many things to tell you. I'm very happy to be here in Taiwan. Taiwan is the home of our treasured partners. This is, in fact, where everything NVIDIA does begins. Our partners and ourselves take it to the world. Taiwan and our partnership has created the world's AI infrastructure. Today, I want to talk to you about several things. One, what is happening and the meaning of the work that we do together? What is generative AI?

What is its impact on our industry and on every industry? A blueprint for how we will go forward and engage this incredible opportunity, and what's coming next. Generative AI and its impact, our blueprint, and what comes next. These are really, really exciting times. A restart of our computer industry, an industry that you have forged, an industry that you have created, and now you're prepared for the next major journey. But before we start, NVIDIA lives at the intersection of computer graphics, simulations, and artificial intelligence. This is our soul. Everything that I show you today is simulation. It's math, it's science, it's computer science, it's amazing computer architecture. None of it's animated, and it's all homemade. This is NVIDIA's soul, and we put it all into this virtual world we call Omniverse. Please enjoy.

[Foreign Language] I want to speak to you in Chinese, but I have so much to tell you, I have to think too hard to speak Chinese, so I have to speak to you in English. At the foundation of everything that you saw was two fundamental technologies: accelerated computing and artificial intelligence, running inside the Omniverse. Those two technologies, those two fundamental forces of computing, are going to reshape the computer industry. The computer industry is now some 60 years old. In a lot of ways, everything that we do today was invented the year after my birth, in 1964. The IBM System/360 introduced central processing units, general purpose computing, the separation of hardware and software through an operating system, multitasking, I/O subsystems, DMA, all kinds of technologies that we use today, architectural compatibility, backwards compatibility, family compatibility.

All of the things that we know today about computing, largely described in 1964. Of course, the PC revolution democratized computing and put it in the hands and the houses of everybody. And then also in 2007, the iPhone introduced mobile computing and put the computer in our pocket. Ever since, everything is connected and running all the time through the mobile cloud. This last 60 years, we saw several, just several, not that many, actually, two or three major technology shifts. Two or three tectonic shifts in computing, where everything changed, and we're about to see that happen again. There are two fundamental things that are happening.

The first is that the processor, the engine by which the computer industry runs on, the central processing unit, the performance scaling has slowed tremendously, and yet, the amount of computation we have to do is still doubling very quickly, exponentially. If processing requirement, if the data that we need to process continues to scale exponentially, but performance does not, we will experience computation inflation, and in fact, we're seeing that right now as we speak. The amount of data center power that's used all over the world is growing quite substantially. The cost of computing is growing. We are seeing computation inflation. This, of course, cannot continue. The data is gonna continue to increase exponentially, and CPU performance scaling will never return. There is a better way. For almost two decades now, we've been working on accelerated computing.

CUDA augments a CPU, offloads, and accelerates the work that a specialized processor can do much, much better. In fact, the performance is so extraordinary that it is very clear now, as CPU scaling has slowed and even substantially stopped, we should accelerate everything. I predict that every application that is processing-intensive will be accelerated, and surely, every data center will be accelerated in the near future. Now, accelerated computing is very sensible. It's very common sense. If you take a look at an application, and here, the 100 T means 100 units of time. It could be 100 seconds, it could be 100 hours, and in many cases, as you know, we're now working on artificial intelligence applications that run for 100 days. The 1 T is code that requires sequential processing, where single-threaded CPUs are really quite essential.

Operating systems, control logic, really essential to have one instruction executed after another instruction. However, there are many algorithms, computer graphics is one, that you can operate completely in parallel. Computer graphics, image processing, physics simulations, combinatorial optimizations, graph processing, database processing, and of course, the very famous linear algebra of deep learning. There are many types of algorithms that are very conducive to acceleration through parallel processing. So we invented an architecture to do, to do that. By adding the GPU to the CPU, the specialized processor can take something that takes a great deal of time and accelerate it down to something that is incredibly fast. And because the two processors can work side by side, they're both autonomous and they're both separate, independent, that is, we could accelerate what used to take 100 units of time down to one unit of time.

Well, the speed up is incredible. It almost sounds unbelievable. It almost sounds unbelievable, but today, I'll demonstrate many examples for you. The benefit is quite extraordinary. 100x speed up, but you only increase the power by about a factor of 3, and you increase the cost by only about 50%. We do this all the time in the PC industry. We add a GPU, a $500 GPU, GeForce GPU, to a $1,000 PC, and the performance increases tremendously. We do this in a data center. A $1 billion data center, we add $500 million worth of GPUs, and all of a sudden, it becomes an AI factory. This is happening all over the world today. Well, the savings are quite extraordinary. You're getting 60 x performance per dollar. 100 x speed up, you only increase your power by 3x.

100 x speed up, you only increase your cost by 1.5x. The savings are incredible. The savings are measured in dollars. It is very clear that many, many companies spend hundreds of millions of dollars processing data in the cloud. If it was accelerated, it is not unexpected that you could save hundreds of millions of dollars. Now, why is that? Well, the reason for that is very clear. We've been experiencing inflation for so long in general-purpose computing. Now that we finally came to the we're finally determined to accelerate, there's an enormous amount of captured loss that we can now regain. A great deal of captured, retained waste that we can now relieve out of the system, and that will translate into savings. Savings in money, savings in energy.

And that's the reason why you've heard me say, "The more you buy, the more you save." And now I've shown you the mathematics. It is not accurate, but it is correct. Okay, that's called CEO math. CEO math is not accurate, but it is correct. The more you buy, the more you save. Well, accelerated computing does deliver extraordinary results, but it is not easy. Why is it that it saves so much money, but people haven't done it for so long? The reason for that is because it's incredibly hard. There is no such thing as a software that you can just run through a C compiler, and all of a sudden, that application runs 100 x faster. That is not even logical. If it was possible to do that, they would have just changed the CPU to do that.

You, in fact, have to rewrite the software. That's the hard part. The software has to be completely rewritten so that you could refactor, re-express the algorithms that was written on a CPU so that it could be accelerated, offloaded, accelerated, and run in parallel. That computer science exercise is insanely hard. Well, we've made it easy for the world over the last 20 years. Of course, the very famous cuDNN, the deep learning library that processes the neural networks. We have a library for AI physics that you could use for fluid dynamics and many other applications, where the neural network has to obey the laws of physics. We have a great new library called Aerial, that is a CUDA-accelerated 5G radio, so that we can software-define and accelerate the telecommunications networks the way that we've software-defined the world's networking, internet.

And so the ability for us to accelerate that allows us to turn all of telecom into essentially the same type of platform, a computing platform, just like we have in the cloud, cuLitho is a computational lithography platform that allows us to process the most computationally intensive parts of chip manufacturing, making the mask. TSMC is in the process of going to production with cuLitho, saving enormous amounts of energy and more enormous amounts of money. But the goal for TSMC is to accelerate their stack so that they're prepared for even further advances in algorithm and more computation for deeper and deeper, narrower and narrower transistors. Parabricks is our gene sequencing library. It is the highest throughput library in the world for gene sequencing, cuOpt is an incredible library for combinatorial optimization. Route planning optimization, the traveling salesman problem, incredibly complicated.

People just p eople have, well, scientists have largely concluded that you needed a quantum computer to do that. We created an algorithm that runs on accelerated computing that runs lightning fast. 23 world records, we hold every single major world record today, cuQuantum is an emulation system for a quantum computer. If you want to design a quantum computer, you need a simulator to do so. If you want to design quantum algorithms, you need a quantum emulator to do so. How would you do that? How would you design these quantum computers, create these quantum algorithms, if the quantum computer doesn't exist? Well, you use the fastest computer in the world that exists today, and we call it, of course, NVIDIA CUDA. And on that, we have an emulator that simulates quantum computers. It is used by several hundred thousand researchers around the world.

It is integrated into all the leading frameworks for quantum computing. And it's used in scientific supercomputing centers all over the world, cuDF is an unbelievable library for data processing. Data processing consumes the vast majority of cloud spend today. All of it should be accelerated, cuDF accelerates the major libraries used in the world. Spark, many of you probably use Spark in your companies, pandas, a new one called Polars, and of course, NetworkX, which is a graph processing database library. And so these are just some examples. There are so many more. Each one of them had to be created so that we can enable the ecosystem to take advantage of accelerated computing.

If we hadn't created cuDNN, CUDA alone wouldn't have been possible for all of the deep learning scientists around the world to use. Because CUDA and the algorithms that are used in TensorFlow and PyTorch, the deep learning algorithms, the separation is too far apart. It's almost like trying to do computer graphics without OpenGL. It's almost like doing data processing without SQL. These domain-specific libraries are really the treasure of our company. We have 350 of them. These libraries is what it takes, and what has made it possible, for us to have such opened so many markets. I'll show you some other examples today. Well, just last week, Google announced that they've put cuDF in the cloud and accelerate Pandas. Pandas is the most popular data science library in the world. Many of you in here probably already use pandas.

It's used by 10 million data scientists in the world, downloaded 170 million times each month. It is the Excel, it is the spreadsheet of data scientists. Well, with just one click, you can now use pandas in Colab, which is Google's cloud data centers platform, accelerated by cuDF. The speed up is really incredible. Let's take a look. That was a great demo, right? It didn't take long. When you accelerate data processing that fast, demos don't take long. Okay. Well, CUDA has now achieved what people call a tipping point, but it's even better than that. CUDA has now achieved a virtuous cycle. This rarely happens. If you look at history and all the computing architectures, computing platforms, in the case of microprocessors, CPUs, it has been here for 60 years. It has not been changed for 60 years.

At this level, this way of doing computing, accelerated computing, has been around. Creating a new platform is extremely hard because it's a chicken and egg problem. If there are no developers that use your platform, then, of course, there will be no users. But if there are no users, there are no install base. If there are no install base, developers aren't interested in it. Developers want to write software for a large install base, but a large install base requires a lot of applications so that users would create that install base. This chicken or the egg problem has rarely been broken, and it's taken us now 20 years, one domain library after another, one acceleration library after another, and now we have five million developers around the world.

We serve every single industry, from healthcare, financial services, of course, the computer industry, automotive industry, just about every major industry in the world, just about every field of science. Because there are so many customers for our architecture, OEMs and cloud service providers are interested in building our systems. System makers, amazing system makers, like the ones here in Taiwan, are interested in building our systems, which then takes and offers more systems to the market, which, of course, creates greater opportunity for us, which allows us to increase our scale, R&D scale, which speeds up the application even more. Well, every single time we speed up the application, the cost of computing goes down. This is that slide I was showing you earlier. 100x speed up translates to 97%, 96%, 98% savings.

And so when we go from 100x speed up to 200x speed up to 1,000x speed up, the savings, the marginal cost of computing continues to fall. Well, of course, we believe that by reducing the cost of computing incredibly, the market, developers, scientists, inventors, will continue to discover new algorithms that consume more and more and more computing, so that one day, something happens, that a phase shift happens, that the marginal cost of computing is so low that a new way of using computers emerge. In fact, that's what we're seeing now. Over the years, we have driven down the marginal cost of computing in the last 10 years in one particular algorithm by a million times.

Well, as a result, it is now very logical and very common sense to train large language models with all of the data on the internet. Nobody thinks twice. This idea that you could create a computer that could process so much data to write its own software, the emergence of artificial intelligence, was made possible because of this complete belief that if we made computing cheaper and cheaper and cheaper, somebody's gonna find a great use. Well, today, CUDA has achieved a virtuous cycle. Installed base is growing, computing cost is coming down, which causes more developers to come up with more ideas, which drives more demand and now we're on the in the beginning of something very, very important. But before I show you that, I wanna show you what is not possible.

If not for the fact that we created CUDA, that we created the modern version of generative, the modern big bang of AI, generative AI, what I'm about to show you would not be possible. This is Earth-2. The idea that we would create a digital twin of the earth, that we would go and simulate the earth, so that we could predict the future of our planet, to better avert disasters or better understand the impact of climate change, so that we can adapt better, so that we could change our habits now. This digital twin of earth is probably one of the most ambitious projects that the world's ever undertaken, and we're taking steps, large steps every single year, and I'll show you results every single year. But this year, we made some great breakthroughs. Let's take a look.

Speaker 7

On Monday, the storm will veer north again and approach Taiwan. There are big uncertainties regarding its path. Different paths will have different levels of impact on Taiwan.

Speaker 8

[Foreign Language]

Jensen Huang

CEO, NVIDIA

Someday, in the near future, we will have continuous weather prediction at every square kilometer on the planet. You will always know what the climate's gonna be. You will always know, and this will run continuously because we've trained the AI, and the AI requires so little energy. And so this is just an incredible achievement. I hope you enjoyed it. And very importantly, [Foreign Language]? The truth is, that was a Jensen AI. That was not me. I wrote it, but an AI, Jensen AI, had to say it. [Foreign language]. CUDA. Because of our dedication to continuously improve the performance of drive the cost down, researchers discovered AI researchers discovered CUDA in 2012.

That was NVIDIA's first contact with AI. This was a very important day. We had the good wisdom to work with the, the scientists to make it possible for deep learning to happen, and AlexNet achieved, of course, a tremendous computer vision breakthrough. But the great wisdom was to take a step back and understanding what was the background? What is the foundation of deep learning? What is its long-term impact? What is its potential? And we realized that this technology has great potential to scale. An algorithm that was invented and discovered decades ago, all of a sudden, because of more data, larger networks, and very importantly, a lot more compute all of a sudden, deep learning was able to us, achieve what no human algorithm was able to.

Now, imagine if we were to scale up the architecture even more, larger networks, more data, and more compute, what could be possible? So we dedicated ourselves to reinvent everything. After 2012, we changed the architecture of our GPU to add tensor cores. We invented NVLink. That was 10 years ago now, cuDNN, TensorRT, NCCL. We bought Mellanox, TensorRT LLM, the Triton Inference Server, and ll of it came together on a brand-new computer nobody understood. Nobody asked for it, nobody understood it, and in fact, I was certain nobody wanted to buy it. And so we announced it at GTC, and OpenAI, a small company in San Francisco, saw it, and they asked me to deliver one to them. I delivered the first DGX, the world's first AI supercomputer, to OpenAI in 2016. Well, after that, we continued to scale.

From one AI supercomputer, one AI appliance, we scaled it up to large supercomputers, even larger. By 2017, the world discovered transformers, so that we could train enormous amounts of data and recognize and learn patterns that are sequential over large spans of time. It is now possible for us to train these large language models to understand and achieve a breakthrough in natural language understanding. And we kept going after that. We built even larger ones, and then in November 2022, trained on thousands, tens of thousands of NVIDIA GPUs in a very large AI supercomputer, OpenAI announced ChatGPT. 1 million users after five days. 1 million after five days, 100 million after two months. The fastest-growing application in history, and the reason for that is very simple: It is just so easy to use, and it was so magical to use.

To be able to interact with a computer like it's human, instead of being clear about what you want. It's like the computer understands your meaning. It understands your intention. Oh, I think here it asked the closest night market. Night, as you know, the night market is very important to me. So when I was young, I was. I think I was four and a half years old. I used to love going to the night market because I just loved watching people. And so we went. My parents used to take us to the night market. Yuan Huan. Yuan Huan. And I love going, and one day, my face, you guys might see that I have a large scar on my face.

My face was cut because somebody was washing their knife, and I was a little kid. But my memories of the night market is so deep because of that, and I used to love, I still love going to the night market. And I just need to tell you guys this, the Tonghua Night Market is really good because there's a lady, she's been working there for 43 years. She's the fruit lady, and it's in the middle between the two. Go find her, okay? She's really terrific. I think it would be funny after this, all of you go to see her. Every year, she's doing better and better, and her cart has improved, and yeah, I just love watching her succeed.

Anyways, ChatGPT came along, and something is very important in this slide. Here, let me show you something. This slide, okay, and this slide. The fundamental difference is this: Until ChatGPT revealed it to the world, AI was all about perception, natural language understanding, computer vision, speech recognition. It's all about perception and detection. This was the first time the world saw a generative AI. It produced tokens, one token at a time, and those tokens were words. Some of the tokens, of course, could now be images or charts or tables, songs, words, speech, videos. Those tokens could be anything. Anything that you can learn the meaning of. It could be tokens of chemicals, tokens of proteins, genes. You saw earlier in Earth-2, we were generating tokens of the weather. We can learn physics.

If you can learn physics, you could teach an AI model physics. The AI model can learn the meaning of physics, and it can generate physics. We were scaling down to one kilometer, not by using filtering. It was generating. And so we can use this method to generate tokens for almost anything, almost anything of value. We can generate steering wheel control for a car. We can generate articulation for a robotic arm. Everything that we can learn, we can now generate. We have now arrived, not at the AI era, but a generative AI era. But what's really important is this: this computer that started out as a supercomputer has now evolved into a data center, and it produces one thing. It produces tokens. It's an AI factory. This AI factory is generating, creating, producing something of great value, a new commodity.

In the late 1890s, Nikola Tesla invented an AC generator. We invented an AI generator. The AC generator generated electrons. NVIDIA's AI generator generates tokens. Both of these things have large market opportunities. It's completely fungible in almost every industry, and that's why it's a new industrial revolution. We have now a new factory producing a new commodity for every industry that is of extraordinary value, and the methodology for doing this is quite scalable, and the methodology of doing this is quite repeatable. Notice how quickly so many different AI models, generative AI models, are being invented, literally daily. Every single industry is now piling on. For the very first time, the IT industry, which is $3 trillion... $3 trillion IT industry, is about to create something that can directly serve $100 trillion of industry.

No longer just an instrument for information storage or data processing, but a factory for generating intelligence for every industry. This is going to be a manufacturing industry, not a manufacturing industry of computers, but using the computers in manufacturing. This has never happened before. Quite an extraordinary thing. What started with accelerated computing, led to AI, led to generative AI, and now an industrial revolution. Now, the impact to our industry is also quite significant. Of course, we could create a new commodity, a new product we call tokens for many industries, but the impact to ours is also quite profound. For the very first time, as I was saying earlier, in 60 years, every single layer of computing has been changed. From CPUs, general purpose computing, to accelerated GPU computing, where the computer needs instructions, now computers process LLMs, large language models, AI models.

Whereas the computing model of the past is retrieval-based, almost every time you touch your phone, some pre-recorded text or pre-recorded image or pre-recorded video is retrieved for you and recomposed based on a recommender system to present it to you based on your habits. But in the future, your computer will generate as much as possible, retrieve only what's necessary. And the reason for that is because generated, generated data requires less energy to go fetch information. Generated data also is more contextually relevant. It will encode knowledge, it will code your understanding of you, and instead of, "Get that information for me," or, "Get that file for me," you just say, "Ask me for an answer." And instead of a tool, instead of your computer being a tool that we use, the computer will now generate skills. It performs tasks.

Instead of an industry that is producing software, which was a revolutionary idea in the early nineties. Remember, the idea that Microsoft created for packaging software revolutionized the PC industry. Without packaged software, what would we use the PC to do? It drove this industry, and now we have a new factory, a new computer, and what we will run on top of this is a new type of software, and we call it NIMs, NVIDIA Inference Microservices. Now, what happens is the NIM runs inside this factory, and this NIM is a pre-trained model. It's an AI. Well, this AI is, of course, quite complex in itself, but the computing stack that runs AIs are insanely complex. When you go and use ChatGPT, underneath their stack is a whole bunch of software.

Underneath that prompt is a ton of software, and it's incredibly complex because the models are large, billions to trillions of parameters. It doesn't run on just one computer, it runs on multiple computers. It has to distribute the workload across multiple GPUs, tensor parallelism, pipeline parallelism, data parallel, all kinds of parallelism. Expert parallelism, all kinds of parallelism. Distributing the workload across multiple GPUs, processing it as fast as possible, because if you are in a factory, if you run a factory, your throughput directly correlates to your revenues. Your throughput directly correlates to quality of service, and your throughput directly correlates to the number of people who can use your service. We are now in a world where data center throughput utilization is vitally important. It was important in the past, but not vitally important. It was important in the past, but people don't measure it.

Today, every parameter is measured: start time, uptime, utilization, throughput, idle time, you name it, because it's a factory. When something is a factory, its operations directly correlate to the financial performance of the company. And so we realized that this is incredibly complex for most companies to do. So what we did was we created this AI in a box, and it contains an incredible amount of software. Inside this container is CUDA, cuDNN, TensorRT, Triton for inference services. It is cloud native so that you could auto-scale in a Kubernetes environment. It has management services and hooks so that you can monitor your AIs. It has common APIs, standard APIs, so that you could literally chat with this box. You download this NIM, and you can talk to it. So long as you have CUDA on your computer, which is now, of course, everywhere.

It's in every cloud, available from every computer maker. It is available in hundreds of millions of PCs. When you download this, you have an AI, and you can chat with it like ChatGPT. All of the software is now integrated. 400 dependencies all integrated into one. We tested this NIM, each one of these pre-trained models, against all kinds our entire install base that's in the cloud, all the different versions of Pascal and Ampere and Hopper and all kinds of different versions. I even forget some. NIMs, incredible invention. This is one of my favorites. And of course, as you know, we now have the ability to create large language models and pre-trained models of all kinds. And we, we have all of these various versions, whether it's language-based or vision-based or imaging-based, or we have versions that are available for healthcare, digital biology.

We have versions that are digital humans, that I'll talk to you about. And the way you use this, just come to ai.nvidia.com. And today, we just posted up in Hugging Face, the Llama 3 NIM, fully optimized. It's available there for you to try, and you can even take it with you. It's available to you for free, and so you could run it in the cloud, run it in any cloud. You could download this container, put it into your own data center, and you could host it, make it available for your customers. We have, as I mentioned, all kinds of different domains, physics, some of it is for semantic retrieval called RAGs, vision languages, all kinds of different languages. And the way that you use it is connecting these microservices into large applications.

One of the most important applications in the coming future, of course, is customer service agents. Customer service agents are necessary in just about every single industry. It represents trillions of dollars of customer service around the world. Nurses are customer service agents in some ways. Some of them are non-prescription or non-diagnostic based nurses are essentially customer service. Customer service for retail, for quick service foods, financial services, insurance, just tens and tens of millions of customer service can now be augmented by language models and augmented by AI. And so these boxes that you see are basically NIMs. Some of the NIMs are reasoning agents. Given a task, figure out what the mission is, break it down into a plan. Some of the NIMs retrieve information.

Some of the NIMs might go and do search. Some of the NIMs might use a tool like cuOpt, that I was talking about earlier. They could use a tool that could be running on SAP, and so it has to learn a particular language called ABAP. Maybe some NIMs have to do SQL queries. And so all of these NIMs are experts that are now assembled as a team. So what's happening? The application layer has been changed. What used to be applications written with instructions are now applications that are assembling teams, assembling teams of AIs. Very few people know how to write programs. Almost everybody knows how to break down a problem and assemble teams.

Every company, I believe, in the future, will have a large collection of NIMs, and you would bring down the experts that you want. You connect them into a team, and you, you don't even have to figure out exactly how to connect them. You just give the mission to an agent, to a NIM, to figure out who to break the tasks down and who to give it to. And they, that central, the leader of the, of the application, if you will, the leader of the team, would break down the task and give it to the various team members. The team members would do their, perform their task, bring it back to the team leader, the team leader would reason about that and present an information back to you, just like humans. This is in our near future.

This is the way applications are gonna look. Now, of course, we could interact with these large these AI services with text prompts and speech prompts. However, there are many applications where we would like to interact with whether, what is otherwise a human-like form. We call them digital humans. NVIDIA has been working on digital human technology for some time. Let me show it to you. And well, before I do that, hang on a second. Before I do that, okay, digital humans has the potential of being a great interactive agent with you. They make much more engaging, they could be much more empathetic, and of course, we have to cross this incredible chasm, this uncanny chasm of realism, so that the digital humans would appear much more natural. This is, of course, our vision.

This is a vision of where we'd love to go, but let me show you where we are.

Speaker 9

Great to be in Taiwan. Before I head out to the night market, let's dive into some exciting frontiers of digital humans.

Speaker 10

Imagine a future where computers interact with us just like humans can.

Speaker 11

Hi, my name is Sophie, and I am a digital human brand ambassador for UneeQ.

Speaker 10

This is the incredible reality of digital humans. Digital humans will revolutionize industries, from customer service to advertising and gaming. The possibilities for digital humans are endless.

Speaker 12

Using the scans you took of your current kitchen with your phone-

Speaker 10

They will be AI interior designers, helping generate beautiful, photorealistic suggestions and sourcing the materials and furniture.

Speaker 13

We have generated several design options for you to choose from.

Speaker 10

They'll also be AI customer service agents, making the interaction more engaging and personalized. Or digital healthcare workers who will check on patients, providing timely, personalized care.

Speaker 14

I did forget to mention to the doctor that I am allergic to penicillin. Is it still okay to take the medication? The antibiotics you've been prescribed, ciprofloxacin and metronidazole, don't contain penicillin, so it's perfectly safe for you to take them.

Speaker 10

They'll even be AI brand ambassadors, setting the next marketing and advertising trends.

Speaker 15

Hi, I'm Imma, Japan's first virtual model.

Speaker 10

New breakthroughs in generative AI and computer graphics let digital humans see, understand, and interact with us in human-like ways.

Speaker 3

Hmm, from what I can see, it looks like you're in some kind of recording or production setup.

Speaker 10

The foundation of digital humans are AI models built on multilingual speech recognition and synthesis, and LLMs that understand and generate conversation.

[Foreign Language]

The AIs connect to another generative AI to dynamically animate a lifelike 3D mesh of a face. Finally, AI models that reproduce lifelike appearances, enabling real-time path-traced subsurface scattering to simulate the way light penetrates the skin, scatters, and exits at various points, giving skin its soft and translucent appearance. NVIDIA ACE is a suite of digital human technologies packaged as easy-to-deploy, fully optimized microservices or NIMs. Developers can integrate ACE NIMs into their existing frameworks, engines, and digital human experiences. Nemotron SLM and LLM NIMs to understand our intent and orchestrate other models. Riva Speech NIMs for interactive speech and translation. Audio2Face and gesture NIMs for facial and body animation, and Omniverse RTX with DLSS for neural rendering of skin and hair.

ACE NIMs run on NVIDIA GDN, a global network of NVIDIA-accelerated infrastructure that delivers low-latency digital human processing to over 100 regions.

Jensen Huang

CEO, NVIDIA

Pretty incredible. Well, those ACE runs in the cloud, but it also runs on PCs. We had the good wisdom of including Tensor Core GPUs in all of RTX, so we've been shipping AI GPUs for some time, preparing ourselves for this day. The reason for that is very simple. We always knew that in order to create a new computing platform, you need an installed base first. Eventually, the application will come. If you don't create the installed base, how could the application come? And so if you build it, they might not come, but if you build it—if you don't build it, they cannot come. And so we installed every single RTX GPU with Tensor Core G...

Tensor Core processing, and now we have 100 million GeForce RTX AI PCs in the world, and we're shipping 200, and this, this COMPUTEX, we're featuring four new amazing laptops.... All of them are able to run AI. Your future laptop, your future PC will become an AI. It'll be constantly helping you, assisting you in the background. The PC will also run applications that are enhanced by AI. Of course, all your photo editing, and your writing, and your tools, all the things that you use will all be enhanced by AI. And your PC will also host applications with digital humans that are AIs. And so there are different ways that AIs will manifest themselves and, and become used in PCs, but PCs will become very important AI platform. And so where do we go from here?

I spoke earlier about the scaling of our data centers, and every single time we scaled, we found a new phase change. When we scaled from DGX into large AI supercomputers, we enabled transformers to be able to train on enormously large data, data sets. Well, what happened was, in the beginning, the data was human-supervised. It required human labeling to train AIs. Unfortunately, there are only so much you can human label. Transformers made it possible for unsupervised learning to happen. Now, transformers just look at an enormous amount of data or look at an enormous amount of video or look at enormous amount of images, and it can learn from studying an enormous amount of data, find the patterns and relationships itself. Well, the next generation of AI needs to be physically based. Most of the AIs today don't understand the laws of physics.

It's not grounded in the physical world. In order for us to generate images and videos and 3D graphics and many physics phenomena, we need AIs that are physically based and understand the laws of physics. Well, the way that you could do that is, of course, learning from video is one source. Another way is synthetic data, simulation data, and another way is using computers to learn with each other. This is really no different than using AlphaGo, having AlphaGo play itself, self-play, and between the two capabilities, same capabilities, playing each other for a very long period of time, they emerge even smarter. And so you're gonna start to see this type of AI emerging. Well, if the AI data is synthetically generated and using reinforcement learning, it stands to reason that the rate of data generation will continue to advance.

And every single time data generation grows, the amount of computation that we have to offer needs to grow with it. We are about to enter a phase where AIs can learn the laws of physics and understand and be grounded in physical world data. And so we expect that models will continue to grow, and we need larger GPUs. Well, Blackwell was designed for this generation. This is Blackwell, and it has several very important technologies. One of, of course, is just the size of the chip. We took two of the largest a chip that is as large as you can make it at TSMC, and we connected two of them together with a 10 terabytes per second link between. The world's most advanced SerDes connecting these two together. We then put two of them on a computer node, connected with a Grace CPU.

The Grace CPU could be used for several things. In the training situation, it could be used for fast checkpoint and restart. In the case of inference and generation, it could be used for storing context memory, so that the AI has memory and understands the context of the conversation you would like to have. It's our second-generation Transformer Engine. Transformer Engine allows us to adapt dynamically to a lower precision based on the precision and the range necessary for that layer of computation. This is our second-generation GPU that has secure AI, so that you could ask your service providers to protect your AI from being either stolen, from theft or tampering. This is our fifth-generation NVLink. NVLink allows us to connect multiple GPUs together, and I'll show you more of that in a second.

This is also our first generation with a reliability and availability engine. This system, this RAS system, allows us to test every single transistor, flip-flop, memory on chip, memory off chip, so that we can, in the field, determine whether a particular chip is failing. The MTBF, the mean time between failure of a supercomputer with 10,000 GPUs, is measured in hours. The mean time between failure of a supercomputer with 100,000 GPUs is measured in minutes. And so the ability for a supercomputer to run for a long period of time and train a model that could last for several months is practically impossible if we don't invent technologies to enhance its reliability. Reliability would, of course, enhance its uptime, which directly affects the cost. And then lastly, decompression engine. Data processing is one of the most important things we have to do.

We added a data compression engine, decompression engine, so that we can pull data out of storage 20 x faster than what's possible today. Well, all of this represents Blackwell, and I think we have one here that's in production. During GTC, I showed you Blackwell in a prototype state. The other side? This is why we practice. Ladies and gentlemen, this is Blackwell. Blackwell is in production. Incredible amounts of technology. This is our production board. This is the most complex, highest performance computer the world's ever made. This is the Grace CPU, and these are. You could see each one of these Blackwell dies, two of them connected together. You see that? It is the largest die, the largest chip the world makes, and then we connect two of them together with a 10 TB/s link. Okay?

That makes the Blackwell computer, and the performance is incredible. Take a look at this. So, you see our computational FLOPS, the AI FLOPS, for each generation has increased by 1,000 times in 8 years. Moore's Law in 8 years is something along the lines of, oh, I don't know, maybe 40, 60? And in the last 8 years, Moore's Law has gone a lot, lot less. And so just to compare, even Moore's Law at its best of times compared to what Blackwell could do. So the amount of computations is incredible, and whenever we bring the computation high, the thing that happens is the cost goes down, and I'll show you. What we've done is we've increased, through its computational capability, the energy used to train a GPT-4, 2 trillion parameter, 8 trillion tokens.

The amount of energy that is used has gone down by 350 times. Well, Pascal would have taken 1000 GWh. 1000 GWh means that it would take a gigawatt data center, the world doesn't have a gigawatt data center, but if you had a gigawatt data center, it would take a month. If you had a 100-watt, 100-megawatt data center, it would take about a year. And so nobody would, of course, create such a thing, and that's the reason why these large language models, ChatGPT, was impossible only 8 years ago. By us driving down the increasing the performance, the energy efficient, while keeping and improving energy efficient, efficiency along the way, we've now taken, with Blackwell, what used to be a 1000 GWh to 3, an incredible advance.

3 GWh, if it's a 10,000 GPUs, for example, it would only take a few days, 10 days or so. So the amount of advance in just 8 years is incredible. Well, this is for inference. This is for token generation. Our token generation performance has made it possible for us to drive the energy down by 3, 4, 45,000 times. 17,000 J per token, that was Pascal. 17,000 J is kind of like 2 light bulbs running for 2 days. It would take 2 light bulbs running for 2 days amounts of energy, 200 W running for 2 days, to generate 1 token of GPT-4.

It takes about 3 tokens to generate 1 word, and so the amount of energy used necessary for Pascal to generate GPT-4 and have a ChatGPT experience with you was practically impossible. But now, we only use 0.4 joules per token, and we can generate tokens at incredible rates and very little energy. Okay, so Blackwell is just an enormous leap. Well, even so, it's not big enough, and so we have to build even larger machines. So the way that we build it is called DGX. So this is, this is our Blackwell chips, and it goes into DGX systems. That's why we should practice. So this is a DGX Blackwell. This has... This is air-cooled, has 8 of these GPUs inside. Look at the size of the heat sinks on these GPUs. About 15 kW, 15,000 W, and completely air-cooled.

This version supports x86, and it's a-- it goes into the infrastructure that we've been shipping Hoppers into. However, if you would like to have liquid cooling, we have a new system, and this new system is based on this board, and we call it MGX for modular, and this modular system. You won't be able to see this. Can they see this? Can you see this? You can? Are you-- Okay. I see. And so this is the MGX system, and here's the 2, Blackwell boards. So this one node has 4 Blackwell chips. These 4 Blackwell chips, this is liquid-cooled. Nine of them, nine of them, 72 of these, 72 of these GPUs, 72 of these GPUs, are then connected together with a new NVLink. This is NVLink Switch, fifth generation. And the NVLink Switch is a technology miracle.

This is the most advanced switch the world's ever made. The data rate is insane. These switches connect every single one of these Blackwells to each other, so that we have one giant 72 GPU Blackwell. Well, the benefit, the benefit of this is that in one domain, one GPU domain, this now looks like one GPU. This one GPU has 72 versus the last generation of 8, so we increased it by 9 times. The amount of bandwidth we've increased by 18 times. The AI FLOPS we've increased by 45 times, and yet the amount of power is only 10 times. This is 100 kW, and that is 10 kW, and that's for one. Now, of course, well, you could always connect more of these together, and I'll show you how to do that in a second. But what's the miracle is this chip, this NVLink chip.

People are starting to awaken to the importance of this NVLink chip as it connects all these different GPUs together. Because the large language models are so large, it doesn't fit on just one GPU, it doesn't fit on one, just one node. It's gonna take the entire rack of GPUs, like this new DGX that I was just standing next to, to hold a large language model that are tens of trillions of parameters large. NVLink Switch, in itself, is a technology miracle. It's 50 billion transistors, 74 ports at 400 Gb each, four lanes, cross-sectional bandwidth of 72, 7.2 terabyte per second. But one of the important things is that it has mathematics inside the switch so that we can do reductions, which is really important in deep learning, right on the chip.

And so this is what, this is what, a DGX looks like now. And a lot of people ask us y ou know, they say, and there's this, there's this confusion about what NVIDIA does, and, and, how is it possible that, that NVIDIA became so big building GPUs? And so there's an impression that this is what a GPU looks like. Now, this is a GPU. This is one of the most advanced GPUs in the world, but this is a gamer GPU. But you and I know that this is what a GPU looks like. This is one GPU. Ladies and gentlemen, DGX GPU. You know? The back of this GPU is the NVLink spine. The NVLink spine is 5,000 wires, two miles. And it's right here. This is an NVLink spine, and it connects 72 GPUs to each other.

This is an electrical, mechanical miracle. The transceivers makes it possible for us to drive the entire length in copper, and as a result, this switch, the NVLink Switch, NVLink Switch, driving the NVLink spine in copper, makes it possible for us to save 20 kW in one rack. 20 kW can now be used for processing. Just an incredible achievement. So this is the, the NVLink's spine. Wow! I went down today. And even this is not big enough. Even this is not big enough for AI factories, so we have to connect it all together with very high-speed networking. Well, we have two types of networking. We have InfiniBand, which has been used in supercomputing and AI factories all over the world, and it is growing incredibly fast for us.

However, not every data center can handle InfiniBand because they've already invested their ecosystem in Ethernet for too long. It does take some specialty and some expertise to manage InfiniBand switches and InfiniBand networks. So what we've done is we've brought the capabilities of InfiniBand to the Ethernet architecture, which is incredibly hard, and the reason for that is this: Ethernet was designed for high average throughput because every single node, every single computer is connected to a different person on the internet, and most of the communications is the data center with somebody on the other side of the internet. However, deep learning and AI factories, the GPUs are not communicating with people on the internet, mostly. It's communicating with each other t hey're communicating with each other because they're all collecting partial products, and they have to reduce it and then redistribute it.

Chunks of partial products, reduction, redistribution. That traffic is incredibly bursty, and it is not the average throughput that matters, it's the last arrival that matters. Because if you're reducing, collecting partial products from everybody, if I'm trying to take all of your, so it's not the average throughput, it's whoever gives me the answer last. Okay? Ethernet has no provision for that. And so there are several things that we had to create. We created an end-to-end architecture so that the NIC and the switch can communicate, and we applied four different technologies to make this possible. Number one, NVIDIA has the world's most advanced RDMA, and so now we have the ability to have a network-level RDMA for Ethernet that is incredibly great. Number two, we have congestion control.

The switch does telemetry at all times, incredibly fast, and whenever the GPUs or the NICs are sending too much information, we can tell them to back off so that it doesn't create hotspots. Number three, adaptive routing. Ethernet needs to transmit and receive in order. We see congestions, or we see ports that are not currently being used. Irrespective of the ordering, we will send it to the available ports, and BlueField, on the other end, reorders it so that it comes back in order. That adaptive routing, incredibly powerful. And then lastly, noise isolation. There's more than one model being trained or something happening in the data center at all times, and their noise and their traffic could get into each other and causes jitter.

And so when the noise of one training model, one model training, causes the last arrival to end up too late, it really slows down the training. Well, overall, remember, you have built a $5 billion or a $3 billion data center, and you're using this for training. If the utilization, network utilization, was 40% lower, and as a result, the training time was 20% longer, the $5 billion data center is effectively like a $6 billion data center. So the cost impact is quite high. Ethernet with Spectrum-X basically allows us to improve the performance so much that the network is basically free, and so this is really quite an achievement. We have a whole pipeline of Ethernet products behind us. This is Spectrum-X800.

It is 51.2 terabytes per second and 256 radix. The next one coming is 512 radix, is one year from now, 512 radix, and that's called Spectrum-X800 Ultra, and the one after that is X1600. But the important idea is this: X800 is designed for tens of thousands, tens of thousands of GPUs. X800 Ultra is designed for hundreds of thousands of GPUs, and X1600 is designed for millions of GPUs. The days of millions of GPU data centers are coming, and the reason for that is very simple. Of course, we want to train much larger models, but very importantly, in the future, almost every interaction you have with the internet or with a computer will likely have a generative AI running in the cloud somewhere.

And that generative AI is working with you, interacting with you, generating videos or images or text, or maybe a digital human. And so you're interacting with your computer almost all the time, and there's always a generative AI connected to that. Some of it is on-prem, some of it is on your device, and a lot of it could be in the cloud. These generative AIs will also do a lot of reasoning capability. Instead of just one-shot answers, they might iterate on answers so that it'll improve the quality of the answer before they give it to you. And so the amount of generation we're gonna do in the future is going to be extraordinary. Let's take a look at all of this put together. Now, tonight, this is our first nighttime keynote. I wanna thank...

I wanna thank all of you for coming out tonight at 7:00 P.M., and, and so what I'm about to show you has a new vibe, okay? There's a new vibe. This is kind of the nighttime keynote vibe, so enjoy this.

Speaker 4

Blackwell. Ha! Let's go. Go, go, go, go. Okay. Blackwell. Ha! [Foreign Language] Wow! Come on. Yeah, yeah, yeah, yeah. Get it y'all, get it y'all. Let's go. The more you back, the more you save. With top AI, tailor-made, faster than the speed of light. Efficient, best to date. [Foreign Language]

Jensen Huang

CEO, NVIDIA

Now, you can't do that on a morning keynote. I think that style of keynote has never been done in COMPUTEX ever. Might be the last. Only NVIDIA can pull off that. Only I could do that. Blackwell, of course, is the first generation of NVIDIA platforms that was launched at the beginning, at the-- right as the world knows, the generative AI era is here, just as the world realized the importance of AI factories, just as the beginning of this new industrial revolution. We have so much support. Nearly every OEM, every computer maker, every CSP, every GPU cloud, sovereign clouds, even telecommunication companies, enterprises all over the world. The amount of success, the amount of adoption, the amount of enthusiasm for Blackwell is just really off the charts, and I wanna thank everybody for that. We're not stopping there.

During this, during the time of this incredible growth, we wanna make sure that we continue to enhance performance, continue to drive down cost, cost of training, cost of inference, and continue to scale out AI capabilities for every company to embrace. The further performance we drive up, the greater the cost decline. Hopper platform, of course, was the most successful data center processor probably in history, and this is just an incredible, incredible success story. However, Blackwell is here, and every single platform, as you'll notice, are several things. You've got the CPU, you have the GPU, you have NVLink, you have the NIC, and you have the switch.

The NVLink Switch connects all of the GPUs together as large of a domain as we can, and whatever we can't do, we connect it with large, very large and very high-speed switches. Every single generation, as you'll see, is not just a GPU, but it's an entire platform. We build the entire platform. We integrate the entire platform into an AI factory supercomputer. However, then we disaggregate it and offer it to the world. And the reason for that is because all of you could create interesting and innovative configurations and all kinds of different styles and fit different data centers and different customers and different places, some of it for Edge, some of it for Telco.

And all of the different innovation are possible if we made the systems open and make it possible for you to innovate. And so we design it integrated, but we offer it to you disintegrated, so that you could create modular systems. The Blackwell platform is here. Our company is on a one-year rhythm, where our basic philosophy is very simple: one, build the entire data center scale, disaggregate it, and sell it to you in parts on a one-year rhythm, and we push everything to technology limits. Whatever TSMC process technology, we'll push it to the absolute limits. Whatever packaging technology, push it to the absolute limits. Whatever memory technology, push it to the absolute limits. SerDes technology, optics technology, everything is pushed to the limit.

Well, and then after that, do everything in such a way so that all of our software runs on this entire install base. Software inertia is the single most important thing in computers. It'll. When a computer is backwards compatible, and it's architecturally compatible with all the software that has already been created, your ability to go to market is so much faster. And so the velocity is incredible when we can take advantage of the entire install base of software that has already been created. Well, Blackwell is here. Next year is Blackwell Ultra. Just as we had H100 and H200, you'll probably see some pretty exciting new generation from us for Blackwell Ultra. And again, pushed to the limits and the next generation Spectrum-X switch, as I mentioned. Well, this is the very first time that this NVLink Switch has been made.

I'm not sure yet whether I'm gonna regret this or not. We have code names in our company, and, we try to keep them very secret. Oftentimes, most of the employees don't even know, but our next generation platform is called Rubin.

The Rubin platform, the Rubin platform. I'm not gonna spend much time on it. I know what's gonna happen. You're gonna take pictures of it, and you're going to go look at the fine print, and feel free to do that. So we have the Rubin platform, and one year later, we have the Rubin Ultra platform. All of these chips that I'm showing you here are all in full development, 100% of them, and the rhythm is one year at the limits of technology, all 100% architecturally compatible. So this is, this is basically what NVIDIA is building and all of the richness of software on top of it.

So in a lot of ways, the last 12 years, from that moment of ImageNet and us realizing that the future of computing was gonna radically change to today, is really exactly as I was holding up earlier, GeForce pre-2012 and NVIDIA today. The company has really transformed tremendously, and I wanna thank all of our partners here for supporting us every step along the way. This is the NVIDIA Blackwell platform. Let me talk about what's next. The next wave of AI is physical AI, AI that understands the laws of physics, AI that can work among us. And so they have to understand the world model so that they understand how to interpret the world, how to perceive the world. They have to, of course, have excellent cognitive capabilities so they can understand us, understand what we asked, and perform the tasks.

In the future, robotics is a much more pervasive idea. Of course, when I say robotics, there's a humanoid robotics that's usually the representation of that, but that's not at all true. Everything is gonna be robotic. All of the factories will be robotic. The factories will orchestrate robots, and those robots will be building products that are robotic. Robots interacting with robots, building products that are robotic. Well, in order for us to do that, we need to make some breakthroughs, and let me show you the video.

Speaker 5

The era of robotics has arrived. One day, everything that moves will be autonomous. Researchers and companies around the world are developing robots powered by physical AI. Physical AIs are models that can understand instructions and autonomously perform complex tasks in the real world.

Multimodal LLMs are breakthroughs that enable robots to learn, perceive, and understand the world around them and plan how they'll act. From human demonstrations, robots can now learn the skills required to interact with the world using gross and fine motor skills. One of the integral technologies for advancing robotics is reinforcement learning. Just as LLMs need RLHF, or reinforcement learning from human feedback, to learn particular skills, generative physical AI can learn skills using reinforcement learning from physics feedback in a simulated world. These simulation environments are where robots learn to make decisions by performing actions in a virtual world that obeys the laws of physics. In these robot gyms, a robot can learn to perform complex and dynamic tasks safely and quickly, refining their skills through millions of acts of trial and error. We built NVIDIA Omniverse as the operating system where physical AIs can be created.

Omniverse is a development platform for virtual world simulation, combining real-time, physically based rendering, physics simulation, and generative AI technologies. In Omniverse, robots can learn how to be robots. They learn how to autonomously manipulate objects with precision, such as grasping and handling objects, or navigate environments autonomously, finding optimal paths while avoiding obstacles and hazards. Learning in Omniverse minimizes the sim-to-real gap and maximizes the transfer of learned behavior. Building robots with generative physical AI requires three computers: NVIDIA AI supercomputers to train the models, NVIDIA Jetson Orin, and next-generation Jetson Thor robotic supercomputer to run the models, and NVIDIA Omniverse, where robots can learn and refine their skills in simulated worlds. We build the platforms, acceleration libraries, and AI models needed by developers and companies and allow them to use any or all of the stacks that suit them best. The next wave of AI is here.

Robotics, powered by physical AI, will revolutionize industries.

Jensen Huang

CEO, NVIDIA

This isn't the future. This is happening now. There are several ways that we're gonna serve the market. The first, we're gonna create platforms for each type of robotic systems, one for robotic factories and warehouses, one for-

R obots that manipulate things, one for robots that move, and one for robots that are humanoid. And so each one of these robotic platforms is like almost everything else we do, a computer, acceleration libraries, and pre-trained models. Computers, acceleration libraries, pre-trained models. And we test everything, we train everything, and integrate everything inside Omniverse, where Omniverse is, as the video was saying, where robots learn how to be robots. Now, of course, the ecosystem of robotic warehouses is really, really complex. It takes a lot of companies, a lot of tools, a lot of technology to build a modern warehouse. And warehouses are increasingly robotic, and one of these days will be fully robotic.

So in each one of these ecosystems, we have SDKs and APIs that are connected into the software industry, SDKs and APIs connected into edge AI industry, and companies, and then also of course, systems that are designed for PLCs and robotic systems for the ODMs. It's then integrated by integrators, created for ultimately, building warehouses, for customers. Here, we have an example of Kenmec building a robotic warehouse for Giant Group. Okay? Then here, now let's talk about factories. Factories has a completely different ecosystem, and Foxconn is building some of the world's most advanced factories. Their ecosystem, again, edge computers and robotics, software for designing the factories, the workflows, programming the robots, and of course, PLC computers that orchestrate, the digital factories and the AI factories.

We have SDKs that are connected into each one of these system ecosystems as well. This is happening all over Taiwan. Foxconn is building digital twins of their factories. Delta is building digital twins of their factories. By the way, half is real, half is digital, half is Omniverse. Pegatron is building digital twins of their robotic factories. Wistron is building digital twins of their robotic factories. This is really cool. This is a video of Foxconn's new factory. Let's take a look.

Speaker 5

Demand for NVIDIA accelerated computing is skyrocketing as the world modernizes traditional data centers into generative AI factories. Foxconn, the world's largest electronics manufacturer, is gearing up to meet this demand by building robotic factories with NVIDIA Omniverse and AI. Factory planners use Omniverse to integrate facility and equipment data from leading industry applications like Siemens Teamcenter X and Autodesk Revit. In the digital twin, they optimize floor layout and line configurations and locate optimal camera placements to monitor future operations with NVIDIA Metropolis-powered vision AI. Virtual integration saves planners on the enormous cost of physical change orders. During construction, the Foxconn teams use the digital twin as the source of truth to communicate and validate accurate equipment layout. The Omniverse digital twin is also the robot gym, where Foxconn developers train and test NVIDIA Isaac AI applications for robotic perception and manipulation, and Metropolis AI applications for sensor fusion.

In Omniverse, Foxconn simulates two robot AIs before deploying runtimes to Jetson computers. On the assembly line, they simulate Isaac Manipulator libraries and AI models for automated optical inspection for object identification, defect detection, and trajectory planning. To transfer HGX systems to the test pods, they simulate Isaac Perceptor-powered FARobot AMRs as they perceive and move about their environment with 3D mapping and reconstruction. With Omniverse, Foxconn builds their robotic factories that orchestrate robots running on NVIDIA Isaac to build NVIDIA AI supercomputers, which in turn train Foxconn's robots.

Jensen Huang

CEO, NVIDIA

So a robotic factory is designed with three computers. Train the AI on NVIDIA AI. You have the robot running on the PLC systems for orchestrating the factories, and then you, of course, simulate everything inside Omniverse. Well, the robotic arm and the robotic AMRs are also the same way, three computer systems. The difference is the two Omniverses will come together, so they'll share one virtual space. When they share one virtual space, that robotic arm will become inside the robotic factory. And again, three computers, and we provide the computer, the acceleration layers, and pre-trained AI models. We've connected NVIDIA Manipulator and NVIDIA Omniverse with Siemens, the world's leading industrial automation software and systems company. This is really a fantastic partnership, and they're working on factories all over the world. SIMATIC Pick AI now integrates-...

Isaac Manipulator and SIMATIC Pick AI runs operates ABB, KUKA, Yaskawa, FANUC, Universal Robots, and Techman. And so Siemens is a fantastic integration. We have all kinds of other integrations. Let's take a look.

Speaker 5

ArcBest is integrating Isaac Perceptor into Vaux autonomy robots for enhanced object recognition and human motion tracking in material handling. BYD Electronics is integrating Isaac Manipulator and Perceptor into their AI robots to enhance manufacturing efficiencies for global customers, idealworks is building Isaac Perceptor into their iw.os software for AI robots in factory logistics. Intrinsic, an Alphabet company, is adopting Isaac Manipulator into their Flowstate platform to advance robot grasping. Gideon is integrating Isaac Perceptor into Trey AI-powered forklifts to advance AI-enabled logistics. RGo Robotics is adopting Isaac Perceptor into Perception Engine for advanced vision-based AMRs. Solomon is using Isaac Manipulator AI models in their AccuPick 3D software for industrial manipulation. Techman Robot is adopting Isaac Sim and Manipulator into TMflow, accelerating automated optical inspection. Teradyne Robotics is integrating Isaac Manipulator into PolyScope X for cobots and Isaac Perceptor into MiR AMRs.

Vention is integrating Isaac Manipulator into MachineLogic for AI manipulation robots.

Jensen Huang

CEO, NVIDIA

Robotics is here. Physical AI is here. This is not science fiction, and it's being used all over Taiwan, and just really, really exciting. That's the factory, the robots inside, and of course, all the products are gonna be robotics. So there are two very high-volume robotics products. One, of course, is the self-driving car or cars that have a great deal of autonomous capability. NVIDIA, again, builds the entire stack. Next year, we're gonna go to production with the Mercedes fleet, and after that, in 2026, the JLR fleet. We offer the full stack to the world. However, you're welcome to take whichever parts, whichever layer of our stack, just as the entire DRIVE stack is open.

The next high-volume robotics product that's going to be manufactured by robotic factories with robots inside will likely be humanoid robots. And this has great progress in recent years in both the cognitive capability, because of foundation models, and also the world understanding capability that we're in the process of developing. I'm really excited about this area because, obviously, the easiest robot to adapt into the world are humanoid robots because we built the world for us. We also have the vast, the most amount of data to train these robots than other types of robots because we have the same, physique. And so the amount of training data we can provide through demonstration capabilities and video capabilities is gonna be really great. And so we're gonna see a lot of progress in this area.

Well, I think we have some robots that we'd like to welcome. Here we go. About my size. And we have some friends to join us. So the future of robotics is here, the next wave of AI. And of course, you know, Taiwan builds computers with keyboards. You build computers for your pocket. You build computers for data centers in the cloud. In the future, you're gonna build computers that walk and computers that roll, you know, around. And so these are all just computers. And as it turns out, the technology is very similar to the technology of building all of the other computers that you already build today. So this is going to be a really extraordinary journey for us. Well, I wanna thank. I wanna thank. I wanna.

I've made one last video, if you don't mind. Something that we really enjoyed making. And if you... Let's run it.

Speaker 6

Taiwan [Foreign Language]

Jensen Huang

CEO, NVIDIA

[Foreign Language]. Thank you! I love you guys. Thank you. [Foreign Language]. Thank you all for coming. Have a great COMPUTEX.