Mobileye Global Inc. (MBLY)

CES 2026 Keynote

Jan 6, 2026

Dan Galves
Chief Communications Officer, Mobileye Global Inc.

We're about to begin, everyone. Thanks for being here. We're good? Thank you, and welcome, everyone, to the annual keynote address of Mobileye's CEO, Professor Amnon Shashua. I'm Dan Galves, Chief Communications Officer. We have a really interesting agenda for you today. Of course, it's exciting to share the news from about 30 minutes ago that Mobileye has agreed to acquire Mentee Robotics; a portion of Amnon's speech will cover that. We're also really excited to provide updates on the automotive side, as we always do at CES. This includes our collaborative work with VW's MOIA Group to bring autonomous vehicles to the roads of the US and Europe. On that note, we're very gratified to have Christian Senger, CEO of Volkswagen Autonomous Mobility, join us for a conversation with Amnon. Thanks for being here, Christian.

Before Amnon begins, I'll read a forward-looking statement, and then I'll play a short video. Please note that today's discussion contains forward-looking statements based on the business environment as we currently see it. Such statements involve risks and uncertainties. Please refer to the accompanying presentation and Mobileye's periodic reports and other filings with the U.S. Securities and Exchange Commission, in particular the sections therein entitled Risk Factors, which include additional information on the specific risk factors that could cause actual results to differ materially. Thanks, and we'll start with a short video. I'd like to welcome Amnon to the stage. Thank you very much.

Amnon Shashua
CEO, Mobileye Global Inc.

Hello, everyone, and good afternoon. I'll address the scope of robotics, where Mobileye is entering the full breadth of physical AI with the acquisition of a humanoid robotics company. But let's start first with Mobileye, our main business, and towards the end of my talk I'll cover humanoid robotics and show you some clips never shown before by any company in humanoid robotics. So let's begin. First, how we did in 2025. When we look at our RFQs with our top 10 customers, we won 95% of all RFQs during 2025. We had two new design wins with OEMs that we did not have a relationship with before, Volvo and Subaru. As for EyeQ6 Lite, that's the chip for basic ADAS, the high-volume part of our business.

During 2025, our pipeline of EyeQ6 Lite was 3.5x compared to 2024, so very, very strong momentum. In terms of our pipeline over the next eight years: $24.5 billion. Just to remind everyone, we were a bit short of $2 billion in revenue in 2025. So if you take this amount and divide it by eight, it's already 50% more than our revenue in 2025. This is a very strong pipeline, and it is based on signed contracts with OEMs. $18 billion of it has been awarded since our IPO, a 40% increase in pipeline since 2023. And we are competing on RFQs with a potential of 100 million more chips, which would increase our pipeline even further.

If we look at our deployment since the inception of the company, it's more than 230 million chips, 230 million cars on the road — I think it's about one-eighth of all the cars on the planet. So this is a very, very big number. In terms of REM, our technology for building high-definition maps, we have 8 million cars sending us data every day — we call this harvesting. 32 billion miles were harvested during 2025, and three new OEMs have joined this harvesting fleet. This allows us to build high-definition maps, which we call REM maps, with coverage that is expanding across Europe, the US, Asia, and so forth. We also just announced yesterday a Surround ADAS design win with a US OEM.

Surround ADAS is our system with EyeQ6 High, our latest high-compute chip, plus five or six cameras — a front-facing camera and four parking cameras — and one to five radars. We believe that this is the evolution of ADAS. ADAS will eventually move from front-facing only to what we call Surround ADAS, and at nine million vehicles, this win shows that it is really high volume; SOP is the middle of 2028. So when we look at Surround ADAS, we already have two design wins: Volkswagen, which we announced several months ago, and this major US OEM. Together, it's 19 million units, and it shows that this is on its way to becoming the next step of driving assist, the next evolution of ADAS.

In terms of ASPs, when we are the tier two, it's somewhere between two to four times our normal ASP, and when we are the tier one, somewhere between 12 to 14x. Now, the status of our advanced products. Surround ADAS I just mentioned: with Volkswagen, it's already at B-Sample stage. This is together with Valeo — Valeo is the tier one, we are the tier two — and the full software will be installed by the end of this year for an SOP in 2028. SuperVision is with Porsche and Audi, the premium brands of the Volkswagen Group. We're already at C-Sample hardware, we have dozens of vehicles in the test phase, and our Gen 2 stack, which I'll talk about later, will be ready in April this year; SOP is 2027. Chauffeur is our Level 3 with Audi. This is at B-Sample stage.

We have prototype vehicles in testing, and we are working on specific functions for eyes-off. Again, this is an SOP a year, year and a half later. If we look at our robotaxi activity, we have multiple sites, both with our partner MOIA and Volkswagen Autonomous Mobility, with the ID. Buzz, in multiple locations in Europe — places like Oslo, which has lots and lots of snow. So we're not only testing in sunny conditions, we're testing in all possible conditions. This is with the ID. Buzz and with the Holon vehicle. So you see that we are covering multiple sites, multiple cities, multiple geographies. So let's start with our robotaxi. The lead project is with Volkswagen and MOIA.

Just to recap where we're standing, in terms of the technology, the sensor configuration is 13 cameras, three long-range LiDARs by Innoviz, six flash LiDARs also by Innoviz, and five imaging radars, which are manufactured by Mobileye — very special radars; we talked about them last year. There are about 100 vehicles today test driving across all the sites I mentioned before. The coming milestone: the Level 4-ready vehicle will be ready in February this year, so next month. We're going driverless Q3, Q4 in the U.S., and then start expanding in 2027 to multiple cities, also with partners like Ruter, Holo, and Uber — all under the roof of Volkswagen and MOIA. Here, just to show the breadth of testing with the ID. Buzz: multiple cities, multiple geographies, also snow. One of them is called U.S. Launch City.

We're still keeping the identity of the city under wraps, but the idea is to start launching there in Q3. So I'm happy to invite Christian Senger — he's the Volkswagen Autonomous Mobility CEO and also the chairman of MOIA — and we'll have a clip running in the background while we speak. Please, Christian. So Christian, thank you for joining me on stage. We have been working for the last two years on the ID. Buzz, and I think it's a great partnership. Give me your perspective: why do you think we fit together, and what's going well in this kind of partnership?

Christian Senger
CEO, Volkswagen Autonomous Mobility

Thank you, Amnon. I think the short answer is that we have absolute clarity of roles and a shared vision. From day one, we were aligned on a simple idea: autonomous mobility will only scale cost-effectively if each partner focuses on what it is best at. So Volkswagen brings industrial-scale vehicles and homologation. Mobileye brings the state-of-the-art Level 4 system, which is integrated by MOIA, a new entity of the Volkswagen Group, into a turnkey solution, an autonomous-mobility-as-a-service product. We are all doing what we do best, and this is why our partnership works really well.

Amnon Shashua
CEO, Mobileye Global Inc.

Very good. So let's highlight the next milestones. We have a very exciting year ahead. Our test fleet has grown a lot; we now have more than 100 ID. Buzz vehicles at multiple sites around the world. So I think we're in a very good position. Let's talk about what comes next.

Christian Senger
CEO, Volkswagen Autonomous Mobility

Exactly. Today we have more than 100 ID. Buzz AD vehicles running across Europe and the U.S., with more to come. We are in Munich and Hamburg, we are in Oslo, but also in Austin and Los Angeles, and more locations are coming soon. The scale of real-world testing across very different traffic, weather conditions, and also legal systems is rare in the industry, and the ID. Buzz AD is a perfect platform, leveraging technology from across the whole Volkswagen Group.

Amnon Shashua
CEO, Mobileye Global Inc.

Yeah, I think there's also great synergy in what we do together, and it's a two-way synergy. On one hand, we benefit from economies of scale, working with SuperVision and Chauffeur, with Porsche and Audi and other premium brands of Volkswagen. So there's common hardware. For example, the ID. Buzz has two ECU boards, which are duplicates of the SV62 that we have with the Porsche SuperVision. So there are economies of scale on that end. At the same time, we benefit from all the data that we collect through the ID. Buzz. It's priceless data: 360-degree collection, not only video but also LiDARs and radars, and we have technologies that create automatic ground truth from this data. All of that allows us to bolster and improve all our sensing technologies, which is also relevant to SuperVision and Chauffeur.

This two-way synergy is very, very strong.

Christian Senger
CEO, Volkswagen Autonomous Mobility

That's really a good point. I remember in prior years, everyone talked about creating a scalable self-driving system from Level 2 to Level 4. Now the Volkswagen Group, MOIA, and Mobileye are actually doing it. The development progress we have seen in 2025 is only strengthening our confidence in the path we are on. I really spend a lot of time in the vehicle, seeing the performance and the focus we have on continuous improvement, which is what makes the product over time. We already revealed the production version of the ID. Buzz in its full self-driving version last year, as the first fully purpose-built self-driving vehicle. We see it behind us.

Looking forward, by the third quarter of this year, we expect to go live with the entire MOIA ecosystem, including the vehicle, the Mobileye self-driving system, the software platform for fleet control, remote guidance, and all the secondary driver tasks. Even more, this will be followed by the launch of driverless services in the US by the end of 2026, and by EU homologation in 2027. We will come to rapid market expansion thereafter.

Amnon Shashua
CEO, Mobileye Global Inc.

Yeah, very, very good. We're ready to support all that scale. I wonder, what's your view of where the market is developing, where it is heading?

Christian Senger
CEO, Volkswagen Autonomous Mobility

The key lesson is that autonomous driving has shifted from a technology challenge to a scaling and business-model challenge, and scaling remains difficult. The breakthrough comes from specialization and a strong ecosystem model, and MOIA sits precisely at this intersection. We have two tasks. First, we integrate the vehicle and the self-driving system, combine this with our own software and service tools, and make it a true turnkey solution for our customers. In short, we integrate what the supply side delivers with what the demand side needs into a turnkey solution ready for autonomous mobility. This is our ecosystem logic and our starting lineup. First, the ID. Buzz AD, a purpose-built safe vehicle with the self-driving system, Drive64, installed already on the production line, enabled by Volkswagen's industrial scale and backed by logistics and after-sales capabilities.

Second, the digital driver by Mobileye, a Level 4 self-driving system built on more than 20 years of ADAS experience, a global data foundation, which you explained very well, and industrial experience. Third, the ecosystem platform by MOIA, and this is new: the software backbone that unites the ecosystem, including passenger management, fleet control, remote guidance, safety oversight, and real-time monitoring, and it enables connection to several booking platforms. And fourth, what truly differentiates us is our ability to combine the best of both worlds: Volkswagen's industrial scale and high-volume manufacturing, and MOIA's strong technology expertise with partnerships and deep experience in urban mobility. The platform supports multiple use cases, from robotaxi to ride pooling to shuttles and line services, for demand generators and operators around the world. And because it's built as an ecosystem, it scales faster, significantly reduces cost, and reaches break-even much earlier.

This leads us to some very clear targets: six cities by end of 2027, and more than 100,000 active self-driving vehicles on the road by end of 2033.

Amnon Shashua
CEO, Mobileye Global Inc.

Amazing. This is truly amazing. So thank you again for joining us and for the strong partnership. We've been working in a great partnership for the last two years, and I'm looking forward to a very bright future. And really, as you said, the name of the game is scale. It's quite clear that the technology can work; now it's a matter of how to scale. And with MOIA and Volkswagen, I believe we can do that. Thank you very much, Christian.

Christian Senger
CEO, Volkswagen Autonomous Mobility

Thanks a lot, Amnon.

Amnon Shashua
CEO, Mobileye Global Inc.

Okay, let's continue. I would now like to take the opportunity to go a bit under the hood — I like to do this every year and share some of the technology with you. I'll go under the hood of our robotaxi technology stack, but there are lots of shared components across our whole stack: SuperVision, Chauffeur, and even going down to Surround ADAS and front-facing ADAS. There are basically three things one needs to consider when building such a stack. One is what is the best way to harness modern AI, generative AI — whether it's vision language models or vision language action models, fast and slow; I'll talk about this later. There are all sorts of caricature approaches, which people mention but don't actually do, because the simplified version is very easy on the ears of investors.

I'll go into the nuances of what is really being done. The second thing is validation methodology: say, in robotaxi, you collected sufficient data to convince yourself that you can go driverless in one city. Now you want to expand to another city — how much data do you need to collect? This is part of validation. The third is economy of scale: what will the sensor setup look like four years from now? We have certain ideas about how to reduce cost by reducing the sensor suite; I'll mention that later. And then, more importantly in a robotaxi setting, there are the teleoperations, which everybody knows exist but nobody talks about. If you have one teleoperator per vehicle, you don't have a business — you haven't done anything. You need to aspire to have one teleoperator over many, many cars, and eventually, asymptotically, to have no teleoperation. That's how you build a business.

So Christian mentioned 100,000 vehicles. We don't want 100,000 teleoperators in the back office, or even 50,000 or even 10,000. So how do you handle this asymptotically? You can start with one teleoperator per vehicle at the beginning, but later you need to convince yourself that you have a line of sight on how to reduce this considerably. Then, when we look at the ingredients of autonomy, the first two columns on the left have to do with how we use vision language models. Just to remind you, a vision language model is a transformer where the input is both images and text and the output is text. So how do we harness VLMs in a way that makes sense? There's the caricature approach — I'm not saying naive approach, because nobody does it, so it's not naive; it's a caricature. So what is the caricature?

You have pixels coming in, a network in the middle — say a transformer network, a VLM — and a trajectory, the commands, coming out. Now, nobody does this, because there are so many reasons why it is wrong. First of all, these networks hallucinate. We know that from ChatGPT, Gemini, language models: they hallucinate. So in our world, how do you give safety guarantees when you have something that can hallucinate? Second, there are issues of sample complexity. Sample complexity is the amount of data you need in order to generalize. The sample complexity of perception is much, much smaller than the sample complexity of planning, because planning is a multi-agent problem, so there's a compounding effect in sample complexity. Putting them together doesn't make sense from a sample-complexity standpoint.

So when you look at recent research — recent academic papers, also blogs of actors in the space — they have heads that create a sensing state. A sensing state is the recording of all objects, of all relevant information around the car, because that reduces sample complexity. So all of a sudden you need to label data. It's not just input in and commands out, with a driver giving you the error signal; you now have to label all the objects around you. That adds another nuance, another complication. So the caricature approach is just a caricature. In reality, there are nuances, which I want to get into.
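To make the sensing-state-head idea concrete, here is a minimal sketch — not Mobileye's code; every module name, shape, and loss weight is hypothetical — of a planner trained with an auxiliary perception head, so the network is supervised on labeled objects as well as on the final trajectory:

```python
import torch
from torch import nn

class PlannerWithSensingHead(nn.Module):
    """Toy planner: shared encoder + auxiliary sensing-state head + trajectory head."""
    def __init__(self, feat_dim: int = 128, num_objects: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(3 * 64 * 64, feat_dim), nn.ReLU())
        # Auxiliary head: predicts a (very simplified) sensing state, e.g. object boxes.
        self.sensing_head = nn.Linear(feat_dim, num_objects * 4)
        # Main head: predicts a short trajectory (10 waypoints, x/y).
        self.trajectory_head = nn.Linear(feat_dim, 10 * 2)

    def forward(self, pixels: torch.Tensor):
        z = self.encoder(pixels.flatten(1))
        return self.sensing_head(z), self.trajectory_head(z)

model = PlannerWithSensingHead()
pixels = torch.randn(4, 3, 64, 64)    # dummy camera batch
gt_boxes = torch.randn(4, 8 * 4)      # labeled sensing state (dummy)
gt_traj = torch.randn(4, 10 * 2)      # expert trajectory (dummy)

pred_boxes, pred_traj = model(pixels)
# The auxiliary perception loss is what requires the extra labeling effort,
# but it is also what reduces the sample complexity of the planning task.
loss = nn.functional.mse_loss(pred_traj, gt_traj) \
     + 0.5 * nn.functional.mse_loss(pred_boxes, gt_boxes)
loss.backward()
```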

What people have noticed — and it actually started from robotics — is that when the problem is deciphering a very complicated scene, a VLM can be a very interesting tool, because it has been trained on all the data of the internet; imagine it trained on all YouTube clips. So it has a very strong sense of scene understanding. But the demand for scene understanding is sparse. You don't need to do it at 10 hertz; you don't need to do it at 10 frames per second. So what has emerged is a concept called fast and slow. It has emerged in academia — academic papers like the one from Li Auto on a system called DriveVLM — it has emerged in Waymo's blog, where they also mention fast and slow, and in robotics, like Figure AI's Helix system.

So what that means is you have a fast route, at 10 frames per second, that is responsible for all the safety layers, and then you have a slow system, at one or two frames per second, which does the deep scene understanding when it is needed. So now, how do you put them together, this fast and slow? The third column is policy. Policy is the planning, the decision making. The sample complexity of policy is very, very high, again because of the multi-agent compounding effect: the host vehicle performs an action in the world, and this action affects other road users, so there's a compounding effect. And when you have something of very high complexity, you need a lot of data.
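As an aside, here is a minimal sketch of how such a fast path and slow path could be scheduled together — purely illustrative, with hypothetical function names, not Mobileye's implementation:

```python
FAST_PERIOD_S = 0.1   # fast path: ~10 frames per second, owns the safety-critical loop
SLOW_PERIOD_S = 0.5   # slow path: ~1-2 frames per second, deep scene understanding

def fast_path(frame, latest_guidance):
    """Hypothetical: perception + planning + safety checks; must finish within ~100 ms."""
    return {"trajectory": "keep_lane", "guidance_used": latest_guidance}

def slow_path(frame):
    """Hypothetical: VLM-style scene understanding; may take hundreds of milliseconds."""
    return {"advice": "construction zone ahead, prefer the left lane"}

latest_guidance = None
last_slow_time = -SLOW_PERIOD_S

for tick in range(20):                 # stand-in for the real sensor loop
    now = tick * FAST_PERIOD_S
    frame = f"frame-{tick}"
    # The slow path is refreshed only when its period has elapsed
    # (in a real system it would run asynchronously and never block the fast path).
    if now - last_slow_time >= SLOW_PERIOD_S:
        latest_guidance = slow_path(frame)
        last_slow_time = now
    # The fast path runs every cycle and always has some (possibly stale) guidance.
    command = fast_path(frame, latest_guidance)
```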

Now, the data coming from the real world is limited. Even if you have millions of cars sending data, it's still limited. You can run on a simulator — say a photorealistic simulator — and do training over the simulator, but then you are compute bound: compute becomes the bottleneck. Running over a photorealistic simulator requires a lot of compute. Say the target is to train over one billion hours of driving: that's not realistic with real data, and not realistic with a photorealistic simulator because of compute constraints. So how do you do that? There are interesting innovations there. The last ingredient is end-to-end. End-to-end is important because you back-propagate from the commands back to the input of the system, and you optimize what really matters, because you can have perception mistakes that don't matter, or perception mistakes that accumulate.

When you do this back-propagation, you ameliorate this accumulation; end-to-end is important in that aspect. After you build a system, you want to do this last end-to-end fine-tuning in order to optimize what really matters. Normally, this requires that all your components are differentiable, because that is how you do back-propagation, but this is not necessarily the case; I'll mention that. Those are the ingredients. Now let's put these ingredients together into an architecture. The pieces in blue are the pieces I want to expand on. Let's start from the left. On the left, you have sensors — cameras, LiDARs, radars, all the modalities. You also have a high-definition map. Again, I'm talking about the robotaxi world; there's no robotaxi that drives without a high-definition map. So you have maps, and you could have other data, telemetry data.

All of that goes into an end-to-end perception network. Even this is not quite true: I put "end-to-end perception network" here, but in reality, in machine learning there's a concept called shortcut learning. What this means: suppose you have two sources, one with a high sample complexity and the other with a low sample complexity. For example, LiDARs have a low sample complexity because they record 3D data. Cameras don't record 3D data; you have to infer 3D from 2D images. So the sample complexity of cameras is much, much higher than the sample complexity of LiDARs. What happens if you do this low-level fusion and feed them into one network is that the network will tend to take a shortcut and rely only on the LiDARs. So you have to be a bit more sophisticated there.

Let's assume you have a network for the cameras, a network for the LiDARs, and then a network that does the fusion — let's not go into that level of resolution, and just call it an end-to-end perception network. This network outputs a sensing state. A sensing state is the recording of all the relevant information around the car: where the vehicles are, the types of vehicles, the lanes, the traffic lights, pedestrian crossings, and so on. All the relevant information — that's a sensing state. As I mentioned before, you have to output a sensing state. Even those who talk about pure end-to-end also output it — they have a special head to output a sensing state — because you need to reduce the sample complexity. This is backed by academic papers and by the blogs of actors in this space. So you output a sensing state.
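A minimal sketch of that split — separate camera and LiDAR encoders with a fusion network producing the sensing state, so the easy 3D signal from the LiDAR cannot become a shortcut that starves the camera branch. All shapes and names are illustrative:

```python
import torch
from torch import nn

class MidLevelFusion(nn.Module):
    """Toy mid-level fusion: each modality gets its own encoder before fusion."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.camera_net = nn.Sequential(nn.Linear(3 * 32 * 32, dim), nn.ReLU())
        self.lidar_net = nn.Sequential(nn.Linear(1024 * 3, dim), nn.ReLU())
        self.fusion_net = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
        self.sensing_state_head = nn.Linear(dim, 8 * 6)   # e.g. 8 objects x 6 attributes

    def forward(self, camera: torch.Tensor, lidar: torch.Tensor) -> torch.Tensor:
        cam_feat = self.camera_net(camera.flatten(1))
        lid_feat = self.lidar_net(lidar.flatten(1))
        fused = self.fusion_net(torch.cat([cam_feat, lid_feat], dim=1))
        return self.sensing_state_head(fused)

model = MidLevelFusion()
sensing_state = model(torch.randn(2, 3, 32, 32), torch.randn(2, 1024, 3))
# A common trick against shortcut learning is modality dropout during training:
# occasionally zeroing the LiDAR features forces the camera branch to stay useful.
```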

Now, this sensing state goes into the blue box called ACI, which I'll explain in a moment; it is responsible for the driving policy, for the planning. So it receives the sensing state as input. It also receives as input the slow route, which is the VLM — we call it VLSA, where the S is for semantic, so it's a vision language semantic action model, which I'll expand on in a moment — and it makes decisions. These go into a safety layer, which receives as input the commands and also the sensing state, employs RSS and PGF — things I talked about last year — and outputs the final commands. So now for the innovation: there are two innovative blocks here, shown in blue, which I want to expand on. The first is the ACI.

ACI is something unique to Mobileye, although there are academic papers that talk about this as well. Let's start with the ACI. What we want is to train over simulated data. Now, it's not photorealistic simulation; it is a sensing-state simulation. That means you have a map, and on the map you can place agents — cars, buses, pedestrians, and so forth — and you start to simulate. You can generate as much data as you want, and you can also generate interesting data, because when you talk about real-world data, most of the data is boring. You want to inject edge cases at a much higher density, and a simulated environment allows you to do that. Because the input is a sensing state, not photorealistic imagery, we can, first, generate much, much bigger volumes of data.

Second, we're not compute bound, because it's not a photorealistic simulator. Now, what is the inspiration? The inspiration is AlphaGo and AlphaGo Zero, and a concept called self-play. Just briefly: in 2015, DeepMind introduced a reinforcement learning system to play Go, a game much more difficult than chess. What they did was imitate human game playing: they had millions of games played by humans, they created an imitation of those humans, and they achieved impressive results. It was really amazing. Later, they introduced something which is really mind-boggling: there was no human data at all. What the system did was play against itself — that's called self-play. It could then train on much more data, because it's not limited by the amount of data you have from humans, and it was much better than AlphaGo. They called it AlphaGo Zero.

So this concept of self-play is the inspiration for what I want to show here. We call this ACI, Artificial Community Intelligence. The idea is to use self-play in order to train planning. So what do we have here? We take a map — and here we can leverage our REM maps. Basically, we have the entire world mapped: we have all of the U.S., all of Europe. Maps, we have plenty. You take the maps, and you place agents on the map. Now, when you are training a driving policy, one of the challenges is this: you want to be a safe driver, but you need to make assumptions about the driving policies of the other agents, which are not necessarily safe drivers.

You cannot assume that the other agents are using the same driving policy you are using, because you are a safe driver. So the question is: what's the driving policy of all the other agents? In this type of simulation, you create all possible driving policies. That means, first of all, a kinematic profile — the kinematic profiles of a pedestrian, of a truck, of a car are all different. Then you have reward weightings: you can have rewards for reckless driving, for fast driving, for slow driving, for someone who violates traffic rules. So you can create a lot of behaviors — hundreds and thousands of different behaviors. And then you have augmentations for every agent: abrupt stops, violating traffic laws or not violating traffic laws.
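A minimal sketch of how such a population of agent behaviors might be parameterized — kinematic profile, reward weightings, augmentations — before running self-play on a map; names and values here are hypothetical, not the actual ACI configuration:

```python
import random
from dataclasses import dataclass

@dataclass
class BehaviorProfile:
    agent_type: str          # kinematic profile: car, truck, pedestrian, cyclist, ...
    aggressiveness: float    # reward weighting for fast / reckless driving
    rule_compliance: float   # probability of respecting traffic rules
    abrupt_stops: bool       # augmentation: random hard braking

def sample_behavior(rng: random.Random) -> BehaviorProfile:
    """Draw one behavior from a superset of possible road-user behaviors."""
    return BehaviorProfile(
        agent_type=rng.choice(["car", "truck", "pedestrian", "cyclist"]),
        aggressiveness=rng.uniform(0.0, 1.0),
        rule_compliance=rng.uniform(0.5, 1.0),
        abrupt_stops=rng.random() < 0.1,
    )

def episode_reward(reached_destination: bool, collided: bool) -> float:
    """The host objective: reach the destination, never collide."""
    return (1.0 if reached_destination else 0.0) - (10.0 if collided else 0.0)

rng = random.Random(0)
population = [sample_behavior(rng) for _ in range(1000)]   # thousands of distinct behaviors
# Each simulated episode would place a mix of these agents on a REM-style map
# and train the host policy against them (the RL loop itself is omitted here).
```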

In this way, you create a superset of all possible behaviors. And now you do reinforcement learning where the goal is: reach your destination and don't collide. No collisions. And what is unique in our approach? Because we didn't invent self-play, and we didn't invent the concept of training in a simulator. There are two aspects. First, we're leveraging our REM maps, so we're not just taking some city we have data for and training on it; we can train on maps of the entire world. Second, the big issue: you trained in a simulator, but you want to move to the real world. It's called sim-to-real. You need to understand the noise model of your perception engine — your perception engine is not perfect. How do you capture the noise model of your perception?

We developed very sophisticated techniques for this sim-to-real transfer, in order to take the policy learned in the simulator and bring it to the real world. So let me show you what this means. We have agents here — there are 12 agents, the cars you see. The circles are destinations, so when an agent reaches its destination, it disappears from the simulation; and then you have crosswalks and stop signs. Now, if we look after tens of hours of training, there are 12 agents, none of them reached their targets, and there were six collisions. You look at this and it looks terrible — they didn't even stay within their lanes. So nothing is happening after tens of hours of training. If you look after 140,000 hours of training, all agents reached their destinations, but there were two collisions.

Let's have a look at this. You see, here, that was a collision. After 2.8 million hours of training, all agents reached their targets with zero collisions. And we are not training millions of hours — we're training billions of hours, and we have a cluster where all these billions of hours are trained overnight. So overnight, you can train with an amount of data that no real-world data can match. This is very important. We do that both for training the driving policy and for validating the driving policy, which I'll mention in a moment. So the ACI provides, first, policy training: we can train with much, much more data than you could ever get from real-world data or from photorealistic simulation. And also, we can validate. Imagine robotaxi, where we now want to expand to a new city.

We have our HD map of the new city, because we have HD maps all over the world. We take that HD map of the new city and train one billion hours to validate that there's nothing in this new city that is unfamiliar in terms of the map. For example, there could be a very special four-way stop with multiple lanes that you hadn't encountered, or hadn't trained on when you did your one billion hours, because it also depends on the type of map. In this way, you can validate overnight that there's nothing in the map that creates an unfamiliar situation for your driving policy. So this also provides policy validation. That was one blue box.
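Coming back to the sim-to-real point for a moment, here is a minimal sketch of the general idea — a perception-noise model injected into the sensing-state simulator so the policy already trains against realistic perception errors. The actual techniques are not disclosed in the talk; this is only an illustration:

```python
import random

class PerceptionNoiseModel:
    """Toy noise model, as if fit from logged (perception output vs. ground truth) pairs."""
    def __init__(self, position_sigma_m: float = 0.3, miss_rate: float = 0.02):
        self.position_sigma_m = position_sigma_m
        self.miss_rate = miss_rate

    def corrupt(self, true_objects, rng):
        """Turn a perfect simulated sensing state into a realistic, noisy one."""
        noisy = []
        for obj in true_objects:
            if rng.random() < self.miss_rate:      # occasional missed detection
                continue
            noisy.append({
                "kind": obj["kind"],
                "x": obj["x"] + rng.gauss(0.0, self.position_sigma_m),
                "y": obj["y"] + rng.gauss(0.0, self.position_sigma_m),
            })
        return noisy

rng = random.Random(0)
noise = PerceptionNoiseModel()
ideal_state = [{"kind": "car", "x": 12.0, "y": -1.5},
               {"kind": "pedestrian", "x": 4.0, "y": 3.0}]
policy_input = noise.corrupt(ideal_state, rng)   # what the policy trains on in simulation
```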

The second blue box is the vision-language model, the slow path. Look at these kinds of scenes. When you look at them, you ask yourself: what is happening here, and what should I do? Normally, in a robotaxi setting, what you do in this kind of situation is ask your teleoperator — this is why you have teleoperators — and the teleoperator will tell you, on the left-hand side: ignore the traffic light and yield to the policeman. On the right-hand side, it tells you: your lane is blocked, and you have to turn either right or left depending on your destination. Now, the demand for this is not 10 frames per second. This is why it's slow thinking — this is the slow versus the fast. The fast is the safety layer, the 10-frames-per-second decisions, which go through safety layers like RSS and so forth.

The slow is there to get a very deep understanding of a scene when the scene is very, very complicated. Vision-language models are a very good candidate to help you understand complex scenes, because they are trained on the whole internet — all the YouTube clips you can imagine — so they have a very strong sense of scene understanding. In the context of robotaxi, this gives you a way to think asymptotically about how to remove teleoperators completely. We built a system in which the output of the VLM is not a trajectory. The fast system outputs a trajectory — the commands. The slow system outputs a script. Think of it as an adult accompanying a young driver: the accompanying adult doesn't have control of the steering wheel, doesn't have control of the throttle and brakes. It talks.

It talks to the young driver and gives some guidance. That is what this slow system does: it talks. And talking is very natural for language models — much more natural than outputting a trajectory. So you see on the right-hand side, it's like filling in a table. Action type: static interaction. Target box: you see it in red, the target box. Passability: no, you cannot pass. Then there's action type: navigation — you need to do something. What is the direction? Turn right. Command: turn. So it's filling in a table, which is very natural for a language model to do. Or, in the other case, the action type is to temporarily ignore the traffic light, with target boxes for the traffic light and for the policeman. And there's also action type: dynamic interaction. What is the offset?

You don't need any offset, but the relation is yield — you need to yield to the policeman. So this is what an accompanying adult does, and this is what the VLSA does. It outputs semantic information that goes into the ACI, into the planning, which receives both the sensing state and the information from the slow system. So just to recap this design: we have the ACI, with which we can do massive-scale training, way more than you can do with real data or a photorealistic simulator; and we have the VLSA, the vision language semantic action model, which can act as an accompanying adult and gradually replace teleoperators.
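To make the "filling in a table" idea concrete, here is a minimal sketch of what a structured VLSA output for those two scenes might look like — a script the planner can consume rather than a trajectory. The field names and boxes are illustrative, not Mobileye's actual schema:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class SemanticAction:
    action_type: str                                  # "static_interaction", "navigation", ...
    target_box: Optional[Tuple[int, int, int, int]]   # image-space box of the referenced object
    passability: Optional[str] = None                 # e.g. "no" for a blocked lane
    direction: Optional[str] = None                   # e.g. "turn_right"
    relation: Optional[str] = None                    # e.g. "yield"

# Roughly what the slow path might emit for the two examples in the talk:
blocked_lane_script = [
    SemanticAction("static_interaction", target_box=(410, 220, 520, 300), passability="no"),
    SemanticAction("navigation", target_box=None, direction="turn_right"),
]
police_scene_script = [
    SemanticAction("temporary_ignore_traffic_light", target_box=(300, 80, 340, 160)),
    SemanticAction("dynamic_interaction", target_box=(150, 180, 230, 330), relation="yield"),
]
# The planner (ACI) consumes these scripts together with the sensing state;
# it keeps control of the trajectory, just as the young driver keeps the wheel.
```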

The last point: what about supporting end-to-end? As I mentioned before, end-to-end is important as a last stage of training, where you want to propagate errors and optimize what really matters — the end result, the commands. Normally, when people talk about end-to-end, they assume the entire sequence is differentiable; networks are differentiable. But here you have non-differentiable elements, like the sensing state, like the VLSA. The solution is what is being done in language models: language models have tools, which are non-differentiable elements, so they do this end-to-end training using reinforcement learning. I'm not going to go into it at that level of resolution, but it's something we also do. So we can support end-to-end even though we have non-differentiable elements along the chain.
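A minimal sketch of that workaround: treat the whole chain as a policy and fine-tune it with a score-function (REINFORCE-style) gradient on the end result, instead of back-propagating through every block. This is a toy example, not the actual training recipe:

```python
import torch
from torch import nn

policy = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))   # toy planner head
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def non_differentiable_stage(action_idx: int) -> float:
    """Stand-in for everything autograd cannot see (tools, safety layer, simulator).
    Returns a scalar score of the final driving outcome."""
    return 1.0 if action_idx == 2 else -0.1

for step in range(200):
    obs = torch.randn(8, 16)                        # batch of toy sensing states
    dist = torch.distributions.Categorical(logits=policy(obs))
    actions = dist.sample()
    # The reward comes back through the non-differentiable stage, not through autograd.
    rewards = torch.tensor([non_differentiable_stage(int(a)) for a in actions])
    # REINFORCE: increase the log-probability of actions in proportion to their reward.
    loss = -(dist.log_prob(actions) * rewards).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```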

So I hope it wasn't too complicated. I just wanted to give a sense of going away from the caricature, because nobody really does the caricature — people talk about it, but nobody really does it; it doesn't really make sense. You need to design a system around how best to harness modern AI, and I showed an example of our design with some innovative elements like the ACI and the VLSA. So now I'm going to put this together into how we see the future, the next five years. I put here the advanced products. The Level 2++ is SuperVision, a hands-off, eyes-on system. The challenge here is just cost reduction. The driver is responsible, so you don't need to get into the MTBFs that are relevant for eyes-off; you need a good system that's comfortable, and the driver is responsible.

When the system makes a mistake, the driver can take over; the challenge here is cost reduction. Our first system comes out in 2027; we have now designed a cost reduction of about 40% for 2028, and we'll continue on this path of cost reduction so that it becomes more and more affordable. Next comes the Level 3 product, which we call Chauffeur. The evolution we see for this line of product is to go from eyes-off to mind-off. In an eyes-off system, the driver is not responsible within the ODD — say the ODD is highway driving — but the system can ask the driver to intervene. It's not an instantaneous intervention; say you have 10 seconds to intervene, and if you don't intervene, the car will stop on the side.

Now, nobody talks about the frequency of intervention. Say you ask the driver to intervene every 10 minutes — that's not that comfortable. Mind-off says: I can give you a guarantee that I will not ask you to intervene. This is where the VLSA comes in; this is where the fast and slow system comes in. In robotaxi, you want to reduce teleoperators; in a consumer car, you want to reduce the intervention rate, the intervention frequency, and we call that mind-off. You can call it moving from Level 3 to Level 4 — it's the same. And for robotaxi, the current status is that the system is commercially deployable at small scale. That is the status of the industry today, and where we want to get to in the 2030s is scale. As Christian mentioned here, 100,000 vehicles eight years from now.

There are two things. First, you need cost reduction in terms of the sensor set and compute. We developed our imaging radars with the long term in mind: we think the second generation of robotaxis can rely on cameras and imaging radars alone — maybe you'll need a front-facing LiDAR, and that's it, compared to all the sensors I mentioned for the first generation. Second, lower the teleoperator-to-vehicle ratio; asymptotically, you don't want any teleoperators. That is what allows for scale, and this is why we developed the VLSA. So now let's look at how all of this goes into our technology stack. Here's the EyeQ7. It's already sampled and ready for production — that's called PPAP — ready for production next year, Q3 2027. I even have it here; I'll do a kind of Jensen thing and show you.

What you see here is the chip, the EyeQ7 chip. It has 22 accelerator engines and 12 CPU cores. It's very, very powerful. To understand the power, rather than talking about TOPS and things like that, I want to talk about what it can do. First, a reminder about EyeQ6 High, which is coming out with SuperVision and with Surround ADAS, all in 2027. Compare it to the Jetson AGX, the Orin X, which is a competing chip. We took two types of workloads that are very relevant to everything we're doing. One is convolutional nets — this is ResNet-50. The second is vision transformers — a transformer that accepts images as input and creates a representation by passing them through the transformer. And there are latencies; you want to measure latencies.

On the right-hand side is the latency that NVIDIA reports on these two benchmarks — it's not our assumption, it's not our measurement of NVIDIA's chips, it's theirs. You see here it's 0.64 milliseconds compared to our 0.5 milliseconds on the convolutional net. On the vision transformer, the gap is even bigger: for a nine-million-parameter vision transformer, 1.5 milliseconds compared to 0.5 milliseconds. So our chip is really purpose-built for these types of workloads: convolutional nets and vision transformers. Now, what can these chips, both EyeQ6 and EyeQ7, do with the VLSA, with VLMs? With EyeQ6 High, we can run a 3.8-billion-parameter VLM at 2.5 hertz. Again, we're talking about the slow system, so around two frames per second. And 3.8 billion parameters is quite a decent-size language model.

It can do quite a lot. With EyeQ7, we can run 15.6 billion parameters. Now, our vision for these VLAs, these VLSAs, is that in the world of robotaxi — since we're talking about a slow system, and since it outputs text and not a trajectory — some of it can run on the cloud. So we see three layers. The first layer is on chip, and you see what kind of networks we can put on chip. The second layer, a 70-billion-parameter network, runs on the cloud. The cost of that is not big — for a robotaxi, a few hundred dollars per robotaxi, which is way, way cheaper than having a teleoperator. That runs at two hertz on the cloud. And then we can have a trillion-parameter model on demand.
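A minimal sketch of that three-tier idea — an on-chip model polled continuously, a larger cloud model at a lower rate, and a frontier model queried only on demand. The sizes and rates are the ones quoted in the talk; the routing logic itself is purely hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class VlsaTier:
    name: str
    params_billions: float
    rate_hz: Optional[float]     # None means queried on demand only

TIERS = [
    VlsaTier("on-chip (EyeQ6 High)", 3.8, rate_hz=2.5),
    VlsaTier("cloud", 70.0, rate_hz=2.0),
    VlsaTier("frontier model, on demand", 1000.0, rate_hz=None),
]

def pick_tier(scene_confidence: float, needs_deep_reasoning: bool) -> VlsaTier:
    """Hypothetical routing: escalate only when the cheaper tiers are not confident."""
    if needs_deep_reasoning:
        return TIERS[2]          # the "ask a teleoperator"-style escalation
    if scene_confidence < 0.7:
        return TIERS[1]
    return TIERS[0]

print(pick_tier(scene_confidence=0.9, needs_deep_reasoning=False).name)
```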

It's not running every second; it's the system asking, just like you would ask a teleoperator, "I don't know what to do. Tell me what to do." So now you have a Gemini 3, or something Gemini 3-like, telling you what to do. The beauty of this slow system is that you can run it not only on board but also on the cloud, which gives a lot of flexibility. What does this look like in terms of the product portfolio? Again, on the left-hand side is our SuperVision on EyeQ6 High. We don't see any need to move to EyeQ7 or beyond; here, the purpose is just cost reduction. And with EyeQ6 High, the cost is between one-fifth and one-tenth of competing chips, so this gives us a lot of flexibility in reducing cost.

Then we have Chauffeur: the Level 3 runs on three EyeQ6 chips and two boards, because you need hardware redundancy for fail-operational behavior. The mind-off Level 3 then adds an EyeQ7, or even an EyeQ8, the next chip we are designing, to run the VLSA. It's not something we want to send to the cloud, because we're talking about consumer vehicles. And as you saw, on EyeQ7 we can run a network of roughly 15 billion parameters — a very impressive size of network — with the same sensor set. You just add another chip. This is also convenient because we don't need to revalidate the system: it's the same EyeQ6 setup as the Level 3, and you just add another chip that does the slow route. With Drive, the robotaxi, the first generation is four EyeQ6 chips, one of which does the VLSA.

As I said, there's a 3.8-billion-parameter VLSA on board, and the rest is in the cloud. In the second generation, towards the end of the decade, 2029, there will be an EyeQ7 or EyeQ8 that will do much more on board and less on the cloud, and will also reduce the sensor set to just cameras and imaging radars, and perhaps one front-facing LiDAR. Okay, so now I'm going into the next aspect of physical AI. Mobileye is an AI company working in physical AI, but so far in only one aspect of it, which is autonomous driving. The difference between physical AI and the AI that most of you — all of you — are using is that the AI you are using lives in the digital space: it starts in the digital space and ends in the digital space. In autonomous driving, or physical AI, the decision-making is in the real world.

When you think about what operates in the real world, there are two things: cars and robots. Jensen last year called it physical AI, which I think is a great term. Mobileye wants to expand its scope to all aspects of physical AI, because there are a lot of synergies in the technology layers. Both systems use fast and slow. Both systems use VLMs or VLAs. Both domains use a lot of simulation and sim-to-real. There's lots of synergy and lots of overlap on the technology side. And now you have a new TAM, another growth engine, which is very interesting. I would say the difference between the two, autonomous driving and robotics, is that cars operate in a structured world.

Structured doesn't mean it's easy, but it's structured, whereas robots operate in an unstructured world. There can be some structured special cases, but in general it's an unstructured world. Imagine a robot for home use, in the house: all houses look different, so it's an unstructured environment. And the number of tasks is open-ended. Looking asymptotically, if you want the robot to do whatever a human can do, the set of tasks is open-ended, whereas the set of tasks a car performs is really narrow. Narrow doesn't mean it's easy, but it's really, really narrow. So let's look at what Mentee Robotics has developed. First, just the basic specs. This is their third-generation robot. It's human height, 175 centimeters, 72 kilos. It can pick up 25 kilos, which is very important.

If you want to work in fulfillment centers, you need to move items that are heavy, so 25 kilos. It has swappable batteries, so it can work 24/7 — you just swap its main battery while it's active. Also, everything is designed in-house, so it's really a vertically integrated company: all the actuators are designed in-house, the gears are designed in-house, the electronics are designed in-house, and all the software is designed in-house. It runs on two NVIDIA Orin X chips. And it has a very interesting hand. Achieving human-like dexterity of the hands is very, very challenging in humanoid robots. The default is to use tendons, just like humans have, and then put sensors on the fingertips, which complicates manufacturing and increases the price.

What Mentee has done is use rigid links, so you don't need any sensors on the fingertips, because the finger is connected directly to the motor and you get a kind of feedback back through the motor. Now let's look at its AI capabilities. The first core principle is sim-to-real: you do reinforcement learning training in a simulator and then move to the real world using techniques called sim-to-real. Many of the humanoid-robot demonstrations you see out there are teleoperated. What I'm going to show you here involves no teleoperation — it's all full-stack AI. So let's look at this clip; it's also on their website. You have here two robots working for 18 minutes straight, clearing all the boxes from one side to the other.

What you see here in the text is all the internal thinking of the robot: the computer vision, the navigation, the instruction following. Let's give it another few seconds. On their website you have the full 18 minutes, so you can see it end to end; here we're simply fast-forwarding. All the boxes move from one side to the other side. So this first one was instruction following. In the next clip, there's a full end-to-end sequence: there's an instruction, understanding of the instruction, navigating, recognizing. So let's play it. "Give me another Coke."

Speaker 4

As you wish.

Amnon Shashua
CEO, Mobileye Global Inc.

So, see on the left-hand side, it has a cognitive map, so it knows where the kitchen is. And as you'll see in a moment, it's recognizing all the objects and putting a green bounding box on the relevant object. This was part of the instruction, "Bring me another Coke can," where the person showed it the Coke can. Then it navigates back to where they were. So it's all end-to-end, from instruction to instruction following. And this was not learned or trained by imitation learning: there are out-of-the-box capabilities — pick, move, drop — that are common to many industrial settings, and the robot comes ready to do them just by following instructions. Now I'm showing something that is teleoperated. The point I want to make here is to show the dexterity of the hands. The hands are very, very interesting.

Look at the accuracy — taking a drill and using it, moving it from hand to hand. These are hands that are really manufacturable at low cost, because they're not based on tendons. Next, I want to show something that nobody has shown before. When you talk about humanoids, I think there are two stages. The first stage, let's call it the low-hanging fruit — again, that doesn't mean it's easy. The low-hanging fruit is working in a structured environment: say an assembly plant, or a fulfillment center, moving boxes or items. Normally, a customer would not buy one robot but scores of robots — dozens, hundreds, thousands of robots. The customer is interested in a predefined set of tasks, and you can customize your software to the needs of that customer.

But now look at the second stage. In the second stage, you would like the robot to operate in an unstructured environment, like home use. A household would buy one robot, and it doesn't make sense to start customizing a robot for every customer. So now you need the robot to be able to continuously learn, to generalize — you want it to be able to learn from the customer. The customer shows it a task: the robot watches the customer, the customer can talk to it but also show it the task, and you would like the robot to be able to repeat that task within minutes. We call this real-to-sim-to-real. The robot is watching and sending the clip to the cloud, and in the cloud we have a foundation model that we built that moves this video into a simulated environment.

Inside the simulation, there is an RL loop, a reinforcement learning loop, to learn the task, and then it moves back to the robot — that's the sim-to-real part. So: real to sim to real. Let's look at how this works. First, observe: the task here is to swap batteries, so the human is showing the robot what he's doing. On the right-hand side, in the small inset, it has already been moved into the simulated environment; in reality, the clip is sent to the cloud, and in the cloud it creates the simulated environment. And now comes the training. The training is in the simulator: this is after 100 iterations of training, after 500 iterations, after 1,000. Right now, this training takes about three hours.

But by the time we go into production, this will take minutes. Now, this is the real part: the robot is performing the task that the human showed it. And this is why the company is called Mentee — the idea of mentoring. The idea is that a human will mentor the robot: a layman, not a professional, just a customer, showing the robot a new task, and the robot will be able to do it.
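A minimal sketch of the real-to-sim-to-real loop as described — watch a demonstration, lift it into simulation, run an RL loop there, then deploy the learned skill back on the robot. Every function here is a hypothetical placeholder for the foundation-model and RL machinery:

```python
def video_to_simulation(demo_video: str) -> dict:
    """Hypothetical cloud foundation model: turns a demonstration clip into a simulated
    scene plus a task specification (e.g. 'swap the battery')."""
    return {"scene": f"sim({demo_video})", "task": "swap_battery"}

def train_in_simulation(sim_task: dict, iterations: int) -> dict:
    """Hypothetical RL loop inside the simulator; returns a learned skill policy."""
    return {"task": sim_task["task"], "iterations": iterations, "policy": "learned-weights"}

def deploy_to_robot(skill: dict) -> None:
    """Hypothetical sim-to-real step: push the learned skill back onto the physical robot."""
    print(f"robot can now perform '{skill['task']}' after {skill['iterations']} iterations")

# real -> sim -> real
demonstration = "customer_shows_battery_swap.mp4"       # the robot watches the customer
sim_task = video_to_simulation(demonstration)            # real -> sim
skill = train_in_simulation(sim_task, iterations=1000)   # RL inside the simulation
deploy_to_robot(skill)                                    # sim -> real
```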

So where we are in terms of the roadmap: in 2026, we start POCs with multiple customers. We have already signed an agreement with Aumovio — that's Conti; they changed their name to Aumovio — which will be our production partner. In 2027, we'll start manufacturing the first batch, an experimental batch, to be ready in 2028 for commercial deployment of the first stage of robots, which is for fulfillment centers, assembly plants, production plants — structured environments. Then, towards the end of the decade, 2030, we go into home use, and this is where real-to-sim-to-real becomes very, very crucial, because the robot needs to continuously learn and not just come with predefined capabilities; the predefined capabilities can be useful, but not sufficiently useful to cover all the needs of a home. And as I mentioned before, there are a lot of synergies. It makes a lot of sense because there is overlap in the kinds of tools being used — for example, the sim-to-real, the real-to-sim-to-real, is very useful.

Mobileye is also doing sim-to-real, as I mentioned, with the ACI block: we train in a simulator, learn the noise model of our perception, and move the policy back to the real world. So there's a lot of know-how in this sim-to-real that is shared between the two companies. Whatever Mentee is doing can help Mobileye, and whatever Mobileye is doing can help Mentee. It makes a lot of sense to join forces. And it offers Mobileye a new growth engine, a very important growth engine. I'm very optimistic on humanoids. I believe that 10 years from now, there will be millions and millions of robots. There's a labor shortage out there in fulfillment centers — in many of them, turnover is 100% per year. It's very boring work; people get injured, people get bored, and there is a labor shortage.

There's also a labor shortage in help at home — think of elderly home care. There's lots and lots of potential if you have a useful robot. And I believe that with AI moving forward so rapidly, the technology is ready; it's just a matter of productizing it and being smart and thoughtful about how you put this technology together into a product. This would be a magnificent growth engine for Mobileye. So I think this is a very, very exciting new stage for Mobileye. We call it Mobileye 3.0 — 2.0 was after the acquisition by Intel, and now we're going to 3.0. As you saw, robotaxi is on its way, and we have a very good vision for scaling. At the end of the day, scaling is what's going to matter — the ability to scale.

We have the product portfolio for consumer cars, and now a new growth engine. The AI stacks are really exciting, and the entire field is exciting as AI moves forward so fast. So I think I'll end here. You can take some questions if you like. Thank you.
