Good afternoon and welcome, everyone. I'm Dan Galves. As I said yesterday, yesterday was the tenth iteration of our CEO Amnon's annual CES presentation. This is the first time we've done a deeper-dive session at CES, and there's really no one better to take care of that for us than Professor Shai Shalev-Shwartz, our Chief Technology Officer. Shai is not only a key architect of everything about Mobileye, but he's also a person who can really put very complex issues into simple terms, which I really appreciate. Before we begin, please note that today's discussion contains forward-looking statements based on the business environment as we currently see it. Such statements involve risks and uncertainties. Please refer to the accompanying presentation, which includes additional information on the specific risk factors that could cause actual results to differ materially. Without further ado, I'd like to welcome Shai to the stage.
Thank you, Dan. So I'm going to talk about a new platform developed at Mobileye called the Driving Experience Platform, or DXP for short: about the architecture, why we decided to give this platform its particular shape, the abstractions that we are using, and the APIs. The outline of the talk will be as follows: I will start by asking why we should care about a platform at all, and what exactly a development platform is. Then we will dive a little deeper into previous platforms and how we think such a platform should be built. I will talk about the differentiability, scalability, and risk trade-off in building such platforms. Then we will introduce the platform, and at the end, we will wrap up with some of the backbone of our solution that lies beneath the platform.
So if we think about platforms, they are everywhere. When we use an operating system like Windows, iOS, Android, or Linux, it is a development platform. Programming languages are development platforms. There are task-specific developer packages like Spark for working on the cloud, and there are high-level interfaces like ChatGPT, which most of you know, Wix for designing websites, et cetera. So why should someone use a platform? The simple idea is that you don't want to reinvent the wheel. If you want to write an iOS app, you want to focus on the content and the specifics of your application, not reinvent iOS, the operating system itself. So for the user, it saves time and resources, and for the supplier, it enables scale.
So in order to explain how we should build the platform, let's talk about autonomous driving and how it's built. Usually, in robotics, we talk about the sense, plan, act methodology. Sensing, or perception, is understanding the surroundings: where the cars around us are, as well as mapping, where the lanes are, who has priority, et cetera. Planning is the decision-making process. We call it driving policy. It is a "what would happen if" type of reasoning: if I yield to someone, is it going to be okay or not? If I don't slow down, is it going to be okay or not? And at the end, after we have decided what we want to do, we need to actually do it, and this is the act part, or the control.
Executing the plan: moving the steering wheel, hitting the throttle, et cetera. Now, when we talk about a platform, there are three considerations to make. The first one is differentiability. The user of the platform should be able to differentiate its product from other products. When we talk with our customers, the OEMs, they want to control the experience of the system. They don't want a turnkey solution, a black box that behaves the same on all car platforms. This is differentiability. The second issue is scalability, more for the developer of the platform, in this case Mobileye. We want the resources that we invest per customer to grow sublinearly with the number of customers. We don't want a dedicated big group for every OEM; otherwise, we won't be able to scale.
But another factor, which is very, very important as well, is risk. We want people that use the platform to reach a real product, and not just play with the platform without ever succeeding in reaching a real product. So based on these considerations, let's think about where we put the line between the platform and the user, and let's look again at the sense, plan, act methodology. One approach, which has been taken by some suppliers of autonomous driving platforms, is splitting between platform and user somewhere in the middle of perception: say, giving some APIs for training neural networks, so low-level perception is supplied by the supplier of the platform, and the user needs to build the entire perception stack, the driving policy, and the control.
The problem with this approach is high risk. We get full differentiability, but the risk of not succeeding is high. Another option, on the other end of the spectrum, is having the supplier of the platform deal with perception and planning, and leave only the control part to the OEM. The problem with this approach is that there is no differentiability and no scale. Either the OEM gets a package and cannot differentiate its solution from others, or the OEM will come to Mobileye and ask for this change and that change, and very, very quickly we will need a big team for every customer. So maybe the best place to put the line between the platform and the user is between the perception task and the driving policy task. This idea actually came from customers.
Many customers told us: "Give us just a perception layer, we will do the rest." The idea is that we do the perception, they own the driving policy, and by owning the driving policy, they own the entire look and feel of the car. Is it a good approach? In order to answer that, we must accept that driving policy is also a very complex thing, and the problem with this approach is high risk. I call it the underestimation plague. If we go back to 2016, there were headlines all around that self-driving was almost here. Many projects started with huge optimism that in a few years they could bring autonomous driving to the market.
But self-driving is difficult, and maybe the reason for the difficulty is that self-driving combines the complexity of advanced AI systems with the extremely high precision required for a system that demands a high level of safety. Okay, but this was back in 2016. Today, we have deep learning. Well, we had deep learning also in 2016, but today we have better deep learning. Okay? So maybe deep learning to the rescue.
So it turns out that even the most modern deep learning systems, like graph neural networks, transformers, BEVFormers, and all the cool new kids in town, still make unintuitive errors, are still bad at edge cases, and still struggle with planning, and there is not a single piece of evidence of a modern AI system that reaches an accuracy of 99-point-enough-nines after the dot, which is the level of accuracy we need from safety-critical systems such as autonomous driving. And maybe the reason is that all of these approaches are statistical. They are built on statistics, and the problem with statistics is that the tail is very, very difficult. So the problem with putting the line between perception and planning is that we still have high risk, because driving policy is also hard.
And I will explain later that you need to deal with predictions, intentions, uncertainties, risks of decision-making errors, and efficiency of planning. All of this is really hard, and it is also not scalable. Why? Because perception is never perfect, especially when we are talking about consumer-level autonomy, where we must compromise on the price of the system. So driving policy must be intimately integrated with the perception layer. If the perception changes, the driving policy must be adapted. And putting the line between the supplier of the platform and the user of the platform in a place that requires delicate integration is not a good idea. It's a recipe for problems. Okay, so if this is so hard, what are we going to do? Let's go back and ask ourselves how to design a good self-driving platform.
We want to enable differentiation while minimizing the risk and enabling scalability. These are the three things that we wanted before. And the idea is to hide universal content, because there is nothing to differentiate in universal content, and focus on the unique content. Now the main art is to find the right granularity of abstractions that enables this separation between universal and unique content. Okay, so let's look again at this picture, and look below at the graph of universal versus unique. On the left-hand side, we have things that are clearly universal, like the perception stack. The perception stack is universal: everybody wants to know where the cars are, where the pedestrians are, where the lanes are. There is nothing here which is specific to some OEM. On the right-hand side, there are things that are clearly unique.
The control is unique to the car platform. The HMI is unique to the car platform. The question is, and this is the elephant in the room, what to do with driving policy? On one hand, driving policy is difficult and we want consolidation of efforts. On the other hand, driving policy carries the main responsibility for the look and feel of the driving behavior of the car. So we need to be more delicate and make the right split between universal and unique. Well, let's see what is for sure universal. Facts are for sure universal, okay? Where the road users, hazards, and traffic lights are: there is nothing to differentiate in any of this. Uncertainties are also very important. Usually they sit in the driving policy part, but they are also universal.
So you need to know not only what you know, but also what you don't know: if you have a lack of visibility, you need to know it; likewise if you have occlusions, or error bars on your estimators. Another thing which is universal is semi-facts. What are semi-facts? Semi-facts are things like predicting the future, predicting the intentions of other road users. For example, there is a car which is standing. Is it a parked car, a double-parked car, or a car that is standing in a traffic jam? The behavior will be very different between these options, but the kinematic state of the car is exactly the same: it is at a standstill. Likewise, does a car intend to park, to perform a cut-in, to perform a U-turn?
All of these uncertainties are universal. You want the best understanding of the intentions of other road users, no matter how you want to differentiate the product. Likewise, we invested a lot in optimization engines. You want everything to be very, very optimized in order to run efficiently on the car, so efficient data structures and optimization engines are also part of the universal side. On the other side, the right-hand side, the unique part is what to do with all of this information about the kinematic state of other cars and the uncertainties: discrete driving decisions, for example. Do you want to perform a lane change now or not? Here you can do something which differentiates between different platforms. One OEM will want more lane changes, a more sporty type of driving. Another OEM will want something calmer. It depends on the customers.
Overtake or stay behind, and other discrete driving decisions. Likewise, continuous longitudinal planning. How exactly do you want to accelerate: faster or milder? How do we want to brake: in advance or at the last minute? All of these are things that enable differentiation. Lateral planning, and of course, control and HMI. So this is the right split between universal and unique, and we need to put the line between the universal part of the platform and the unique part in the right place. Now that we understand this, we need to ask ourselves how we do it. And the triplet answer is: when, what, how. When and what are universal, and how is unique. Let's take an example. When we approach a stop sign, what do we need to do? We need to brake to reach a full stop. This is universal.
Every OEM will want to brake to a full stop at a stop sign. There is no differentiation there. Where is the differentiation? In the how. What is the exact braking profile, how exactly do we want to stop? Do we want to stop later and stronger, or in advance and milder? Likewise, when we approach a roundabout, we need to decide whether to go or to give way to other cars in the roundabout. That's for sure. Now, assuming it is safe to take either of these decisions, should we go or should we give way? This is a how, and it depends on a yield logic that can be owned by the OEM. One OEM will want a behavior which is more assertive; another will want something milder. Okay? So here is a bipartite graph.
On the left-hand side, we see the when. On the right-hand side, we see the what, and these are several examples of when and what from the platform. As you can see, all of this is universal. When you approach a speed bump, you want to slow down, okay? This is universal. There is no differentiation. The only question is how much, how fast, how exactly? In the how you differentiate, but the when and what are universal. So how does it work? We take some scenario and what we want to do in that scenario, say brake to stop, and then we enable the OEM to define packages or families of how: several implementations of how to brake to stop. From these, we can derive specific instances.
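The when/what/how split described above can be sketched in a few lines of code. This is a hypothetical illustration, not the real DXP API: the "what" (brake to a full stop) is universal, while each "how" is one member of an OEM-defined family of braking profiles. All class, field, and instance names here are invented for the example.

```python
# Hypothetical sketch of the when/what/how split -- not the real DXP API.
# The "what" (brake to a full stop) is universal; the "how" (braking
# profile) is an OEM-defined family of implementations.
from dataclasses import dataclass

@dataclass
class BrakeToStopProfile:
    """One 'how' instance: parameters shaping the stop behavior."""
    name: str
    decel_mps2: float        # target deceleration
    start_margin_m: float    # how far before the stop line braking begins

# An OEM-defined family ("package") of hows for the universal what.
COMFORT = BrakeToStopProfile("comfort", decel_mps2=1.5, start_margin_m=40.0)
SPORTY  = BrakeToStopProfile("sporty",  decel_mps2=3.0, start_margin_m=20.0)

def stopping_distance(speed_mps: float, profile: BrakeToStopProfile) -> float:
    """Distance covered while braking at the profile's deceleration: v^2 / (2a)."""
    return speed_mps ** 2 / (2 * profile.decel_mps2)

# At 15 m/s (~54 km/h) the sporty profile stops in half the distance:
print(stopping_distance(15.0, COMFORT))  # 75.0 (meters)
print(stopping_distance(15.0, SPORTY))   # 37.5 (meters)
```

The point of the sketch is that both profiles implement the same universal what; only the numbers, the how, differ between OEM packages.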
So I will briefly explain the experience of working with DXP. There are two steps. The first step, which is done in advance, is that the user, the OEM, constructs packages of instances out of the families of behaviors the platform provides. The platform provides offline and online tools for creating these packages and seeing how they work, in a simulator or via online injection and recording tools, and then the OEM can find the packages that it likes. Very importantly, in order to minimize risk, the platform provides a reference design for all of the required packages, so the user doesn't need to implement all of these packages from day one. He can focus on the specific instances that he wants to change.
So he starts with something working from day one, and then can focus on the question: what are the most important things to differentiate? And depending on the time until SOP, he can choose to optimize more or less. Then, the user creates code that, during the online drive, selects the packages based on application parameters like locality, road type, regulation, driving mode, weather conditions, et cetera. This approach solves the differentiability, scalability, risk trade-off. Of course, we get differentiability: the user of the platform has full control over the unique content of the driving experience. We get scalability because this abstraction doesn't put a line in places where integration is a big risk; it puts the line in places where the integration is seamless. And in addition, we provide a reference design, so the user can also look at our reference design and find out if he is doing something wrong.
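The reference-design idea can be sketched as a simple override pattern. This is an illustrative assumption about the mechanism, not the platform's actual configuration format: the platform ships a complete set of default packages, and the OEM replaces only the ones it had time to differentiate before SOP. All names below are invented.

```python
# Hypothetical sketch of the reference-design idea -- names are illustrative,
# not the real DXP API. The platform ships a complete, working set of
# default packages; the OEM overrides only what it wants to differentiate.
REFERENCE_PACKAGES = {
    "brake_to_stop": "reference_comfort",
    "lane_change":   "reference_balanced",
    "yield_logic":   "reference_polite",
}

def build_oem_packages(overrides: dict) -> dict:
    """Start from the working reference design; replace only the packages
    the OEM chose to differentiate before SOP."""
    return {**REFERENCE_PACKAGES, **overrides}

# An OEM that only had time to tune lane changes still gets a full system:
print(build_oem_packages({"lane_change": "oem_sporty"}))
```

This is why the risk profile changes: an empty override dict already yields a complete, drivable configuration, so the project converges regardless of how much differentiation gets done.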
Also, in terms of risk, from day one the user gets a reference design, something that works out of the box from day one of the project. So there is no risk of not converging in time to production. The only risk is whether we succeeded in doing all the differentiation that we wanted, or maybe only a subset of it, but the project will succeed for sure. These are just examples of code, just to show you that it's not just PowerPoint; it's really working in the car. I will not go over the code, but just to let you know: you take some what, like brake to stop, and you define the scenarios, the when, that apply to this what.
For example, for brake to stop, you have a traffic light with a red light, a traffic light with right on red, yield with blinking red, et cetera, a stop sign, end of path because we are going to hit a road edge, and many other cases, such as a bottleneck with an oncoming car: a very narrow street where we want to stop because there is an oncoming car. All of these are instances of braking to stop. And this is an example, very simplistic code, of what happens online. Online, the customer needs to write the code that chooses the appropriate package of how to brake to stop at that moment, and it can depend on country code, road type, weather conditions, and HMI items like driving mode.
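The online selection step described above can be sketched as follows. This is a hypothetical illustration in the spirit of the slide, not the real DXP code: the function, parameter names, and package names are all invented, and the rules are examples of the kind of locality-, road-, weather-, and mode-dependent logic an OEM might write.

```python
# Hypothetical sketch of the OEM-written online selection code -- names and
# rules are illustrative, not the real DXP API. The OEM picks which "how"
# package applies right now, based on application parameters.
def select_brake_to_stop_package(country: str, road_type: str,
                                 weather: str, driving_mode: str) -> str:
    # Safety-first override: bad weather always forces the mildest package.
    if weather in ("rain", "snow", "fog"):
        return "comfort"
    # Driver-selected mode differentiates the experience.
    if driving_mode == "sport" and road_type == "highway":
        return "sporty"
    # Locality-specific tuning, e.g. different highway driving culture.
    if country == "DE" and road_type == "highway":
        return "assertive"
    return "comfort"

print(select_brake_to_stop_package("US", "urban", "clear", "eco"))      # comfort
print(select_brake_to_stop_package("DE", "highway", "clear", "normal")) # assertive
print(select_brake_to_stop_package("DE", "highway", "rain", "sport"))   # comfort
```

Note how the when and what never appear here: the platform has already decided that braking to a stop is required; the OEM code only chooses which how to apply.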
I will show you a short video of two completely different behaviors of the car, both of which were constructed with DXP. Just by writing this DXP code on the same backbone of the platform, we can achieve really different behaviors of the car. On the left-hand side, you will see a more aggressive style of driving, and on the right-hand side a milder style, or maybe the other way around. Let's see if you manage to see the difference between the sides, because I don't remember. Okay, so which side is the more aggressive? You got it. Okay, so in the last part of the talk, I will touch a little bit on the engine behind the platform.
So what are the ingredients that enable us to build the perception layer and the driving policy layer? How do we build a capable driving system? The first thing, which we said back in 2017 in a scientific paper, is that we must separate driving policy from perception, as opposed to an end-to-end system. There are many good reasons for that, and Amnon talked about it yesterday in more detail. For the perception, the basic methodology that we are using is redundancy. We build a modular design, and gradually we add more and more redundant layers. Therefore, the SuperVision system can already drive everywhere, but maybe not accurately enough for an eyes-off system, and gradually we are adding more layers to enable the full stack.
On the other side, driving policy, the idea is the RSS model, the Responsibility-Sensitive Safety model, which I will talk about a bit, and intentions versus predictions. So let's start with the perception stack. The idea is redundancy, and here are four axes of redundancy. One axis is the sensor set: camera versus radar versus lidar. Another axis is composable versus end-to-end approaches; Amnon talked about this here yesterday as well. Another axis is appearance versus geometry approaches. Appearance is the semantic meaning of what we see in an image: we see something and we know how to name it. Its name is "car", and because this is a car, we know what to do with it.
A geometric approach means that we don't know what it is; we just know that it's above the road surface, and we shouldn't hit anything which is not flat on the road, okay? So this is another type of thinking, and every type of thinking has its own advantages and disadvantages, and this redundancy approach enables us to enjoy the benefits of all worlds. And the last axis is learning versus model-based approaches. So very, very quickly, I'm going to show you, for the problem of vehicle detection, many points on these four axes of redundancy: camera, learning, composable, appearance-based; camera, learning, end-to-end, appearance-based; camera, model-based, geometry-based; camera, learning-based, geometry-based; lidar, model-based, composable, geometry-based; and so on. Yeah, you got the picture.
We are not building a single system. We build plenty of systems, and then we utilize the advantages of all of them, because there is no single system which rules them all. Every system has its own benefits and failures, and by building all of them, we can smartly choose between the different approaches. The last part is why driving policy is difficult. In driving policy, unlike in sensing, there is no ground truth. There is no single right answer to questions like: should I yield, or can I take the right of way? In addition, actions performed now have long-term effects on the future, so we might have butterfly effects. Everything works in closed loop, so we also affect the environment with our own actions, and we must handle uncertainties about the future.
We must reason about what others might do and what it is not reasonable for others to do, and act accordingly. These problems also lead to computational challenges, because we must plan sufficiently far ahead; otherwise, we might find ourselves in a situation where things looked fine before, but now they look very, very bad. For example, this car here is driving at 10 meters per second. If we look only two seconds into the future, we see that everything is clear, because the other car, which is at a standstill, is 20 meters away. So within two seconds, we see no problem. It seems fine to continue and do the maneuver, right? But of course, it's a horrible idea, because we will find ourselves hitting the other car.
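The arithmetic behind this example is worth making explicit. A minimal sketch, using the numbers from the example plus an assumed comfortable deceleration of 2 m/s² (my illustrative value, not from the talk):

```python
# Numeric check of the 10 m/s / 20 m example: a 2 s horizon hides the
# conflict, even though braking should already have started.
def stopping_distance(speed_mps: float, decel_mps2: float) -> float:
    """Distance needed to brake to a full stop at constant deceleration: v^2 / (2a)."""
    return speed_mps ** 2 / (2 * decel_mps2)

speed, gap = 10.0, 20.0
# A 2 s horizon covers only speed * 2 = 20 m, so the stopped car sits right
# at the edge of the horizon and no conflict is visible yet...
print(speed * 2.0)                    # 20.0 (meters covered within the horizon)
# ...but braking comfortably at 2 m/s^2 already needs 25 m, more than the gap:
print(stopping_distance(speed, 2.0))  # 25.0 (meters needed to stop)
```

So a planner that only checks two seconds ahead concludes "no problem" at exactly the moment when comfortable braking has already become impossible, which is why the horizon must be long enough.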
So we must plan for a sufficiently long time. But then, because of the butterfly effect, we need to reason about an evolving future. So what is Mobileye's approach to these driving policy challenges? The idea is RSS plus analytical calculations plus intentions. Okay, so quickly going over the main ideas. RSS is an assume-guarantee type of methodology. This is something that is used in safety systems like aviation. The idea is that you assume the worst case under a well-defined set of reasonable assumptions: under these reasonable assumptions, every bad thing that might happen will happen, okay? So we need to plan for the worst case. And how can we do this? Because there are infinitely many possible futures. Here come the analytical calculations. It is impossible to do it in a numeric way.
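One concrete instance of such an analytical worst-case calculation is the minimum safe longitudinal distance from the published RSS paper ("On a Formal Model of Safe and Scalable Self-driving Cars"): instead of enumerating futures, a closed-form expression bounds the worst case. A minimal sketch; the parameter values below are illustrative, not Mobileye's production settings.

```python
# Closed-form worst case from the RSS paper: the minimum gap such that the
# rear car can always avoid hitting the front car, assuming the rear car
# accelerates at a_accel for the response time rho and then brakes at only
# b_min, while the front car brakes as hard as b_max.
def rss_safe_distance(v_rear: float, v_front: float, rho: float,
                      a_accel: float, b_min: float, b_max: float) -> float:
    v_after = v_rear + rho * a_accel          # rear speed after the response time
    d = (v_rear * rho                          # distance during response time...
         + 0.5 * a_accel * rho ** 2            # ...including the acceleration term
         + v_after ** 2 / (2 * b_min)          # rear car's (weak) braking distance
         - v_front ** 2 / (2 * b_max))         # front car's (strong) braking distance
    return max(0.0, d)                         # a gap can never be negative

# Two cars at 20 m/s, 1 s response time, modest accel/brake bounds:
print(rss_safe_distance(20.0, 20.0, rho=1.0, a_accel=2.0, b_min=4.0, b_max=8.0))
# 56.5 (meters)
```

The single formula covers every future consistent with the assumptions, which is exactly the "couple the future into the present" idea: one inequality checked now, rather than a tree of simulated trajectories.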
So what we are doing is coupling all of the future into the present using analytical calculations. This idea of coupling the future into the present is very similar to dynamic programming methods, which are popular in driving policy and planning problems, okay? But unlike dynamic programming methods, which require predictions of other agents, our method does not use predictions; we use intentions of what others might do. And this results in more human-like behavior. Think, as an example, of a pedestrian standing on the side of the road, okay? Prediction means that I know exactly whether he is going to cross the road, I know exactly the trajectory of how he is going to cross, at what speed, and exactly the curve that he is going to follow.
Intentions, on the other hand, ask a single question: does he intend to cross or not? If he intends to cross, I need to take the worst case of the trajectories that he might follow and be ready for it. If he is not going to cross, then fine, I can continue. So it is a much simpler requirement from the AI that needs to estimate the intentions of the road user. And we use all the modern AI tools, as well as model-based tools, to construct these intention estimates. Okay, so just to sum up, here is a comparison of Mobileye's approach to driving policy with other approaches.
In the columns, you see Mobileye's approach; another popular approach called Monte Carlo Tree Search, which came to great popularity in works using AI to solve games like chess and Go; dynamic programming on Markov decision processes or linear quadratic regulators; and end-to-end learning, which is also very popular now: just give the AI beast all the information and let it figure out what to do. When we judge these methods, we need to look at several considerations. Transparency: do we understand what the system is doing? Controllability: can we change the behavior of the system? This, of course, is very, very important for OEMs in order to own the product. Performance: is it an efficient approach, or does it require a massive amount of computation?
Sorry, that is efficiency. Performance is how well the system works, and whether we can guarantee that it will always work. And the only approach that has a checkmark in all of these is Mobileye's approach, and this is why we chose it. So to wrap up: Mobileye DXP, this platform, makes a separation between universal content and unique content via the when, what, how abstraction. It solves the differentiability, scalability, risk trade-off: it enables differentiability, letting the OEMs differentiate; it enables Mobileye to scale up; and it leads to a product that will work, to a project that will succeed. The main ingredients of the platform's backbone are redundancy as the key component for perception, and driving policy using RSS plus analytical calculations plus intentions. Thank you very much.