So welcome to our third IR event on the topic of digitalization. Looking back to 2020, when we first held our event on this topic, I think really a lot has moved, and we have seen a further acceleration, especially on the topics of generative AI, machine learning, or large language models. Can I have the next slide, please? Sorry, I'll do it myself. Okay. Let me quickly take you through today's agenda. We have five speakers with us today. The first one will be Alan Hippe, our Chief Financial and Information Officer. Alan will talk about our ongoing investments in informatics and how we drive integrated data platforms and the infrastructure. He will also present the first two use cases here, Galileo and Aspire.
Our second speaker for today will be Moritz Hartmann, Head of Roche Information Solutions, which is located within Roche Diagnostics. Moritz will talk about how our increasing commercial portfolio enables laboratory efficiencies and advanced clinical decision support. The third speaker for today, also from the Diagnostics division, will be Kent Kost, our Global Head of Diagnostics Operations. Kent will present a couple of more recent examples of how AI and machine learning-based algorithms are starting to improve our manufacturing and distribution capabilities. And after that, we will have two very exciting presentations focusing on how AI and machine learning-based tools are transforming our early drug development, both at gRED and pRED. So the fourth speaker for today is Aviv Regev, a well-known computational biologist who joined Genentech back in 2020 from MIT, and she is now leading gRED.
Aviv will present the gRED Lab in a Loop concept, providing you with many examples of how computational approaches are starting to make inroads into every aspect of early drug development. Finally, we'll be joined by Scott Oloff, our Global Head of Data and Analytics at pRED. Scott's presentation will include examples from clinical drug development, especially the digital biomarkers for neurological diseases and deep learning algorithms in ophthalmology. The total time for the presentations is set for 85 minutes, and then we will have a 35-minute Q&A session. Can I have the next slide, please? As mentioned before, like in previous years, we have picked 20 use cases throughout the entire value chains of both our traditional businesses, pharma and diagnostics.
This has proven in the past to be a very efficient way to cover this very broad and complex topic. Looking at the use cases we have picked this time, I think we really got a good coverage of the entire value chains. This time, we have more of a focus on the preclinical and early clinical drug development part, with examples spanning novel drug target identification, identifying new indications for drugs currently in development, or optimized molecule designs for small molecules, antibodies, or cancer vaccines. These are just a few examples. Some listeners will also notice that we provide updates today on use cases which we have presented previously, back in 2020 and 2021.
This holds especially true for the increasing product offerings in diagnostics, now organized under Roche Information Solutions, where we have bundled some of our efforts and also opened a marketplace for healthcare algorithms, which is open to third-party developers as well. And finally, I would like to mention something rather new, which we have not really had the opportunity to talk about. We are also currently in the process of building our own in-house large language model infrastructure, a topic that will be covered both by Alan and by Aviv. On this slide, I just wanted to summarize at a very high level what we hope to achieve with all these ongoing initiatives in digitalization.
Probably, this picture is not complete, but as you can see here, going from the left to the right, you really see the starting point in early research and development: it's about gaining new insights into basic biology and human disease, identifying new drug targets and new drugs to be designed. And then if you move on, you see here it's about improving speed to market and better health economic assessments. On the manufacturing and distribution side, it's about producing ever more individualized drugs at an affordable price, and also then improving global access. On the diagnostics side here, shown in green, you see basically it's about improving diagnosis and disease prevention, but also improving treatment for patients by building more holistic solutions and by also supporting physicians and hospitals.
But there is also a nice feedback loop here, as you can see. These improved diagnostics allow us to gain additional insights into human disease, and this nicely feeds back to the beginning of our journey. On the next slide, this is just a slide I really had to sneak in before we kick off the event, because it really raises a question which probably is on the mind of many of our call participants and might come up later in the Q&A. When the IR team was trying to generate a cover picture for today's event using OpenAI's DALL-E program, we initially tried, as you can see on the left side, to generate a protein flying over a circuit board.
But however we phrased this challenge, DALL-E was really not able to generate an image which suited our needs. Admittedly, as you can see here on the left, there were quite some creative and really funny solutions. For example, I really enjoyed what looks like a mesh of a protein chain with beta folds organized in the shape of a circuit board, or the word protein or a similar word just written on something which resembles a circuit board. In the end, it really turned out that the generative AI was unable to deliver any picture which even loosely resembled a fictional protein structure. However, when we then changed the request and asked for a DNA helix flying over a circuit board, we got thousands of useful pictures within a few seconds.
So making this observation, but also watching recent AI successes, such as predicting protein structures or ChatGPT, the question came up: Where do we really stand, and what is there to come in the next few years? And with that closing remark, I would like to hand over to Alan here to kick off the event. Alan, please.
Yeah, Bruno, thanks. Thanks for the nice handover, the nice introduction, and certainly, I think it's great to see how you apply AI in investor relations, yeah, to facilitate, yeah, how we bring the presentation along. That's great to see. I think what your introduction also explained is that I'm surrounded by superstars, really with Moritz, with Kent, with Scott, and certainly with Aviv. So I know that nobody's really waiting for me here, so that's why I have 10 minutes and just 10 minutes, yeah. I promise that I will be crisp. We thought it makes sense to give a little bit of a view on what Informatics does in Roche, because, if you like, we are generating the backbone for everything that you will hear about today.
And it's a great pleasure to lead this organization. It's not really without effort and challenge, but certainly, it's great to be in the middle of great technologies. What do we do in Informatics overall? We drive platforms end-to-end in the company, and we drive integrated data end-to-end, yeah, in the company. That's what we're going for, based certainly on great technology and great knowledge, and, hopefully, yeah, that is brought along by the presentation. When I go to my first slide, you see really, okay, the technology trends driving the next wave of opportunities, and I don't want to go through them because you know them, yeah? I think that's pretty clear, and you see the current digital waves that we're going through.
I think we're all aware that these technologies have a huge potential to drive impact. And certainly, what is on everybody's mind at the moment is generative AI, and you see today, out of the 20 cases, 7, yeah, are focused on that topic. You see that on the right-hand side of the slide. And you also see the other opportunities in healthcare, where we think we can make a difference with digital, and I think we covered the field pretty well with what we want to present today. Let me also say what I find really great about the presentation we're giving is that we're focused on use cases. It's not that we give you a presentation about technologies in general. We really tell you what we're doing with it, yeah, and where we think the benefit lies.
I think really when we look at the industry as a whole, where do we stand, and where does pharma stand? And I think it's perhaps worth taking a quick look. You see on the left-hand side an analysis that we have, which I also presented in 2021. In 2021, I spoke about pharma being well behind. Yeah? And what you're seeing now in 2022, when you look at the digital acceleration index, is that healthcare and pharma are speeding up. You see the second highest increase of the digital acceleration index by industry now, so really a high acceleration rate. And what is interesting is this analysis also shows that this has quite a profit impact, a positive profit impact.
So, evidently, healthcare is now moving in the right direction, and as we all know, we have the funds available. I think if we move in a certain direction and we're decisive, I think we can make great strides, and I think that's what's happening now. On the right-hand side, you see what has happened with funding in artificial intelligence over the last couple of years. I think we're all not surprised, yeah, that this has increased significantly, so lots of funds available. We will see how sustainable that is in the current environment, but at least we can say a lot of funds have been applied in the industry, so I think we can all expect great outcomes here.
I think when we now ask ourselves, "How is Roche tackling all of this and tackling these opportunities?" you see that on slide 11, and I would say there are four messages coming from that slide. First is, I think we have a global footprint, yeah, with Informatics, and wherever we have a major site, whether it be in San Francisco or in Basel, I think we have that co-location piece, and we are well connected to our key sites. And then we have also some additional Informatics hubs around the world, which might be driven by knowledge, might be driven by cost. I think that's certainly the complementing element. What is still something I have to digest is, well, we are spending more than CHF 3 billion on digital per year, and the Informatics organization itself is spending more than CHF 2 billion per year.
These are massive numbers, and that, I think, underlines and substantiates how serious we are about investing in digital. 23% of what we're doing, of the Informatics budget, goes to software and cloud. We want to increase that over time. Yeah, we want to bring the running costs down and invest more into the innovative stuff. I think we have made major strides, but there's more to do. And then certainly we have 4,400 internal employees in Informatics and a multiple of external ones. Good. Let me talk a quick moment about where we come from.
Yeah, and when I gave this presentation recently, I got a lot of feedback about that slide, because when you look at the left-hand side, where we're coming from as Informatics, it's really a pretty siloed structure. We had four separate units in the past. There was Pharma Informatics, Diagnostics Informatics, Group Functions Informatics, and Global Infrastructure and Solutions. And what we're doing now, and this is the transformation we're right in the middle of at the moment, is that we're becoming division-agnostic. So we really bring Informatics together, and as said, what we want to drive is not just the global infrastructure. That's what we have done in the past already, where we really standardized a lot and really connected it better.
But now we want to drive the global platforms end-to-end, yeah, also from a process point of view, from a data point of view. And certainly, with that, we want to create data lakes, global data lakes, that can be mined and can be analyzed in a much better way compared to the past. And I think really we want to make strides from finance to R&D here, and I think there is a great opportunity for us ahead. In 2024, we will do even more here. Certainly, that comes along also with the topic of decommissioning and making our infrastructure and our whole setup much less complex. Do we have partners? Certainly, we have partners we collaborate with, and here are the ones that are more on the innovative side, if you like, yeah, that drive us.
Certainly, we work with a lot more partners, yeah, very clearly. I think recently, NVIDIA, I'm sure that Aviv will talk about NVIDIA when it comes to drug discovery. Prescient is something, yeah, that we have brought up, and a nice collaboration here between gRED and Group Informatics. So I think that's something we're all benefiting from. But certainly, we couldn't drive innovation in this company without bringing in external innovation. Good. Is all of that showing? Is it something which really makes a difference for the organization? And certainly, I would say yes, but I might be biased.
When we look really at external assessments, and you have two here on that slide, you see on the left-hand side the Pharma AI Readiness Index, which comes from CB Insights, external, and they have ranked us number one, yeah, in Pharma AI Readiness. They do that based on the assessment of three dimensions: talent, execution, and innovation. Talent is certainly the ability to attract AI talent and to retain them. You see execution, yeah, really the ability to bring AI-powered products and services to market, and you will hear about that, which I think certainly is a key element of the business we're in. Then I think certainly about innovation, where we have really a very high score, and that's really the track record of developing or acquiring novel AI capabilities.
So I think with quite some pride, I'm happy to show that assessment. On the right-hand side, you see something which is at least equally important. Genentech, yeah, which is a major part of our informatics organization, has been ranked number one and best place to work in IT by Computerworld, which I think certainly we take pride in, and very clearly it's a testament to the exceptional talent, yeah, and the collective expertise of our integrated informatics team on a worldwide basis. So I think we feel encouraged. So let me go to the use cases. And I have two, and want to be brief about them. I think really one is about generative AI and how we use that in the Group. You will hear a lot of use cases and specific use cases today.
And then certainly, I think our major investment into an ERP project, enterprise resource planning, which is called Aspire. So let me start with Galileo. And Galileo is really a program that we've established to drive forward our next-generation AI strategy and deliver on the best high-value AI opportunities, and we base that on four pillars. Yeah, let me go through them. I think certainly we want to support the use cases which provide the highest value to Roche and to patients. I think that's the first one. And the next one, the second one, is establishing a next-generation AI platform that allows us to build scalable AI applications, if you like, on a group-wide level.
So really, you can tap into different models, I will come to them, and then you can apply the models that work best for you in your workplace, if you like. That's what we would like to achieve because, well, let's face it, I see these as tools, yeah, which really create major productivity gains if applied well, and we would like to give everybody in our organization the opportunity to do this. The third point is we prepare the organization to be AI-ready by upskilling our workforce, and that, as said, applies to everybody. You can get in contact with the Galileo team, you can ask for advice, and you can get some really usable and tangible advice on how you can really improve your workplace and your outcomes, yeah, by applying AI.
Then the last point, a little bit overlooked but really the first one, is to ensure we understand and use AI responsibly. I think we are in an industry with very high ethical standards, and we at Roche adhere to the highest ethical standards. And very clearly, when it comes to governance, principles, and practices for best support, when it comes to ethical, trusted platforms and solutions, I think certainly we would like to guide people to get there and help them adhere to these standards. So how do we do that? Very quickly, I think really you see on the bottom of the slide, we start with solid data and an infrastructure as the foundation.
You see really cloud computing certainly plays a major role here, and I can tell you our cloud cost next year, 2024, will go up quite significantly. And then on top of that, we are iteratively building the Galileo operating system, so to say, that allows our developers to build AI-powered applications at a faster pace. There are three parts to it, as you can see. On one hand, the Model Marketplace, which gives access to commercial, public, and internally developed models. Yeah, and you see really the list here. Then we have the Retrieval-Augmented Generation engine, which provides connectors to load data and create embeddings that can be used to ground the model's output.
And then, last but not least, the responsible AI tools, as mentioned already, that regulate AI applications and allow Roche to be compliant with external regulations and internal policies. In terms of generative AI applications, we have rolled out Roche Chat. So we have a ChatGPT which is in-house, that allows Roche users to access this technology in a secure manner, and certainly with full control over the data. Yeah, and I think that's a key element to have. So we have our own Roche GPT. And similarly, we have rolled out the Developer Copilot that can write code, test cases, and documentation to improve developer productivity. I think that's my first use case. I hope you got a little bit of a feel for how we approach it.
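To make the Retrieval-Augmented Generation piece a bit more concrete, here is a minimal sketch of how such an engine typically grounds a model's answer in internal documents. The embedding function, the in-memory index, and call_llm are illustrative placeholders, not the actual Galileo components:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: in practice an embedding model turns text into a vector."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.normal(size=384)

def call_llm(prompt: str) -> str:
    """Placeholder for whichever model the Model Marketplace provides."""
    return "stubbed model answer for: " + prompt[:60]

# 1) Index internal documents as embeddings ("create embeddings").
documents = ["SOP for invoice approval ...", "Lab handbook, chapter 3 ..."]
index = [(doc, embed(doc)) for doc in documents]

def retrieve(question: str, k: int = 2) -> list[str]:
    """2) Find the documents most similar to the question (cosine similarity)."""
    q = embed(question)
    sims = [(float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))), doc)
            for doc, v in index]
    return [doc for _, doc in sorted(sims, reverse=True)[:k]]

def answer(question: str) -> str:
    """3) Ground the language model's output in the retrieved context."""
    context = "\n".join(retrieve(question))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)

print(answer("How do I get an invoice approved?"))
```

The point of the pattern is simply that the model answers from retrieved internal context rather than from its general training data, which is what keeps the output grounded and controllable.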
Now let me switch over to Aspire. Aspire, that's just huge. Let me say, this is not just an IT project, an informatics project. This is a whole business transformation program, which has been running for quite a while now. Now, 2024 and 2025 will be the years where we have the major deployments around the world. You see on the right-hand side the value chain processes and the enabling processes that we basically standardize and optimize in the company as a whole. These are backbone processes. When we started the project, we had 113 different processes in the company to pay an invoice. I think nowadays we have three, and that's also thanks to this program, yeah? And there is so much more, going to manufacturing, going to other areas.
But what a Herculean effort it is, as the left-hand side demonstrates. It's one of the largest S/4HANA projects worldwide. We have 500 colleagues involved. We will deploy in 200 legal entities. The investment in total is over CHF 2 billion, and certainly, basically every user in Roche will be affected, yeah, by this program. A little bit more detail, but I don't wanna spend too much time on that. I think you see the core of Aspire, yeah, that we bring in, and certainly that is, on the one hand, the process optimization; the other piece is really the technology. But what we can do is bring many more technologies on top.
And certainly, this will help us to apply artificial intelligence, but also other technologies, yeah, to help us get more productive and to make Roche a less complex company that can handle complexity in other areas much better. Good. With that, I'm over time, yeah, and I'm happy to hand over to Moritz.
Thank you very much, Alan, for, yeah, kicking us off with these two essential use cases. Also from my side, a very warm welcome. I'm Moritz Hartmann, the Global Head of Roche Information Solutions, or RIS, and I'm very pleased to have the opportunity to present to you today the direction we're taking with our insights business. So I'll be focusing, as you can see here, more on the commercialization side and really speaking about the use cases that we are commercializing as well. And before I go into the details of this RIS business, I would like to share with you a little about the bigger picture of the insights business at Roche.
As you know, we have a very strong footprint in pharma and diagnostics, and we believe that the intersection of the two is where the power of digital insights lies. We brand that, as you can see in the picture in the center, NAVIFY, which is a combination of navigating the data and actually verifying it for decision support. With our broad healthcare expertise, we're in a leading position to bring healthcare insights solutions to our customers that truly generate value for them, whether that's in the lab, in a clinical setting, for healthcare systems as a whole, or ultimately for patients. How does that look in practice now? At Roche Information Solutions, we focus on four main segments: laboratory insights; then, closely intertwined, clinical workflow optimization and clinical decision support; as well as remote patient monitoring.
Those segments are interlinked and build on each other. In lab insights, with our position in diagnostics, this is very close to our core business, and here we want to become the digitalization partner of choice for our in vitro diagnostics customers. Our broad portfolio and already existing large footprint will allow us to connect all diagnostics disciplines in workflow systems and insight solutions. In the clinical space, we have the ambition to shape clinical workflows and lead the clinical decision support market. We believe we're uniquely placed to do that because we can derive insights from all the diagnostics disciplines and really provide multimodal decision support that we can then expand into modalities beyond diagnostics as well. Lastly, we also aim to support remote patient management to help patients, clinicians, and health systems as they move to a more decentralized approach to care.
By adding remote monitoring solutions to our offering, we can cover the entire healthcare journey, spanning from the lab through the clinical setting to the home setting, where we can provide a 360-degree view of the patient's journey. Across all of these settings, we have three distinct value propositions for our customers. First of all, medical insights. This is, of course, as a healthcare company, what our customers expect from us and where our expertise lies. Ultimately, we want to help our customers make confident decisions and impact the care that they provide to patients. In addition to that, we're offering workflow solutions to improve the way that healthcare providers operate, be it in the lab or in the hospital.
This is not only important from a financial point of view, but it's also the basis for a smoother and more integrated patient experience. Lastly, neither of the two value propositions can really work without the necessary infrastructure in place. This is a need that we see frequently with our customers, and recognizing and providing that as a fulfillable and also commercial need really helps us not just set our customers up for success, but also drives the adoption of our operational and medical insight solutions. All of these value propositions are reflected in the way we have built our portfolio. As you can see, we already have quite a comprehensive portfolio in market or at late stage across these value propositions.
Our most mature products can actually be found in the lab setting, but we're rapidly expanding into the clinical setting, and we also have first successes with our products in the home setting. First, I would like to talk a bit about a product we call the NAVIFY Integrator, which really forms the basis of our entire portfolio. As mentioned earlier, to derive insights, we need to make sure the data can flow and be accessed by and for our products. In that sense, the Integrator is really the Wi-Fi of our portfolio. You know this from your daily smartphone use. You cannot download or use any of your apps if you're not connected to Wi-Fi, and this is exactly what the NAVIFY Integrator does.
In addition, it provides customers a gateway to our portfolio through the NAVIFY Portal, and this marketplace is like an app store through which our own as well as third-party applications can be accessed. So that is the NAVIFY Marketplace. The NAVIFY Portal then actually allows our customers to access their apps, and it functions like the display on a smartphone, where you can organize your applications and your workspace in the way that best serves you and how you, as an individual user, like to be set up for the use of our solutions. Now, through this portal, you access our entire portfolio of applications, whether that's the operational excellence or the medical value portfolio, and it includes third-party solutions as well.
Now, let's move to the operational excellence portfolio, which today specifically focuses on the laboratory. What you see here on the slide is really the flow of a blood sample through an entire lab and its process, as it goes from sample retrieval to a result that is delivered to a healthcare professional for decision-making. Our digital solutions cover the entire journey with five different products, some of which I'd like to highlight here. Starting with sample tracking, it's important to note that about 0.5% of all tests have an error today.
Now, 0.5% sounds really small, but if you consider that our customers alone run some 25 billion tests, and globally more than 25 billion samples are being processed, that actually amounts to roughly 100 million errors a year in the lab. And that translates to healthcare costs of $15 billion. Two-thirds of all these errors have their root cause before the sample even reaches the laboratory, and that is what we're tackling with NAVIFY Sample Tracking, which allows our customers to optimize the pre-analytic and analytic operations from sample collection to transportation and reception in the lab. There are already great solutions out there, and often these are tied very locally to systems, for example, transportation systems.
So we have built this solution in the form of an open API solution that also connects all of these existing offerings, but whose real differentiation is the integration into the lab process, so that when the samples arrive in the lab, the sample and the patient are already known to the lab and to the respective process. Inventory management, monitoring, and control are different solutions that help address some specific needs, as their names indicate, to control the manufacturing process. But I would really like to go here to NAVIFY Analytics, because this is a true insights product: it takes data from across the lab and provides lab managers with an easy way to identify operational trends and challenges.
Some of our customers tell us that they actually weren't even aware of issues until our analytics software outlined them, and they achieve a huge impact with sometimes just small adjustments in the lab process. With this product, we become a true digitalization partner to our labs, helping them harness the power of insights and make a difference in their respective environments. Let's move now into the clinical setting and our offering called the NAVIFY Algorithm Suite. The Algorithm Suite is a one-stop-shop product for medical algorithms. This platform can be seamlessly integrated into the hospital's EMR system and provides healthcare professionals with a library of verified Roche algorithms, as well as algorithms from partners, to use in their everyday practice. We are constantly... Ah, this is a different slide order.
What you can see here is our algorithm menu, which we're constantly expanding and which, as you can see, is a real mix of both Roche and third-party algorithms. Our main focus today has been on oncology and cardiology, obviously two areas that we are particularly strong in, but we're continuing to grow and expand, in particular in the area of chronic diseases, including kidney disease, as well as infectious diseases. A little further down the line, we will also have panels for women's health and neurology. Of great interest to us as well is that some of the algorithms have been developed using machine learning technologies, such as Colon Flag.
This particular algorithm has the additional benefit that it also applies machine learning at an individual patient level, so that the longer the algorithm knows the patient, the more accurate it actually becomes. I would also like to deep dive on another algorithm example that clearly demonstrates the way Roche products interact with each other, as shown at the beginning of the presentation, and this is the GALAD algorithm. The GALAD algorithm supports the diagnosis of early-stage hepatocellular carcinoma, or HCC. HCC often doesn't show symptoms until it's at an advanced stage, and therefore the majority of HCC cases are diagnosed only in this advanced stage. Survival at this stage is very poor, at less than 5% five-year survival.
In the early stages, by contrast, five-year survival rates are up to 70%. There's a strong need to detect patients early in order to improve their outcomes, and regular surveillance can help with that. The international guidelines recommend testing for HCC every six months in risk groups using a combination of ultrasound and serum AFP. Ultrasound may miss more than half of the early-stage HCCs, and therefore other methods of diagnosis, such as CT scans or MRIs, are used when available. The GALAD score, on the other hand, can easily be calculated based on patient demographics and the measurement of blood-based tumor markers, using a very small blood sample.
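To illustrate how such a blood-based score works, here is a minimal sketch of a GALAD-style calculation. The coefficients and cut-off are approximate values from the published academic GALAD model and are purely illustrative; they are not the parameters of the commercial algorithm:

```python
import math

def galad_score(age_years: float, is_male: bool, afp_ng_ml: float,
                afp_l3_percent: float, dcp_ng_ml: float) -> float:
    """Linear combination of Gender, Age, AFP-L3, AFP, and DCP (hence GALAD).
    Coefficients approximate the published model (Johnson et al.), for illustration only."""
    return (-10.08
            + 0.09 * age_years
            + 1.67 * (1.0 if is_male else 0.0)
            + 2.34 * math.log10(afp_ng_ml)
            + 0.04 * afp_l3_percent
            + 1.33 * math.log10(dcp_ng_ml))

# A higher score indicates a higher probability of HCC; in practice a validated cut-off
# (around -0.63 in the original publication) separates higher- and lower-risk patients.
print(round(galad_score(age_years=62, is_male=True, afp_ng_ml=8.0,
                        afp_l3_percent=5.0, dcp_ng_ml=30.0), 2))
```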
So the GALAD score is a very practical and effective surveillance test, especially where ultrasonography equipment and trained radiologists are scarce or expensive. Its use can lead to improving the effectiveness of cancer control programs, and it delivers great value to patients, as potential early detection dramatically improves the outcome of their therapy. I'd also like to give a brief outlook on the latest expansion of our portfolio, remote patient monitoring, which particularly addresses the fact that healthcare systems are increasingly moving to provide care in the home setting. Here, we are currently exploring a multi-disease solution that serves as a secure and scalable platform and that enables care for patients wherever they are.
These solutions aim to support patient engagement, for example through care plans or symptom tracking; care team workflow optimization, for example when it comes to prioritizing patients or detecting longitudinal trends; and they also provide actionable and on-demand care pathways, for example for risk prediction or treatment optimization. All of this will be enabled through data that is connected both from Internet of Things devices and digital biomarkers, and through integration with EHRs and laboratory information systems. And that we do through the NAVIFY Integrator, which I shared as one of the foundations of our portfolio. As mentioned, this is still being built, but we are already commercializing one very exciting part of this journey:
the digital biomarker for Parkinson's disease, which will actually be shared in more detail by my colleague Scott later in this presentation. Lastly, I'd like to speak about the importance of our business model and how we believe we are able to increase the revenues from our digital portfolio. First, as we enter into the development of our digital solutions, we have built a process that, very early on, assesses our customers' willingness to pay for our solutions. We work closely with customers to identify the value that these particular solutions add to their work and assess their willingness to share that value with us as well.
Secondly, we're already seeing that our digital solutions and products offer competitive advantages that allow us to win with our customers in our respective pharmaceutical and diagnostics businesses, and we have also sold a number of our products as standalone offerings. Therefore, we're now increasingly making sure that we monetize separately wherever we add value through our digital products and solutions. We're also ensuring that our products are built in a modular way, so that we can deliver the individual value propositions for our customers in smaller and easier-to-deploy products as well, instead of delivering very large and complex solutions that have a lot of additional benefits without those being monetized.
Like every good software company, we are selling these products as a service on an annual subscription basis, which creates recurring revenue from our customers. Outside of this, we also have components that we provide free of charge to our customers, together with certain products that enable our portfolio and ensure the stickiness of our digital solutions to our core offering.
I hope the presentation has given you a comprehensive overview of where we stand today with our insights business and portfolio offering. I would like to thank you and, with this, hand over to my colleague Kent, whose topic also connects to what you have seen in how we integrate with our systems, speaking about how global operations then adds value through the use of AI and machine learning.
Thanks very much, Moritz. And a big welcome from my side to everyone. Again, I'm Kent Kost, Head of Global Operations for Roche Diagnostics. We've heard from Alan on the value of laying the foundational infrastructure for our digital platforms, and we've heard from Moritz on how to enable lab efficiency and clinical decision support. I'm gonna shift gears now and talk briefly about what's going on in my world in operations. So let's go to the next slide. There are four use cases that I will cover, and they really span the entire spectrum of plan, source, make, and deliver. And that's how we think about it, making sure that we're looking at this from a true end-to-end perspective. Let's go to the next slide.
Yeah, and here, we're unlocking the value of digitalization at each and every step of the process. So in planning, we're really striving to dramatically increase our forecast accuracy and do it far more efficiently. And we believe that doing this will allow us to fundamentally lower our inventory and drive down our write-offs. On the sourcing side, it's a fascinating connection, getting a real-time connection between Roche and our suppliers, and we do this for a couple of reasons. Number one, we want to be able to transmit to them what we think our latest demand signals are. And on the other side, the use case I'm going to talk about is how we fundamentally improve our approach to risk management and then address risks preventatively.
On the manufacturing side, this one is absolutely fascinating, and I could spend most of the morning talking about what we're doing in the digitalization of our manufacturing processes, but it's all driven by improved efficiency. In many cases, we combine this with our lean programs, so we digitalize it, and then we go in and we lean out the process. And one of the case studies, which I won't go into in detail but which shows the power of this, is one of our manufacturing lines where we doubled the output. We fundamentally doubled the output without changing a single piece of hardware. So it just shows the power of what we can do in operations.
Then finally, on the delivery side, it is all about making sure that we deliver to our customers on time, we do so in the most efficient way, and we've got the digital tools to allow us to do that. Let's go to the next slide. I'm gonna take a deeper dive now into a use case for each one of these, and I think, as Alan indicated at the onset, these are not theoretical approaches. These are real, tangible results driven across our value chain. So the first one was in the planning area: establishing a machine learning engine for time series forecasts, and there are a couple of things that we've combined. We've taken classical statistical modeling, and we've combined that with modern machine learning algorithms.
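As a rough illustration of what combining the two can look like, here is a minimal sketch in which a simple seasonal-naive baseline (the classical part) is corrected by a machine learning model trained on its residuals. The data, features, and model choice are illustrative assumptions, not the actual Roche engine:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Illustrative monthly demand history for one product (units); not real data.
demand = np.array([120, 118, 135, 140, 150, 160, 155, 148, 152, 165, 170, 180,
                   130, 128, 145, 152, 161, 172, 168, 158, 163, 175, 182, 195], float)

# 1) Classical part: a seasonal-naive baseline (same month, previous year).
season = 12
baseline = demand[:-season]                      # baseline forecast for months 13..24

# 2) ML part: learn the residual (actual - baseline) from simple features.
months = np.arange(season, len(demand))
features = np.column_stack([months % 12,          # month of year
                            demand[months - 1],   # previous month's demand
                            baseline])            # the statistical baseline itself
residuals = demand[season:] - baseline
model = GradientBoostingRegressor().fit(features, residuals)

# 3) Combined forecast = statistical baseline + learned correction.
forecast = baseline + model.predict(features)
print(np.round(forecast, 1))
```

In practice the feature set would include real-time customer demand signals and many more drivers, but the division of labor is the same: the statistical model sets a robust baseline, and the machine learning model corrects it.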
Of course, one of the biggest inputs to this is the real-time customer demand patterns. We started to roll this out to multiple countries where we get the demand signals, two of which are depicted here, and I didn't cherry-pick these. These were just two very representative examples where you can see the adoption rate of the automated forecasting. So in the first example, the adoption rate from January to October of this year went up roughly 7x, and we've improved the forecast accuracy by roughly 14%, which is certainly a solid step in the right direction.
In the next country, the adoption rate was a bit more modest. It required a bit more training and some work on the interface, but we've increased the adoption rate of automated forecasting by a factor of 3, and the forecast accuracy has then gone up by 17%. So we're seeing absolutely steps in the right direction, and we see a significant benefit in both lowering inventory and reducing write-offs. And again, it's early days, but I expect roughly a 10% improvement in the near term in lowering our inventory, as well as a similar amount in reducing our write-offs. So a really good step in the right direction. Let's go to the next slide. So sourcing and risk management.
Why is this important? I want to take you back over the last 18-24 months, when many of you probably remember the tremendous challenges in the electronic component industry. There were extreme shortages. Lead times went, in our world, for some components, from 4 weeks to 12-18 months, and prices increased exponentially. So we had examples where lead times were going out and price increases were up by a factor of 20. One of the things that we've done is we've introduced risk monitoring for our suppliers. The traditional approach was paper-based and typically done on an annual basis. What we've got today is a fully automated, real-time solution.
So we've got more than 1,800 suppliers that we're monitoring daily, and we're actually taking that a step further now, looking at category data, looking at different commodities, and looking at the risk profile, which changes rather frequently, surprisingly. With this automated monitoring, we've got a seven times higher probability of proactively preventing a supplier risk event. We put this into practice in the example that I talked about, the electronic component shortage, and we've successfully navigated that. Really good progress on the sourcing side. Let's go to the next slide, and this one is actually one of my favorite examples. It's a point-of-care sensor. This was a sensor that came to us via an acquisition several years ago.
You can see, I'm going to take you to the right-hand side, where you see that in 2015, yield was roughly 6%. Now, 6% is not a very viable product. It puts tremendous pressure on product supply. It puts pressure on quality, and it obviously puts enormous pressure on cost. So that was the starting place. We took a step back, and we said, "You know something? Let's digitize every step of the process." And so we did just that. We started collecting data that we knew was likely to be important, but, equally important, we also started collecting data where we had no clue whether it was going to lead to anything or not.
We had this idea that it was probably a multi-variable effect. We put this into practice, and you can see that we've got roughly eight years of data now, and we've driven yields from 6% to 80%. And early on in the process, we actually had some external experts come in and look at this particular manufacturing process, and they said, "You'll never get it above 35%." We're now routinely at 80%, which has driven our manufacturing costs down dramatically: a 50% reduction in manufacturing costs. Customer complaints are non-existent, and obviously, customer satisfaction has improved substantially.
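To illustrate the kind of multi-variable analysis that becomes possible once every process step is digitized, here is a minimal sketch with invented parameter names and data; it is a stand-in for the approach, not the actual Roche model:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Illustrative process log: one row per sensor unit, with digitized parameters
# from several manufacturing steps and whether the unit passed final QC.
rng = np.random.default_rng(0)
n = 5000
log = pd.DataFrame({
    "coating_temp_c":    rng.normal(180, 5, n),
    "cure_time_s":       rng.normal(90, 10, n),
    "humidity_pct":      rng.normal(45, 8, n),
    "reagent_lot_age_d": rng.integers(1, 120, n),
})
# Hidden multi-variable effect for the toy data: yield drops only when high
# humidity coincides with old reagent lots.
log["passed"] = ((log.humidity_pct < 55) | (log.reagent_lot_age_d < 60)).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(log.drop(columns="passed"), log["passed"])

# Rank which parameters drive yield; interactions like the one above only become
# visible once every step is measured, which is the point of digitizing everything.
for name, score in sorted(zip(model.feature_names_in_, model.feature_importances_),
                          key=lambda x: -x[1]):
    print(f"{name:>18s}: {score:.2f}")
```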
I think the important thing out of all of this is that it laid the foundation for improvements across our entire manufacturing network. So we've got a next-generation sensor coming along, and we've been able to take a lot of these processes and practices and embed them into our next-generation design, which gives us an efficiency gain right out of the chute. Let's go to the next slide. All right, so on the delivery side, I'm going to talk a little bit about our delivery mode optimization. Now, why is this important? It's important for a number of reasons. Number one, we live in a competitive environment, and obviously, on-time delivery is essential in my world. Distribution costs are a significant cost driver; depending on the mix, they'll generally run between 3% and 5% of revenue.
They can be extremely volatile. We obviously saw distribution costs vary dramatically over the course of the pandemic, but they can also be driven by natural disasters or geopolitical events, which will fundamentally change how you want to optimize your mix. Now, one other thing that I do want to mention is that air freight is significantly more expensive than sea freight. In our world, it's roughly seven times more expensive to ship by air versus by sea, and it also has a major impact on our CO2 emissions. Of course, there is a downside to sea freight, and that's the element of time.
So obviously, when you put something on the ocean, it takes a bit longer, and so you tie up cash a bit longer, and so finding that optimization is essential. So we built a system, and it's actually unique to Roche, where we've got it optimized to always select the right mode. It may sound rather simple, but there are more than a dozen variables that go into this. It's things like weight, volume, contracted cost, cold chain, shelf life, customer requirements, et cetera. And so there are a number of different factors that go into this, and we've had some dramatic results. We've increased our sea-versus-air share on an absolute basis by about 8%. It's led to an 18% year-over-year cost reduction and a 34% reduction in our CO2 emissions.
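To give a feel for the decision logic, here is a heavily simplified sketch of mode selection over a handful of the variables mentioned; the rates, transit times, and rules are invented for illustration and the real system weighs many more factors:

```python
from dataclasses import dataclass

@dataclass
class Shipment:
    weight_kg: float
    shelf_life_days: int
    needs_cold_chain: bool
    days_until_needed: int

MODES = {
    # mode: (transit days, cost per kg in CHF, CO2 kg per kg shipped) - illustrative values
    "sea": (35, 0.5, 0.02),
    "air": (5, 3.5, 0.60),
}

def pick_mode(s: Shipment) -> str:
    feasible = []
    for mode, (transit, cost_per_kg, co2_per_kg) in MODES.items():
        arrives_in_time = transit <= s.days_until_needed
        survives_transit = transit < s.shelf_life_days
        # Illustrative constraint: assume long sea transit is avoided for cold-chain goods.
        cold_ok = not (s.needs_cold_chain and mode == "sea")
        if arrives_in_time and survives_transit and cold_ok:
            feasible.append((cost_per_kg * s.weight_kg, mode))   # CO2 could be a tiebreaker
    # Cheapest feasible mode wins; fall back to air if nothing else qualifies.
    return min(feasible)[1] if feasible else "air"

print(pick_mode(Shipment(weight_kg=800, shelf_life_days=365,
                         needs_cold_chain=False, days_until_needed=60)))  # -> sea
```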
And so we're really bullish on further developing this capability. So let's go to the next slide. So, yeah, I talked about the manufacturing example, and I talked briefly about how we are going to digitize everything. Some of it is what you know, and some of it is also what you don't know, but it's going to lead to more predictive analytics. It's going to streamline our processes, and obviously, then improve both speed of execution and cost efficiency. Thanks very much for the time today, and with that, I will turn it over to Aviv.
Hi. Okay, let me share my screen. And there we go. So, I'm Aviv Regev. I'm the head of Genentech Research and Early Development, and I'm going to tell you today about our mission to transform drug R&D through something we call Lab in a Loop, which is the way in which we combine experiments with algorithms. And I'm going to start with a slightly personal note. As Bruno said, I came to gRED about three years ago, and I was motivated by a very particular fact, and this is that every step in making medicines is very hard. It's hard to infer the right target, and then it's hard to generate the right medicine, and to predict the right dose, and to go to the right patient.
As you know, the result of that being so hard is that not only does it take more than 10 years to develop a single medicine, but more than 90% of drug candidates fail in preclinical research or in clinical trials in this industry as a whole. As we are at Digital Day, I think it begs comparison with the amazing performance of your favorite AI methods on many different kinds of problems, like vision, or text, or speech, that just a decade ago were deemed just as hard and had horrible failure rates. So you can't help but feel like there is something that can be done, and that feeling is actually what brought me here. Now, there's a reason that these problems are so difficult, and this reason is scale.
So there's thousands of different cell types and states in the human body. There's about 20,000 genes in our genome. There's 4 to the power of 10 to the 8th possible hypothetical variants for these genes, and maybe 10 to the 13th ways in which even the ones that we know about could hypothetically combine. Now, of course, the vast majority of these things never actually happen. Our problem is that we can't just naively predict from, say, looking at a human genome, which ones would occur, what happens if they do, and if it matters in disease. And then, when you start thinking about medicines, these numbers get even bigger. And so there's about 10 to the 60th possible drug-like small molecules, maybe 20 to the power of 32 different relevant therapeutic-like antibody sequences one might want to consider, and billions of people, and about 10,000 diseases.
These numbers are big. Some of them are actually bigger than the number of atoms in the universe. We know upfront we can't test all of them, in a lab or in a patient population, so it's not surprising that it's hard to find targets and to make drugs. But in the last decade, there have been multiple scientific breakthroughs that should really make a big impact and a dent in these numbers now. And so we believe that there's four such catalysts, we call them levers, that can fundamentally change the picture. The first one is human biology. That's our ability to study disease processes directly in patients, or their samples, or in human-derived models, so that human becomes our model organism. The second are what we call high-resolution and massive-scale lab methods.
They give us data across many targets at once, but at a depth that before we could only achieve for one target at a time, and they do this at only a marginal added cost. The third are the advances in therapeutic modalities that help us tackle unprecedented targets that were not really addressable before, or in ways that are much more efficient, or better, or safer for patients. And what ties this all together is when we pull our final lever of machine learning and AI. So these algorithms can take the data that we have from human biology at high resolution and massive scale, and across different therapeutic modalities, and they can finally span these massive numbers, and help us discover targets that we wouldn't find otherwise, make new and better molecules, predict outcomes better, and increase our capacity and speed at the same time.
That's our hypothesis. But each of these levers on its own is actually, well, not enough. What we need to do is find a way to put them together, and we do this in a unique way that we call a Lab in a Loop. So at a basic level, this loop is very simple. We start with an experiment, we collect data, ideally at high resolution and massive scale. We train a model on the data, and then we use this model in order to predict the next set of experiments, and we basically iterate this loop. This sounds very simple, and as the algorithms get better, you can almost think of it, I don't know, as a self-driving lab. But you can only operate this if you work at the right scale.
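In pseudocode terms, the loop described here is just a repeated experiment-model-design cycle. The following toy, runnable sketch uses a hidden scoring function as a stand-in for the lab and a simple polynomial fit as a stand-in for the model; none of it corresponds to actual gRED systems:

```python
import numpy as np

def run_experiment(candidates):
    hidden_truth = lambda x: -(x - 2.0) ** 2         # the biology we don't know upfront
    return [(x, hidden_truth(x)) for x in candidates]

def train_model(data):
    xs, ys = zip(*data)
    return np.poly1d(np.polyfit(xs, ys, deg=2))      # fit a surrogate model to all data so far

def propose_next(model, n=3):
    grid = np.linspace(-5, 5, 201)
    return list(grid[np.argsort(model(grid))[-n:]])  # pick the most promising candidates

data = run_experiment([-4.0, 0.0, 4.0])              # initial wet-lab experiment
for _ in range(4):                                    # iterate the loop
    model = train_model(data)
    data += run_experiment(propose_next(model))

print(max(data, key=lambda d: d[1]))                  # best design found so far
```

The essence is that each cycle spends expensive experiments only where the current model thinks they are most informative or promising, which is what makes the loop pay off at scale.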
And for this, we have to change not just our models and algorithms, but also the way we think about science and the essence of our work on algorithms and in the wet lab. And so in this talk today, I'm going to show you different ways in which we can use this loop across our R&D work in gRED. And so first, I want to take you through what happens when we start with biology and a therapeutic hypothesis, and we wish to learn more about disease mechanisms or targets. And so biology is our entry point now, and the exit point will be something about disease mechanisms. So one central set of questions that we always have to answer is: what is my target molecule and cell, and where are they?
We need to do this in order to choose disease indications, in order to find combinations of targets for things like bi- or multi-specifics, in order to predict on-target toxicities, and for many, many other problems. Well, the question is: how can data and algorithms help us in doing this? So for this, we're gonna take a page from how Google does search. So first, when you want to do search, you actually have to index the web. You take all the data out there on the internet, and you make them searchable. By analogy, we built a tool called sc Hub that has cell profiles for more than 200 million cells from over 2,000 studies, high resolution and massive scale, and that spans hundreds of diseases from all over the body. Okay, so now we have a massive world to search in.
We want to find a cell of interest. Now, unlike in regular search, where you can just type a name of something, we may not even have a name for the cells that we want to search. So what we want to mimic is actually something called reverse image search. So imagine you wanted to find a particular pharma CEO. You don't actually know their name or where they work, but you do have a picture of them. All you have to do is upload their photo into Google's reverse image search, and it will find you their name and plenty of other photos, and websites about them, and news stories, and so on. So we basically use the same style, the same architecture of algorithms now, except that we have a reverse cell search.
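As a toy illustration of the reverse-cell-search idea, the sketch below embeds cell profiles into a vector space, indexes them, and retrieves the nearest neighbors of a query cell. A random projection stands in for the trained deep metric learning encoder, and the data and sizes are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_genes, dim = 10_000, 1_000, 64           # illustrative sizes, not 200 million cells
profiles = rng.poisson(1.0, size=(n_cells, n_genes)).astype(np.float32)
labels = rng.choice(["fibroblast", "T cell", "hepatocyte"], size=n_cells)

projection = rng.normal(size=(n_genes, dim)).astype(np.float32)
def embed(x):                                        # stand-in for the trained encoder
    v = x @ projection
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

index = embed(profiles)                              # the searchable "atlas" of cell embeddings

def reverse_cell_search(query_profile, k=5):
    scores = index @ embed(query_profile[None, :])[0]   # cosine similarity to every cell
    top = np.argsort(-scores)[:k]
    return list(zip(labels[top], scores[top]))       # names of the most similar cells

print(reverse_cell_search(profiles[0]))
```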
We input the profile of a kind of cell that we like, and we get their name and wherever they are in the human body, across all 200 million cells in our index. This is what we call a deep metric learning architecture, and once you train the model, just like with Google reverse image search, all you have to do is bring your new query cell, click a button, and in a fraction of a second, voila, here is your cell. Now, why does searching for cells actually matter for our portfolio? So a recent example is our vixarelimab program. It is an anti-OSMR antibody. We originally in-licensed vixarelimab based on our data that already suggested that signaling through OSMR in fibroblasts drives fibrosis in interstitial lung disease.
But now, what you want to know is where might there be other fibroblasts that could use this same OSMR signaling mechanism? Well, if you actually reverse cell search for the OSMR fibroblast, each of these blue circles is a tissue, and each of the inner circles is actually a particular condition where we can see these cells. So if you actually search for them, one of the places you definitely find them in is in the lung, and the conditions are different interstitial lung diseases, just like we expected. But in addition to this, you also find them in the gut, and specifically in IBD, in a different subset of fibroblasts that are called inflammatory fibroblasts. And on top of that, OSMR, when you look at human genetics data, is associated with IBD risk by genome-wide association studies.
And so from these data, we formulated a second therapeutic hypothesis: that targeting OSMR using vixarelimab in IBD would block inflammatory pathways that drive IBD disease. I hope you can still see my screen, because it disappeared on my own screen. Can you still see my screen? Yes, hopefully. I can't hear anything. So, the hypothesis is that these pathways drive disease in IBD and that blocking them would bring a benefit for patients, and based on this, we actually launched a separate phase 2 study in this indication with no additional lab experiments, nothing in animal models, only data and algorithms. Okay. My next example is when we start the loop with massive-scale, high-resolution experiments, and our goal is to find better molecular pathways and mechanisms.
So there's now a very large toolbox of approaches that we have established in gRED, which have one underlying principle in common. We can do very large-scale functional screens, and at the same time read very complex, high-content readouts. It can be cellular profiles, like what I just showed you. It can be images at the individual-cell and even the tissue level, and we can do these screens in cell cultures, we can do them in organoids, and we can do them in animal models. To illustrate what we can do with these, I'm gonna start with a simple example using one of these methods, which we call Perturb-seq. It does a pooled CRISPR screen and uses single-cell RNA-seq as the readout to characterize the function of large numbers of genes.
So in this example, this is a large family of E3 ligases in the innate immune response in dendritic cells. Dendritic cells are important for both inflammatory disease and cancer immunology and are of great interest for our scientists looking for targets. This study had about 1 million cell profiles in one experiment. More than 1,000 genes were perturbed, and the cells naturally span multiple kinds of dendritic cells and macrophages. So we can actually determine the impact on multiple cell types at once. And once we collect the data, we first use a machine learning algorithm to fit a model of the regulatory circuitry, of how the cells are actually being run by these genes.
This model, which I'm showing you here in a simplified form, connects the E3s, adapters, substrates, and so on, into co-functional modules of genes that, when we perturb them, have similar effects on the response of the cells. It organizes the responding genes in the cell into programs, and those programs capture different aspects of immune function: the response to LPS, the presentation of antigens, ER stress, and more. Now we have a full map of what all of these E3s are actually doing in the immune response in dendritic cells. For every step in the life cycle of dendritic cells, we now have E3s, adapters, and downstream transcription factors that control each aspect, which our scientists can now aim to manipulate in order to generate the desired effect. That's the first step in order to get to targets.
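As a toy illustration of one ingredient of such a map, one can summarize each perturbation by its average effect on expression programs and then cluster perturbations whose effects look alike into co-functional modules. The data and clustering choice below are illustrative stand-ins, not the actual gRED models:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(1)
n_perturbations, n_programs = 1000, 50
# effect[i, j] = average change in expression program j when gene i is perturbed (random stand-in)
effect = rng.normal(size=(n_perturbations, n_programs))

# Hierarchically cluster perturbations with correlated effect profiles into modules.
tree = linkage(effect, method="average", metric="correlation")
modules = fcluster(tree, t=0.8, criterion="distance")

print(f"{modules.max()} co-functional modules across {n_perturbations} perturbations")
```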
That's already very useful for our scientists, but the algorithms actually go further. Next, they connect for our scientists these lab genetic experiments that were done in immune cells in a dish with the human genetics that actually happens in patients. So this second algorithm tests which parts of what I found in a dish, with machine learning that reconstructed for me what happens in cells, actually explain the heritability of inflammatory and autoimmune disease risk, so that I know that this will translate to patients. And then finally, the models do one better. We can learn a model that predicts what happens when we perturb multiple E3s in one cell. That is, to train it, we give the model cells that were perturbed in two genes or more.
We train a deep variational autoencoder this time, and it can then predict for us the outcome of other pairs of perturbations that we never measured in an experiment. And that's very important because you remember those big numbers? We will never be able to test all the combinations in the lab. Now, we've taken these kinds of approaches into our research portfolio programs. This project is actually in collaboration with Recursion, which Alan mentioned earlier, where we focus on two disease areas. One is colon cancer and the other is neurodegenerative diseases. So here we use two kinds of rich and massively parallel phenotypes. One is Perturb-seq, just like I showed you for the E3s, and the other is cell images. We also use two different kinds of perturbations. One is genetic, like I showed you for the E3s; the other one is small molecule perturbations.
So we're now screening for new small molecules, and in this way, we don't only identify targets, but we simultaneously also find small molecule hits that can enter our portfolio projects. So now I'm gonna move from biological discovery to molecule making once we have a target. The same idea applies: each time, we have a loop. In each loop, we generate data in the lab, for example, on large molecules, on small molecules, on RNA vaccines, and so on. We use the data to train the model. We use the models to generate and predict the properties of new therapeutics, and we make and test them in the lab again. That yields more data, and that allows us to iterate the loop, both to reach our goal for a particular program and to make a better algorithm that can be used across all programs.
I'm gonna give you a couple of examples. The first one comes from small molecules, and our method here is called GNE-Prop. This is an encoder classifier model, and we first train it with the results of a high-throughput small molecule screen, for example 1 million molecules assessed for antibiotic activity, which is the example I'm going to show. Then we use the trained model as an oracle: we show it a virtual molecule and ask it to predict whether that molecule is going to be active or not.
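The workflow behind that oracle pattern can be sketched roughly as follows: train a classifier on the measured screen, score a much larger virtual library with it, and nominate the top-ranked molecules for synthesis. GNE-Prop itself is an encoder-based model; the random forest on synthetic fingerprint vectors below is only a stand-in to illustrate the loop, with all data and sizes made up and scaled down.

```python
# Oracle-style virtual screening, scaled-down and with synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
N_BITS = 512                                        # assumed fingerprint length

# 1) Measured high-throughput screen: fingerprint -> active / inactive.
X_screen = rng.integers(0, 2, size=(10_000, N_BITS), dtype=np.uint8)
y_screen = rng.integers(0, 2, size=10_000)          # real labels come from the assay

oracle = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
oracle.fit(X_screen, y_screen)

# 2) Virtual screen: score a much larger library of unmade molecules.
X_virtual = rng.integers(0, 2, size=(100_000, N_BITS), dtype=np.uint8)
scores = oracle.predict_proba(X_virtual)[:, 1]

# 3) Nominate the top-ranked candidates for synthesis and lab testing.
candidates = np.argsort(scores)[::-1][:345]
print("molecules nominated for synthesis:", len(candidates))
```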
In this way, we can screen billions of virtual molecules, predict the activity of each of them with the trained oracle, and the oracle will propose new small molecules that should be active; then we synthesize those in the lab, test them, and iterate. So in this real-life example, I'm gonna take you through one cycle of this loop. We trained the model on a 1.2 million molecule high-throughput phenotypic screen for antibiotics. The screen was done in 2017, well before the algorithm existed. We used this model as an oracle in a virtual screen of 1.36 billion molecules for their activity. The algorithm predicted 345 compounds as active.
We made those and tested them in the lab, and 82 of them, or 24%, were active. This is about a 50-fold better hit rate than the original approach, in which an expert group of medicinal chemists, our best and brightest, made the choices. And even more exciting than that, more than a third of these 82 molecules had new scaffolds that were not in the training data, that were not in our screening library. And the algorithm also predicted correctly when major changes in activity occurred, even when there was actually only a very small change in the molecule. This is something known as activity cliffs, which is very difficult, both for human medicinal chemists and for algorithms, to actually predict correctly. We use a similar Lab in a Loop for antibodies. We start with antibody sequences.
We use them to train the model. We use the model to design new antibody sequences. Then we make those antibodies in the lab, we test their properties, and we iterate. Now, as you already heard briefly from Alan, just over two years ago we acquired what was then a teeny-tiny proto-company called Prescient Design. It basically consisted of the three founders at the time, and it was pre-seed stage. And then we grew them ourselves, and they became our machine learning accelerator for drug discovery, and together with our antibody engineers, they built the Lab in a Loop for antibody machine learning and drug discovery.
So we have already gone through many cycles of the loop with Prescient, and the antibody optimization algorithms have gotten better and better through these multiple cycles and through the development of multiple algorithms to tackle the different goals you have when you make an antibody. When Prescient started, we actually assigned them four practice targets, but by now this work has become part of our portfolio projects, while we continue to improve the algorithms. Doing these iterations is only possible when you have both ends of the effort in your hands, computational and experimental, and they can work fully together and fully transparently on all data in all ways. This is something we cannot do in the same way with an external partner, and that's actually why we chose to make an acquisition and invest in this heavily internally.
Now, Prescient developed a whole zoo of models and methods. Some of them focus on optimizing an antibody that we found experimentally, and others are generative AI that makes molecules de novo. Now, Bruno showed you that it's not so easy to just ask DALL-E to make you a protein, but you can use generative AI to make proteins. So one method that I can describe today is called EquiFold Diffuser, and it allows us to do fast prediction of antibody structures. Just like with GNE-Prop, which we use for the small molecules, we like to use algorithms as oracles in virtual screens for antibodies, too. But here, you have to generate a very large number of antibody structures, not just sequences, and that is a lot more computationally heavy than generating a lot of small molecules virtually. Even in silico, this takes time.
I'm sure you've all heard of and probably used AlphaFold or AlphaFold 2, where machine learning is used to predict structure from sequence. But antibodies are unusual. They have these regions that are super variable, so you can't compare them to anything else, and running just one structure for an antibody can easily take about an hour. Now, if you just want one antibody, that's not a problem, but if you want to do a huge virtual screen, that's actually not feasible. And so one advance the team made was to combine new coarse-grained models, something called geometric deep learning, together with diffusion models in order to generate protein structure predictions (I ran one live for you just now) and design models for antibodies. And the accuracy of these models is state-of-the-art, but they're a thousandfold faster.
Now, diffusion models are the architecture behind things like DALL-E, so I did want to tell Bruno that Gen AI can make you awesome new proteins, and not only do they look nice on the screen, they actually express when you make them in the lab, and they can even bind their desired targets. Okay, so next up, I want to turn to an example in our patients, and for this, I'm going to switch to autogene cevumeran, which is our personalized neoantigen cancer vaccine, which we're developing in collaboration with BioNTech. I'll invite you to think of this example a little bit like a clinic in the loop. With a cancer vaccine, we aim to target the immune system to recognize neoantigens that are unique to each patient's tumor.
A patient's tumor is sequenced, then we use an algorithm to select neoantigens from the sequence, and then a personalized vaccine is synthesized and given to each patient. For the vaccine, it's absolutely crucial that we choose the best neoantigens, which means they would be presented on MHC class I and would elicit a good T-cell response. And for this, we incorporated, amongst many things, a Transformer-based model. Transformers are the family of models that are now very popular as LLMs, or large language models. These kinds of models are showing superior prediction performance and generalize across dozens of MHC alleles when they're trained well. And what is also cool, kind of as an aside, is that once we have a model architecture like this in place, we can sometimes reuse or expand it, with some modifications, for other related applications that we need.
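As a rough illustration of what such a Transformer-based presentation model can look like, the sketch below reads a candidate peptide plus a token for the patient's MHC allele and outputs a presentation probability. It is not the model used for autogene cevumeran; the vocabulary, sizes, and allele handling are simplified assumptions.

```python
# Toy Transformer encoder for peptide / MHC-allele presentation prediction.
import torch
import torch.nn as nn

AA_VOCAB = 21            # 20 amino acids + padding
N_ALLELES = 100          # assumed number of MHC class I alleles
D_MODEL = 64

class PresentationModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.aa_embed = nn.Embedding(AA_VOCAB, D_MODEL, padding_idx=0)
        self.allele_embed = nn.Embedding(N_ALLELES, D_MODEL)
        layer = nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, 1)

    def forward(self, peptide_tokens, allele_ids):
        # Prepend one allele token to the peptide so the encoder can condition
        # presentation on the patient's MHC allele.
        allele_tok = self.allele_embed(allele_ids).unsqueeze(1)     # (B, 1, D)
        x = torch.cat([allele_tok, self.aa_embed(peptide_tokens)], dim=1)
        h = self.encoder(x).mean(dim=1)
        return torch.sigmoid(self.head(h)).squeeze(-1)              # P(presented)

model = PresentationModel()
peptides = torch.randint(1, AA_VOCAB, (4, 9))    # four 9-mer candidate peptides
alleles = torch.tensor([3, 3, 17, 17])           # each patient's allele index
print(model(peptides, alleles))                  # presentation probabilities
```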
So one example of this is moving from prediction of MHC class I presentation for the cancer vaccine to MHC class II presentation, which is a related but harder computational problem. If we can predict MHC class II presentation, that actually helps us tackle antibody immunogenicity: we can predict it and then engineer it out of our antibodies before we ever head to patients. So I showed you many examples of our Lab in a Loop, and I want to turn to this loop one last time, thinking about the next level at which we try to tie it together and how our algorithms enable our creative scientists. We want to be able to put the strongest and most transformative tools in the scientists' hands.
I showed you how machine learning in general, and generative AI in particular, are at the center of this loop to discover targets and generate better molecules, but there is more that we can still pull in. On the data side, we have these massive volumes of written knowledge and old slides and lab notebooks, electronic lab notebooks fortunately, and non-text-based experimental results and images and gels, and you name it, it's there. On the algorithm side, there are now fantastic tools to build foundation models, not just of antibodies and sequences and cells, like I showed you already, but also of our human knowledge, like multimodal LLMs. Trained on such data, they should enable our scientists to reach their maximal creativity.
So at the beginning of this year, as Alan pointed out, we leveraged our existing investment in talent in this area to launch an effort for all of Roche to train our own large language models on a combination of public and proprietary Roche data. This means text data and multimodal data from the notebooks, the large data sets, imaging, and more. And I'm going to close in the next 2 minutes by telling you a little bit about it. So at Roche, our LLM approach relies on three pillars. The first is Galileo, which Alan already described. It's the use pillar, where we're working to deploy existing public large language models, and any future private LLMs, into our work, and to fine-tune existing LLMs for tasks that are important to us.
The second is that we're architecting and training new LLMs, where we use internal data to train text-only and multimodal models. And finally, in the third, we're developing new algorithms that solve key problems by incorporating an LLM component into other algorithmic setups, like prompting models with data or using autonomous agents. In terms of use, I'll take just the R&D perspective; you got the broader one from Alan. LLMs are already rapidly impacting everything that we do. This includes things like writing or optimizing code, annotating cell types, summarizing and searching the scientific literature, creating talking points, suggesting analysis methods, writing lab protocols, even checking clinical symptoms. You can see some of these snippets from a Slack channel, a real Slack channel inside Roche.
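For the first, "use" pillar, the mechanics of fine-tuning an existing open model on internal text look roughly like the sketch below. This is a generic illustration with a small public stand-in model and placeholder documents, not Roche's actual setup; model choice, data, and training settings are all assumptions.

```python
# Generic causal-LM fine-tuning sketch with a small public stand-in model.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"   # stand-in for whichever open model is being adapted
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Proprietary text would go here: protocols, reports, lab notebook entries.
texts = ["Example internal document text ...", "Another internal document ..."]
ds = Dataset.from_dict({"text": texts}).map(
    lambda x: tokenizer(x["text"], truncation=True, max_length=512),
    remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-llm", num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```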
That's already great, but we think our true advantage will come from our training of new LLMs that incorporate Roche and Genentech's proprietary data. This includes our documents and our lab data. Our Prescient team are experts in LLMs and other generative AI approaches, because these architectures are actually the basis of all of the great molecular designs that I showed you, and that allowed us to hit the ground running. We've already trained and tested both a 0.5 billion parameter and a 7 billion parameter model, and we have a 30 billion parameter model under testing now. These models are trained from scratch, which allows us to emphasize our own data and biomedical corpora in general, because those are more important for our applications.
We're in second-round alpha testing across all parts of Roche, and we're already observing improved performance over commercial and open source models for the kinds of specialized tasks that are important for us. We also believe that the most exciting component will be using our model to address further scientific questions that help drive the Lab in a Loop. For this, we plan to prompt the model not only with human language, but also with experimental data and with autonomous agents.
Then the model can answer with human language to a human user, but it can also provide input to the autonomous agent, so that the agent can continue doing work and help the scientists as they progress, as their copilot, I guess, for the lab, to empower our scientists in their search for targets and for molecules. So today, I talked to you about our strategy in AI for research and early development. Through my examples, I highlighted our three pillars of differentiation. First and foremost, our Lab in a Loop. AI does not stand alone, and our power is in the ability to iterate with experiments again and again across all aspects of R&D, up to the self-driving lab one day.
Second, in data, you need both scale and resolution, and we have the data generation capabilities to work like this. In the world of AI, quantity becomes quality, and it pays off to be big. So we're maximizing the benefits of our large size, our proprietary legacy data, and our ongoing data generation capacity. And then finally, we make sure to have the right partnerships and the right opportunities. In some cases, it's about acquiring and then investing and growing, like we did with Prescient Design. This ensures that we have the top capabilities in-house to run our loop with full transparency. But we also seek partnerships with the best out there for unique data generation capacity, for unique hardware, for other specialized expertise.
The latest example of that is the partnership we announced recently with NVIDIA, to bring together the power that we have in our Lab in a Loop with NVIDIA's infrastructure, resources, and technical expertise in order to solve these very challenging questions for patients. With this, I will conclude, and I'm gonna turn it over to my colleague Scott, in person.
Great. Hi, everyone. Pleasure to be here, and glad you could make the call. So today, I'll talk about how in pRED, we're transforming drug discovery through data and analytics. And as a first step, I'll talk about our overall approach. We do it in two ways: one, on the left-hand side, how do we use AI, ML, and automation to run that R&D engine and that Lab in a Loop, as Aviv highlighted, as fast as possible? How do we learn from those insights and the data that we have to make the next best step in molecule design? And then the second, on the right-hand side, is more about how do we change that loop? How do we change how we do drug discovery? What are those smart bets that we can make? And here, I'm gonna highlight a few examples in both categories.
So first off, Aviv covered our discovery area so well that I'm just gonna throw in one example of what we're doing in pRED there, and then I'm gonna give three more examples in our early development space. The first is how we run that lab optimization loop, and here we've got an example with our MLOps environment. Starting at the top, we've got about 150 models running in our MLOps environment that leverage all the scientific content available to us for small molecule design, both internal IP and publicly available data. And we use those models to predict the best molecules and the properties that we want to optimize in our Design Hub platform.
So this is an environment where project scientists can work together to brainstorm and give each other feedback on which molecules we should synthesize next. But once they've decided which molecules to move forward into the synthesis phase, you have to actually come up with a synthetic pathway: how do you want to make that molecule? We leverage Chemical.AI and some internal tools, drawing on all the publicly known chemical reactions and the millions of proprietary chemical reactions we have available within Roche, to design the best synthetic pathway, both from a time-efficiency perspective and in terms of overall yield. Once we've made the molecule, then, of course, you have to test it.
It's usually not just one test, such as a potency assay, but selectivity assays, ADME properties, and others. We track and enable that through a whole workflow cascade. You can almost view it like a decision tree: if one assay gives you the result you want, you might kick off the next set of assays in the cascade. The system automates that whole process in terms of planning and having the reagent orders in place, and it also helps us track where we are in the assay progress. Once we've got the results, we view them, of course, in our D360 analytics environment. But that's not all that is running in that Lab in a Loop.
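A heavily simplified, hypothetical sketch of such an assay cascade is shown below: each assay result decides which follow-up assays get scheduled next. The assay names, thresholds, and trigger logic are invented for illustration; the real workflow system also handles planning, reagent ordering, and progress tracking.

```python
# Toy decision-tree assay cascade: a result that passes its threshold triggers
# the next set of assays for that compound.
from dataclasses import dataclass, field

@dataclass
class AssayStep:
    name: str
    passes: callable                       # result -> bool
    next_steps: list = field(default_factory=list)

# Potency first; only potent compounds move on to selectivity and stability.
cascade = AssayStep(
    name="potency_IC50_nM",
    passes=lambda ic50: ic50 < 100,
    next_steps=[
        AssayStep("selectivity_fold", lambda fold: fold > 30),
        AssayStep("microsomal_stability_pct", lambda pct: pct > 60),
    ],
)

def run_cascade(step, results, plan=None):
    """Walk the cascade for one compound and collect the assays to schedule."""
    plan = plan if plan is not None else []
    plan.append(step.name)                          # schedule / order reagents
    result = results.get(step.name)
    if result is not None and step.passes(result):
        for nxt in step.next_steps:
            run_cascade(nxt, results, plan)
    return plan

# Example: a compound with measured potency of 42 nM triggers the follow-ups.
print(run_cascade(cascade, {"potency_IC50_nM": 42}))
```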
The majority of the models in MLOps take the data that we're generating every day and rebuild themselves, so that they can give the next best set of predictions for the next optimization round. So we leverage ML and AI not only for design and prediction, but also to automate that process as fast as we can. The next example I'll give is in real-world data. I'm proud to say that Roche leverages real-world evidence more than any other pharma in the world, and we do that in strong partnership with our sister company Flatiron.
So in this particular example, we had an oncology trial designed to find the right patients, at the right time, for the right molecule that we had designed, and we found that recruitment was slow. But using our real-world data, we were able to recognize that if we opened up the scope a little bit, in terms of allowing certain pretreatments, there would be no significant change to patients' progression-free survival rates. So we could actually increase our patient population by 15%, and it doubled the onboarding rate for patients in our clinical studies.
So that helped us overall, both from an efficiency perspective and by improving patient inclusion in the trials, making sure we get our molecules to the right and the best patients. The next example I'll give is in the digital biomarker space, something Moritz already highlighted earlier. We use digital biomarkers as a means of increasing the sensitivity with which we measure patient progression and how patients respond to, and benefit from, the molecules we're designing. I'll speak to both our Parkinson's and our Huntington's digital biomarkers, both leveraging a digital motor score. In the case of Huntington's, we have over 1,000 patients who have participated in our clinical trials and 2.5 million hours of passive monitoring data. That's 300 patient years of data that we have.
It's an order of magnitude more than any other pharma's Huntington's dataset package. And when we look at the results, using that information increases the sensitivity with which we can recognize how a patient is responding to a particular treatment. So in the case of prasinezumab, in our Pasadena study, looking at that digital motor score, we can get the same clinical readouts with 40% to 70% fewer patients. And in Huntington's, it's even more significant.
So if you look at week 20, in the blue column, using our digital motor score, you'll see that we get the same level of sensitivity at 20 weeks, compared to 68 weeks when we use a more traditional biomarker, really showing the impact that it can have, not only from a time perspective, but also from an overall patient population perspective. In this case, 75% fewer patients needed to get that same clinical readout. So it's really impressive work that we're able to leverage using digital biomarkers. And then the last example I'll give is in the ophthalmology space. So in diabetic macular edema, those patients have what are called HRFs, or these small hyperreflective foci in their retina. And it's been known for over a decade that they exist.
What we have in our clinical studies is over 50,000 retina scans. We wanted to create all of the clinical data ingestion pipelines and workflows so that, for every one of these images, we can recognize which patient it came from, which trial visit, which eye, and which slice of the retina scan it represents. We put all of that automation in place so that we can process all of those images and use deep learning to recognize how our patients are responding to our particular treatments. Here I'll show you a video rendering of one of those retina scans. We have a DME patient starting off with the blue being fluid in their retina and the red dots being those HRF specks.
You'll see after just one treatment, at week 16, a dramatic reduction in both the fluid and the HRFs. We really think this helps us understand the biology, what is actually happening in the patients, and we think it's also related to the design of Vabysmo, with the dual-action inhibition that Vabysmo is targeting. So, in a post-hoc analysis comparing faricimab and aflibercept, or Eylea, we were able to show that the reduction in HRFs, measured through this deep learning process, is actually more significant with faricimab than with Eylea, and we get a better overall readout. We think this is related to that dual therapeutic potential that we see with Vabysmo. But it doesn't stop here.
We're leveraging this information for our understanding of inflammation as a whole, and we can apply it in other disease areas. Because we've gotten such significant benefit, with these clinical images back in the hands of our clinicians within two weeks of us receiving them, we're applying these same data ingestion processes to all of our clinical studies in other disease areas. How can we recognize other biomarkers and get that data into the hands of our scientists to analyze and learn, truly in that Lab in a Loop, understanding the biology and designing molecules in the future? So with that, I will hand it over to Bruno, who will walk us through the Q&A.
Thanks a lot, Scott, and also thanks to all the other speakers for these very insightful presentations. We will take the first question from the phone, and it comes from Charlie Mabbutt from Morgan Stanley. Charlie, please.
Hi, Bruno. Yeah, Charlie Mabbutt from Morgan Stanley. Thanks for taking my questions. So I guess firstly, with technologies in AI drug discovery, what percentage of R&D costs do you think could ultimately be saved, Alan? And do you think long-term that could lead to declines in R&D expenditure across the industry? And secondly, how are the regulatory authorities reacting to studies such as you had suggested in IBD, where the only evidence is in an AI model, and also use of digital biomarkers, which you talked about in the last presentation in Parkinson's, et cetera? Thanks very much.
Can I start with R&D cost?
Yes.
Charlie, I wish I could come up with a great prediction here, and honestly, I don't have it on hand. I think it's really early days. With Aviv, who is very clearly the leading computational biologist on this planet, it's great that we have this knowledge in-house and that we can work on it, and you've seen how she has demonstrated it. For us, the first thing is outcome: better outcomes, faster. And with that, certainly, as you know, the major cost is the cost of failure. If we can bring that down over time, that would be a major, major achievement.
Aviv said that 90% of what we're doing in pharma fails. If we bring that to 88% or 87%, that would be a fantastic outcome. But as I said, it's really early days. I don't have a good prediction yet.
The second question was about IBD and how AI models will get accepted by the regulators.
I think that was for me on vixarelimab.
Yeah.
So I think some things are important to mention there. This molecule was already in phase II. It was safe, it was preclinically tested, it was in patients before, and it was efficacious, not for IBD, but for other conditions. If you rewound 7 or 8 years, these tools didn't exist, of course, at the time. But if they had, and that is how the predictions had been made, you would still probably need to do the preclinical work appropriately today, also in the context of safety and other things. There's a lot of work also on predictive safety and predictive analytics, but we didn't talk about that today. That's just to be clear for this particular example. The second really important point to make is that what the regulator sees is not just an AI prediction.
What they see is that this is done on human data. This is actually stronger data than when you have an animal model, because it is based on data from patients with IBD, on what their cells are like and what their human genetics are like. So the algorithm makes a prediction, but it makes it based on very high quality, high resolution human biology. And I actually think it's hard to compete with that using lab information. Lab information is often more distant from the disease than the kind of data that the algorithms help us wade through to find this particular hypothesis. The trial is on clinicaltrials.gov, so clearly this is appropriately done work.
And maybe, if I can also add a recent example here: in our phase III EMBARK study in DMD, for example, among the secondary endpoints we had the SV95C endpoint, which is the first EMA-accepted digital endpoint. And it was interesting to see the results, because when it came to the North Star, the primary endpoint readout, we would see significant differences between individual countries. But interestingly, when you look at this digital endpoint, the results were much more in line and similar. So I think this really tells you something about the power of these tools, and I think their significance will only increase. Okay. Charlie, did we answer all your questions?
I think there was one on the digital biomarker.
Yeah, I think it was related, but, yeah, digital biomarkers, maybe, Scott, you want to add something about-
Yeah, I can just add very briefly that, yes, we are using them as confirmatory measures at the moment, but we're also in active discussions with regulators: can we use them as our primary endpoint? And if you think about movement-based disorders, what better way to measure the impact on patients than measuring movement overall? So there, I think we have some promising progress.
Very good. Great. Thanks, Bruno.
Then, let's take a second question from the phone. This would be Peter Welford. Jefferies. Peter?
Hi, thanks, you can hear me now. So, two questions, both actually sort of in a similar vein. When do you think we actually start to see the possible benefits of this? Obviously, this has started in the initial stages of improving R&D, but how many years do you think we need before you can confidently say that these processes are actually improving success rates? And equally, I guess one for Aviv: how far along are we in this journey? Do you feel at the moment that this is really just the start?
Because we hear a lot about AI. Do you feel now that you've got the tools that are needed, and we are now beginning to actually reap the rewards, if you like? And then the second question is just on data sharing, I guess, if you want to call it that. Presumably, most of the major pharma companies are all building up these libraries of billions of data points across billions of cells, or whatever the numbers were. And my question is, where does the value lie? Are any of these data points shared across the industry? Is every company building similar data sets, and in the end, it's the analysis tools that differentiate you? Or, I guess, how do you think one pharma company's data set and ability is differentiated, perhaps, versus another, if that makes sense?
I don't know if Alan wants to start with the first part, and I'll do the other two.
Yeah, I think it's related anyway. Aviv, let me make a quick comment here. Are we starting to see the benefits? I think we have to distinguish between the use cases. I would argue that everything which is related to productivity, what Kent has talked about, what I have talked about, is imminent; it is there. We see that every day, it's increasing, and it makes a huge difference moving forward. I'll leave the R&D side to you, Aviv; my feeling is it will take a little bit longer, but perhaps you can be more precise about that.
Yeah. I think it does take time in R&D, and let me try to explain it in a little bit more of a timeline way. So first of all, I'm actually gonna go to other fields. You know, it's always hard to predict the future, but there are lessons from the past. Usually, with technological advances in science, I believe you tend to be on a roughly 10-year timeline. And this is because the first 3 years are the proof-of-concept period.
The first 5 years are really the build, but the second 5 are really the reaping of the benefits, and by the end of the second 5, people are like: "Yeah, it's been here all the time," and "We can't imagine working in any other way." I've seen that happen multiple times in my own life, so that's where I draw the conclusion from. I think the distinction, of course, in our unique field of drug R&D, is that you have a very lagging indicator of impact in the patients, and in some diseases, you really have to wait a while for that indicator.
By the way, with some of what Scott showed, the digital biomarkers, you can sometimes know sooner where you're headed as a result of them, but sometimes we still have to wait a fair amount of time for the ultimate measure of success, although you have indicators along the way. In terms of where we are specifically in this journey — remember the first three years, and then the five, and so on — we're kind of at the close of those first three years. We're starting to work in a more and more robust way: it's not just having brought in the people and built the tools, it's integrating them together into this loop. We are now seeing it deployed in our work. That's the examples I actually showed you, right?
These were actual projects in an actual portfolio, at different phases. That will have to increase while we're still improving the tools. The tools are not all there for all problems, and their engagement and deployment means changing everything that you do. It is a gradual process of this full-stack change. I would say we're maybe a quarter or a third of the way through to the promised land, but prophecy was given to fools, so you never know exactly. It's a hard road. It's not an easy one. Biology problems, in particular, are harder than some of the other problems that people have worked on. Causality is harder, and we need causality for targets, so that's probably the hardest problem of them all.
And in terms of productivity impact, I would say that if you go after the right targets, then everything in your productivity improves, so that's the ultimate thing. But you see the earlier impact in the molecules. And digital biomarkers, design of trials, and so on, are impacted even earlier than that. So it's almost like walking backwards along the R&D process in terms of when you see each of the impacts, but ultimately you have to go to the root cause. Your second question was on data sharing and where the competitive advantage lies. I'll start with the data piece. The trick is actually to use both your own data and that of the rest of the world.
The example I showed with vixarelimab, those 200 million cell profiles, is a combination of data that is genetic and Roche proprietary and data that is actually from the public world. We use everything that's out there in the world that's available to us; we don't limit ourselves to our own data. Genentech is also committed to publications, and we do publish, but of course with certain constraints. There are areas where we are in consortia with other pharmaceutical companies and academic institutions, in human genetics for example. These consortia have brought huge benefit, I think, to all companies and to society, and we are part of them, often as a founding member. And there are other areas like that as well. So I think we use a very balanced approach in terms of the data itself.
Where I think the competitive advantage lies is twofold. The first is having the right mindset, the right questions, and the right models and algorithms to address them, so that you can frame something for the algorithm that is actually meaningful to you. In many, many cases, people just use methods that are out there, but they're not actually the right methods for the problems of target discovery, drug discovery, and drug development. They were developed for other problems, and as a result, they don't give you exactly the right answers. So modifying things so that they suit our problems actually requires a lot of innovation. A great example of that is protein structure in general, and antibody structure in particular. We make therapeutic antibodies. Those proteins matter to us, they have very unique characteristics, and they need their own algorithms. You can't just use general purpose tools and get the answers.
So that's one aspect of it, and the second, and I think the most material aspect, is this iteration. It's never just the algorithms alone. AI on its own is not enough. AI needs data and ideally iterations, and that's what we're building, and that's unique. I think I answered all three questions.
Yes, I think so. Peter, did we answer all your questions?
Yes, sorry, I was on mute. Very good. Thank you.
Yeah, and maybe another comment from my side here, because I know you're interested in the pipeline. You've seen the example from Scott on prasinezumab, for example, in Parkinson's disease. This is one of the molecules where we kept the development program going, and we will have phase II results coming next year. We would probably have stopped this molecule based on the traditional endpoints alone, but we saw this consistent, strong signal in some of the biomarkers. So let's see how this plays out next year, but it already has an impact on one or the other molecule in our current portfolio.
With that, I would now read a couple of questions which came in here, and the first one goes to Kent. It's about how COVID-19 caused problems for many forecasting models: how did you address the demand shock when building your model, and is the new model capable of recognizing future demand shocks?
Thanks, Bruno. That's a great question. What I'd say is that demand during COVID was nearly impossible to predict, both going up as well as going down. We went through multiple waves, and I think we weren't alone in that; the entire healthcare industry was hit by it. So our model is primarily built around detecting and responding to this plus or minus 25%, and we have put that into routine. Let me talk briefly about what I think the question gets at: what would we do during the next pandemic? Again, I think a couple of things.
Number one, we have a good handle on what our installed base capacity is, so how many tests you could theoretically run if you were running 24/7. We've got a good idea of our manufacturing capacity across instruments, reagents, and consumables, so everything it takes to run a test, and we know exactly how long it takes to ramp up these manufacturing lines. So what we do in an extreme case is monitor the external environment, looking for triggers that could signal the next pandemic, so that we are, in fact, ready to respond. And I think monkeypox was a good example, where it came up on our alerts early.
We were able to address the capacity, and then we monitored the uptake and consequently put ourselves in a good position as this developed further. So again, I think you have to separate the routine from something that would be extraordinary, but I believe we're well prepared now for both, and the lessons from COVID were certainly invaluable.
Thanks a lot, Kent. We'll take another question, and this is about people and talent. The question is: "What professional profiles will be necessary to deploy these health strategies based on information and management? And of these different talents you need, which are the ones hardest to find?" I think, yeah, whoever wants to-
Yeah. I'll go ahead and then pass over to you, Aviv, if that's okay. I think one important profile is really that combination of clinical and medical knowledge with information expertise, so the marriage of the two: understanding a disease or a patient pathway, understanding the needs in the care system, and matching that with the possibilities that data gives us. That's a key element. We see that a lot with this profile of chief medical information officers that is coming up, but that's really an important marriage and a key element that every organization will need to combine to be successful in digital healthcare.
I think another important part, and certainly an ever-increasing challenge, is the availability of talent in the area of cybersecurity and privacy, which is so critical in particular for healthcare and health data. And that would probably be my tip for every young talent today: if you want job security in the coming years, that's a good area in which to find a job. Aviv, I'm sure you have some further profiles.
Maybe, Aviv, one question to put on top for you, because I think you're the right person to take it. Excluding the U.K. and Ireland, what would be the best universities at which to be trained as a computational biologist? Since you have close ties to academia, you might have some thoughts on this one as well.
You want me to answer that, not the talent question?
No, both.
Okay, so I'll finish with the talent question, then I'll come back to this. I would organize it on a spectrum from what I would call the most computationally technical to where it's more in our domain, which can be biology or chemistry, and I'm definitely restricting it to the R&D side. So on the most technical side, you have two major phenotypes of people that you need in order to build these capabilities. The first are your computer scientists, ideally with machine learning expertise and related areas. It's not just machine learning; there are multiple advanced computational approaches that are important, but I'm putting it all under this title. I'll call it algorithms. And the second are engineers, both machine learning engineers and software engineers more generally, who can really take algorithms and make them into robust, very high-performing code in an enterprise environment. There's UX that goes with that.
There's a whole world, right? That's a particular kind of core technical expertise. The second category you have are what I would call analysts. In many places, people will call them data scientists. These are people who understand the use of computational tools. They don't necessarily invent new algorithms, and typically they actually don't, but they are the best today at putting them to use together with a domain-specific question. They can be in the biological domain, and then they're computational biologists. They can be in the chemistry domain, and maybe they're called computational chemists or cheminformaticians, and so on. That is a very crucial interface that today is really inhabited by people who are bilingual, and some of them come first from the domain and then into the other language.
Some have made the other career path, but that's where their strength lies. And the third category of people, which I think is often neglected in these kinds of conversations, is actually about changing, at the core, the way that all our scientists do their work. So your lab biologist, or classically trained biologist, I would call it, or chemist and so on, actually needs to think about biology and chemistry differently once they realize this is in the world, and that is a major part of the shift that is happening. You do your experiments differently, you design them differently, and you don't design them alone. It's always kind of a joint activity. And what we're seeing, and I've seen this even before coming to Roche, is a generational shift.
Scientists trained today are much more native in thinking across this spectrum than scientists trained even 10 years ago. And as a result, when they come, they actually don't always need an analyst. They can analyze their data on their own. They can think of new problems in a way that's formulated well for an algorithm because they're used to working in this way, and that's a gradual shift in the talent landscape. I would say finding great people is always hard. Always hard, will always be hard. In a field, when there's opportunity, there's demand, it's kind of an obvious thing. What we need to do is provide environments for people where they can do things that they can't do anywhere else. That's what attracts people, that and a mission.
If you give them that, they do the work because there's no other place where they can achieve it. To the question on training: there are so many good places to train. I think it's often a mistake to think there are just a select few; that's actually not true. And the world is very big. Somebody said, "Excluding the U.K. and Ireland," which basically leaves me the rest of the world, and there are so many good places to do this kind of work. If your focus is on the interface between, say, the computational sciences and one of the other scientific areas, I would look for a place that has enough critical mass in both.
If we're talking about grad school, I think what's primarily important is finding places where there's a critical mass of strong research labs, and that you can find in multiple institutes on the East Coast, the West Coast, the center, and the South of the United States, in Canada, in France, in Germany... I can't. I mean, it's non-enumerable at that level. I will also say, if you're at the point that you're kind of making a career choice, you can always send an email, including to me, and I would always reply to whomever asked the question.
Very good. Let me pick one more question here. This comes from Sean Hammer, and the question is: "What percentage of drugs currently in Roche's clinical pipeline were discovered through AI machine learning algorithms?" I think we have seen a couple of examples where AI is contributing already, prasinezumab and Vabysmo, for example, and we have the gene therapy in DMD, which we mentioned. But the question is really, I would say, where the contribution is so fundamental that the drug entered the pipeline because of it. And maybe, Aviv, I'll give this to you first: when will the first molecules that are really based on AI approaches enter phase I?
I actually showed you. It's not a classical molecule, but the RNA vaccine is based on machine learning.
It's an algorithm that chooses which neoantigens to put in the vaccine. Maybe I wasn't very clear about that: that's not chosen by hand. It requires an algorithm; you actually can't design it without one, because you have to predict which peptides will be presented on MHC class I, and that's a hard prediction problem. So, in clear disclosure, the specific graphs I showed were not from patient data; they're from a paper, and it's a slightly different method, but the general statement is very accurate and correct. For other types of molecules, antibodies and small molecules, I cannot make a specific statement on a specific program. But what I will say is that we have programs in advanced stages in research that have this next-generation AI component to them.
I will also say that for small molecule drug discovery, for at least 20 years, people have been using machine learning approaches, if not longer, but they weren't these next generation methods. They were kind of the previous generation of machine learning. So in full clarity, I thought that was important to state. Yeah.
You had the example of the new lead structures in the antibiotics. This is something which-
New lead structures in antibiotics. We have things in targets all over the portfolio. I just can't comment on specific ones beyond the examples that you saw.
So I assume we should expect the first such molecules to really show up in the pipeline in the next few years. Very good. Scott, anything to add from the pRED perspective?
I think Aviv covered it extremely well. We use ML and AI to augment the design of all of our molecules, across all modalities. I think it's really essential. It helps us remove properties we don't want from those molecules, and I don't know that we could do it without those algorithms and rule-based tools. So I think it's going to continue to evolve, but it's already there.
Thanks. So there's one more question from the phone here from Harry Gillis. Harry, please.
Hi. Thank you so much for the great presentation and for taking the question. So I just wanted to ask how you think about measuring returns on your digital or AI investments. For example, I can imagine this is a lot easier for some of the supply chain processes than for the earlier-stage R&D work. And then, thinking about the R&D investments, will it really just take time until some of these drugs discovered with AI ultimately reach the clinic, to be able to assess the returns on those investments, and how that ultimately plays out?
Then related, just given sort of the breadth of exciting opportunities and capabilities you've highlighted, I was just wondering how you make sure that you continue to spend on the right programs and capabilities. And I guess if the scientists are going to the CFO with all these amazing projects, sort of what processes are in place to maybe hold that back and ensure that the money is spent wisely? Thank you.
Thanks. I think this-
Huge question. That's a huge question.
Yeah.
I think we could spend hours answering this one, so let me give it a try. For everything where it makes sense to measure, when it comes to digital, we certainly come up with a return. When we do an ERP program like Aspire, we very clearly have a business case behind it, and I even had it on my slide. We have a positive NPV, and it's a massive positive NPV, because we go through a business transformation. So it's pretty clear.
I can well imagine, and Kent will surely comment on this, and he even said it in his presentation, that he has very clear applications where he can measure the outcome and where the returns are clear. It's very hard, as Aviv said, when it really comes to the R&D part, and I think that's also what Scott has said. Here we go in and say: "Fine, that's part of the R&D budget, and you guys hopefully spend it wisely." And you also see what we do when it comes to the large language models with Galileo; we leverage that across the company. So it's not like this is really Genentech-exclusive or gRED-exclusive.
I think we leverage into other areas like a tool, yeah, that we use. I think it's really like, okay, if you wanna calculate a return on Excel, if you like, in our company. But very clearly, I think wherever it makes sense, yeah, to come up with a business case, certainly we do that. But there is a bulk of investment, yeah, that we do on the digital side, which basically has no explicit return, and I would even argue it would be very hard to calculate it. And if it's very hard and the predictions are anyway wrong, honestly, then we stay away from it and leave it where it is and put it in a budget.
Yeah, I will add that Alan covered it really well. It's easier in the clinical space, where you've got huge swaths of data, than in the discovery space, where sometimes those data points help move the needle in the right direction but are inflections. One example I didn't highlight is that we use a platform called Edison for all clinical data ingestion, both on the pRED side and also in PD, and we save about CHF 400,000 per clinical study by putting all of the metadata around the incoming data so that we know how to process it, what it is, where it's coming from, and how to use it for ML and AI. Those are things we previously had to do manually, so there's a direct return there.
With the ophthalmology example I put forward, we're able to measure the volumetric impact of those HRF reductions and the volumetric impact of the fluid reductions, things that were simply not physically possible before; you couldn't do it manually. So with those deep learning algorithms, you're able to quantifiably measure science and biology that you couldn't measure before. So I think there are already huge benefits there.
So if I can chime in on that, taking it back again to early development as well as research. First of all, I think Scott hit the nail on the head: there are two kinds of gains that we get. One of them is efficiency gains, and the other is things we simply wouldn't have succeeded with otherwise. Among the efficiency gains, there are many that are easily measurable. Another example in the clinical domain is automating the pull of clinical data back from sites, which takes you from something that took 24 days and required people to something that takes half an hour and doesn't require them at all. And you can quantify it per trial, and you can just see your costs drop. Your costs drop, and it's faster, and it's more accurate.
That's an easy win, and it's easy to count. In the molecular realm, we are increasingly developing ways to track this contribution per molecule, so that we also know which investments are starting to show signs of panning out, so that we can keep investing in them, and which areas might have seemed like a great idea, with a lot of activity, but in the end are not really generating what we were hoping for, which is totally possible in science. You never know until you try things out. So an example like vixarelimab, which I gave you, is a good one, because the original financials for this asset were based on one indication. Now we have two. That changes the calculus.
Of course, it would still have to succeed, but if it does, that's a major outcome for a very modest investment, in this case, on the algorithm side. In terms of where we invest, we do focus on identifying areas for critical mass, because you are right, people can otherwise run all over the place with all sorts of creative ideas, while still leaving enough room for some unbridled creativity, because you never know where the best idea will come from. But the places where we decide to double and triple and quadruple down are well defined, because you have to have a sustained investment over time to reap the big benefits. And once we build a platform, and that's something Alan alluded to, we all use it.
So, for example, anything that Prescient Design develops is available to antibody engineers in pRED just as much as it is in gRED, and in fact is also useful for DIA projects. So it spans the whole of Roche. And that's the third aspect that I think we've been very conscious of, which is to centralize work where the skills are and to maximize its impact in this way. So, for example, the training of large language models is centralized, and the fine-tuning is centralized in the hands of one expert team working on all of Roche's data, rather than having many, many parallel efforts that would take too much financial investment and might not necessarily bring the payoff in the end. So we're very conscious, I think, of these pitfalls.
Harry, did we answer your question?
Yes, absolutely. Thank you very much.
Yeah. Then there's maybe one final written question, and I think then we are done with the Q&A. This one, Alan, might actually go to you: how do you want to ensure data integrity in an AI environment?
Yeah. No, it's a great question, and certainly, how should I say it, very important. Let me first make a comment that comes to my mind. When it comes to efficiency in R&D, certainly the major point, the holy grail, is bringing the failure rate down. We would be making this a different industry if we achieved that with all the tools that we apply here. Now, to data integrity. Look, for me there are perhaps three pillars that I look at: accuracy, completeness, and consistency of data, and with that, certainly also regulatory compliance. Why am I not so concerned? Well, look, it's not like this is the first time we've tackled that problem.
We do a lot of clinical trials. We deal with patient data every day, so I think we have huge experience in dealing with that. And I would argue that is also the reason why we came up with a responsible artificial intelligence framework in the company right from the beginning, where we give good guidance, where we outline policies, and where we make sure that people who use external models are careful with the data they put into those models, while we have internal models where you can do that very safely. So I think we have a lot of instructions to get people onto the right path here.
Certainly from a data quality point of view, it's in our best interest to have well-structured, high-quality data that we mix, if you like, with our internal data. Having said this, we're even in that business with Flatiron; as you know, we even sell data externally to other pharma companies. So I would argue there's a high level of expertise in how to ensure data integrity, and we work on that every day.
Thank you, Alan. I think with that, we are at the end of today's session. I would like to thank the many contributors who made this event happen. Special thanks again to all the speakers and their respective teams for their time and commitment, and also to the IR team members who worked on the individual slide decks. I have to call out Jon Kaspar Bayard and Gerard Tobin, who worked on Alan's deck and the overall management of the decks; Alina Levchuk and Birgit Masjost, who worked on Moritz's and Kent's decks; Lauren Kalb and Anita Tang, who worked on and managed Aviv's deck; and Jan-Philipp Schwan, who worked on Scott's deck. From the back office, thanks also to Eva Losert and Beatrice Hau.
If there are any remaining questions, please reach out to the IR team anytime. We are happy to follow up and assist you. And with that, I wish you a good day, and hopefully we'll talk to you soon.