Good morning. Welcome to the Jefferies Global Healthcare Conference. My name is Ryan with the Jefferies Investment Banking team. It is my great pleasure to introduce Dr. Michael Secora, CFO of Recursion. He's a scientist by background, turned investor, and now operator.
Thank you, Ryan. Thank you, everybody, for joining me here today. Happy to be joined by my colleague, Dr. David Mauro, our CMO, and big thanks to the Jefferies team for having us be part of this great conference. I'm going to start today with the Recursion value proposition, and at its heart is a formula for mapping and navigating complex systems using technology. It is a formula seen pervasively across technology industries. It begins with the profiling of a complex system: the capturing of high-dimensional data to create a digital record of things, of places, of preferences; in our context, of biology and of how chemistry relates to it. With that digitization of a physical system, we aggregate vast amounts of data, and we organize that data to create digital maps of that reality.
Then we apply algorithms and computational resources to navigate those maps and find novel relationships, which we then go and experimentally validate. This formula, again, is found pervasively within the technology space, and it is what Recursion has adapted to a life science context. As life scientists, we can acknowledge that there are certain data roadblocks that can make the mapping and navigating of biology difficult. In our industry there are still analog standards, and there is a siloing of data, so the data we are collecting may not be as connected or as relatable as it needs to be.
There is also a reproducibility issue: experiments, whether at the discovery or the development level, often cannot be recapitulated because of uncertainty around the standardization and systemization of the conditions under which measurements are made. I'd also say that conditions like these give rise to the great frictions we see in this industry, where it costs $2 billion and takes 10 years to bring a new drug to market, and the likelihood of success is less than 10%. With that background, Recursion has been building and aggregating purpose-built data sets to map and navigate biology by applying that formula to the life sciences. It begins with the profiling of physical systems, and we utilize our automated wet laboratories to do it.
Here you see some images from our wet labs. We are conducting an enormous number of experiments, up to 2.2 million wet lab experiments per week. We then aggregate all of that data across millions of experiments and across multi-omics: phenomics, transcriptomics, and other omics, along with biology, clinical data, and chemistry data. And then we apply computational resources, such as our supercomputer, to understand how a gene can relate to another gene, a gene to a compound, a compound to another compound, and to understand the vast complexity and connectivity that characterizes biological systems. This also allows us to train large language models and foundation models, which are able to nuance out these relationships. All of these tools come together in the Recursion Operating System, shown here, which helps us industrialize drug discovery and development.
It begins on the far left with patient connectivity and novelty, where we utilize our maps of biology and chemistry to search through and find novel relationships. But it's not just in the proprietary data we've generated; here I'm talking about over 50 PB of biological, chemical, and clinical data that has been aggregated. It is also in the use of large language models to mine relationships within the public literature. You take those two representations, proprietary and public, and start to see: where does Recursion have data arbitrage? Where are we finding relationships that are not known, or not well known, in the corpus of scientific literature?
The best relationships across targets and compounds then go into hit and target validation, passing through a number of filters across chemoproteomic, phenomic, and transcriptomic data layers, as well as clinical and chemical tractability. The best relationships then go into compound optimization, where there is compound design at the in silico level as well as physical ADME experiments, with all of that data giving rise to ways we might refine a compound, and also going into refining our digital chemistry models themselves. After a certain number of cycles in compound optimization, we move to translation, where we start to do in vivo mouse experiments.
This is our approach, which I'll get to a little later, to conducting in vivo experiments: looking at digital biomarkers from the measurement of certain animal models, giving rise to tolerability, toxicity, and efficacy measures, after which we're able to take a program into IND-enabling studies and ultimately into clinical development. We have already been demonstrating how we are shifting drug discovery and development, and you can see some of the statistics shown here. There is a notable winnowing of the drug discovery funnel with the Recursion approach; ultimately we are trying to minimize dollar-weighted failure, driving failure early when it is cheap and minimizing it later on in the clinic, when it is expensive.
This gives rise to us spending less money to get to an IND compared to the industry average, and it also highlights that we are spending less time to get to a validated lead compared to the industry average. But it's not just in economic measures. We are also able to look at how the Recursion approach gives rise to novelty. In this depiction, you see Recursion's proprietary data as a measure of relevance versus public data as a measure of relevance. We're able to find areas of canonical biology with well-known targets, but we're also able to prioritize a piece of parameter space where Recursion is finding relationships that are not known, or not well known, in the corpus of scientific literature. This is the data arbitrage I spoke about earlier. How are we finding this space?
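As a rough sketch, that "data arbitrage" idea is simply a prioritization: reward targets where our proprietary maps show strong relevance but the public literature shows little. The scoring function, target names, and numbers below are purely illustrative, not Recursion's actual methodology.

```python
# Hypothetical sketch of data-arbitrage prioritization: rank targets by
# strong proprietary-map relevance combined with low public-literature
# coverage. All names and scores here are invented for illustration.

def arbitrage_score(proprietary_relevance: float, public_relevance: float) -> float:
    """Reward signal that is strong in-house but sparse in public data."""
    return proprietary_relevance * (1.0 - public_relevance)

targets = {
    "TARGET_A": (0.90, 0.80),  # strong signal, but well known publicly
    "TARGET_B": (0.85, 0.10),  # strong signal, barely described publicly
    "TARGET_C": (0.30, 0.20),  # weak signal everywhere
}

ranked = sorted(targets, key=lambda t: arbitrage_score(*targets[t]), reverse=True)
print(ranked)  # TARGET_B ranks first: high proprietary, low public relevance
```

Under this toy scoring, a well-known canonical target scores low even with strong in-house signal, which matches the quadrant picture of prioritizing novel parameter space.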
How are we exploring this space, and how are we trying to drug targets that, again, are novel? We look to harness the value of the Recursion Operating System with a multi-pronged, capital-efficient business strategy across three dimensions. The first is our internal pipeline, focused on precision oncology and rare disease, which we believe can have an accelerated path to approval or to value inflection. The second is our partnership strategy, where we wade into more complex therapeutic areas such as neuroscience and areas like cardiovascular metabolism. The third is our data strategy, where we look to license potential subsets of our data or some of our key tools for interrogating biology and chemistry. Now I'm going to walk through each of these components in a little more depth.
Here you see our internal pipeline, and I think it already starts to highlight the scale and breadth of our approach. I'm going to walk through each of these programs to give you some context for what the OS has been able to nuance out from biology and chemistry. First is our program in Cerebral Cavernous Malformation. This is a rare disease with no approved therapy, characterized by vascular malformations in the CNS. It affects approximately 360,000 patients in the US and EU5. It is a massive rare disease.
This program is already in phase II; in fact, we're wrapping up the phase II and looking to read out top-line data next quarter. The second program I'll call out is Neurofibromatosis Type 2. This is also a rare disease with no approved therapy, characterized by benign tumors that occur in the brain. It affects approximately 33,000 patients in the US and EU5, and here we continue to advance another phase II; watch for that preliminary data in Q4 of this year. The next program I want to highlight is familial adenomatous polyposis. This is also a rare disease with no approved therapy, characterized by polyps within the GI tract that have a high risk of malignant transformation. It affects approximately 50,000 patients in the US and EU5.
This phase II is also progressing, and you'll see preliminary data in the first half of next year. I'm going to jump down to our program in advanced AXIN1/APC-mutant cancers. This program addresses over 100,000 patients in the US and EU5, and, like familial adenomatous polyposis, you'll see that preliminary data coming in the first half of next year. We're also kicking off another phase II clinical trial, in C. difficile infection, which affects over 700,000 patients; watch for that phase II trial initiation this year as well. The last two programs I want to highlight are advanced HR-proficient cancers, with the target RBM39, affecting over 200,000 patients; watch for an IND submission this year and, in the near term, a phase I initiation.
For our novel target, Target Epsilon, in fibrotic disease, also watch for an IND submission in the near term. Let's take all of these programs in total. What we're seeing here is 7 programs and 7 data readouts coming in roughly an 18-month span, a remarkable amount of clinical data coming from the OS. What I think is important to call out when I look at this pipeline in total is that I see depth, breadth, and maturity. I see an operating system that can be applied flexibly across therapeutic areas and, in many cases, can nuance out perhaps first-in-disease opportunities. Let's talk a little about our partnership strategy as well. Here we have a number of collaborations, both with large pharma and with large tech companies.
I'm going to talk about the therapeutic discovery side first. We have a partnership with Roche Genentech in the area of neuroscience and one GI oncology indication. This partnership came with a meaningful upfront payment, $500 million tied to research milestones and data usage options, and $12 billion tied to potential program options, and last November, I believe, we had our first program advanced and optioned, within the context of GI oncology. Our partnership with Bayer focuses on undruggable oncology. This also came with a meaningful upfront payment, as well as up to $1.5 billion tied to seven oncology programs. We're very happy to be working with both of these great partners and to be advancing both of these partnerships. On the technology side, I first want to call out our partnership with NVIDIA in next-generation high-performance compute.
This partnership, announced last July, came with a $50 million equity investment. We've already put one of our models, Phenom-Beta, our phenomics-based foundation model, on NVIDIA's BioNeMo platform. NVIDIA has also been working with us to design and build BioHive-2, our next-generation supercomputer, which went live last month. We have partnerships with both Tempus and Helix in the space of accessing real-world patient data. These partnerships, one predominantly in oncology and the other in non-oncology, help give us multimodal patient records: the clinical record, of course, but also tied to DNA sequencing and RNA sequencing. That multimodal record helps us relate back to the cellular data we generate in our wet laboratories and drive causal models of forward and reverse genetics.
Lastly, I want to call out our partnership with Enamine in the space of chemoinformatics and chemical synthesis. We worked with Enamine to run an extraordinarily large chemoproteomic calculation last summer, looking at the protein-binding interactions of approximately 36 billion compounds across the human proteome. In our data strategy, we focus, again, on licensing subsets of our data and some of our key tools. As I get a little more into the Recursion approach, I think you'll start to see some of the opportunities that could be at play for our data strategy. So let's get into the Recursion approach. We believe that to truly industrialize drug discovery, point solutions must be integrated as modules across many diverse steps.
As I walked us through the operating system earlier, when I look at each of these nodes, what I see are many different modules that can be called almost as if one were doing modular programming. It is the application of those modular programs in a biotechnology, life science context, where each node can be connected to another, perhaps in a looping fashion or in a sequential fashion. I'm now going to unpack some of the nodes that make up these different points. The first is how we utilize biological and chemical inputs to generate data. As these point solutions evolve, we want to be able to increase their complexity and scale in line with what it means to understand complex biology and chemistry.
We have conducted over 250 million phenomic experiments, and we conduct up to 2.2 million experiments each week. We've done experiments in over 50 human cell types. We have an in-house library of approximately 2 million physical compounds that we utilize for screening purposes. We have been able to sequence over 1 million transcriptomes, and in service to this work we are moving toward constructing a whole genome-wide transcriptomic map to complement some of our phenomic maps. We have also built large phenomics-based foundation models on our library of over 2 billion images collected from our wet laboratories.
From all of this data collection, starting to understand how a gene can relate to another gene, a gene to a compound, and a compound to a compound, we've been able to start to understand over 6 trillion relationships across biological and chemical contexts. We also acknowledge that each module can be complex, and we want to continuously improve these modules. What you see here are some depictions of our DMPK module, and how we are garnering ADME data through experiments using it. This ADME data goes into refining the synthesis of some of our compounds in service to drugging a particular target, and also into refining some of our digital chemistry models. We also acknowledge that different modules can come with different levels of specialization.
What you see here is video from our InVivomics module. InVivomics is our approach to carrying out in vivo experiments, in which we put sensors and cameras in the cages of animals to look at heart rate, breathing rate, kinesics, appetite, hydration, and so forth. All of these give early reads on tolerability, toxicity, and early efficacy in a more humane way. We are continuously looking to add new modules to improve the Recursion Operating System. Here you see some of the multimodal data we've been able to garner with Tempus across clinical records, DNA sequencing, and RNA sequencing, and how this multimodal stack goes into the formulation of causal AI models to help drive, again, forward and reverse genetics, tying back to the intracellular data we generate in our wet laboratories.
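To make the InVivomics digital-biomarker idea concrete, here is a minimal sketch of how a rate like breathing rate might be derived from a cage sensor trace. This is an invented, simplified example on a synthetic signal, not Recursion's actual pipeline or sensor data.

```python
# Illustrative sketch (not the actual InVivomics pipeline): estimate a
# breathing rate from a periodic sensor trace by counting upward zero
# crossings of the mean-centered signal.
import math

def breaths_per_minute(signal, sample_rate_hz):
    """Count upward zero crossings of the mean-centered trace per minute."""
    mean = sum(signal) / len(signal)
    centered = [s - mean for s in signal]
    crossings = sum(1 for a, b in zip(centered, centered[1:]) if a < 0 <= b)
    duration_min = len(signal) / sample_rate_hz / 60.0
    return crossings / duration_min

# Synthetic 60 s trace: a 2 Hz "breathing" oscillation sampled at 50 Hz.
trace = [math.sin(2 * math.pi * 2.0 * (i / 50)) for i in range(60 * 50)]
print(f"{breaths_per_minute(trace, 50):.0f}")  # close to 120
```

A real system would of course work from camera or sensor streams with noise filtering, but the principle of turning a raw trace into a tolerability or toxicity read is the same.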
The result is this evolving palette of sophisticated modules that go into the operating system. When I look at this, again, I see modular programming, because we have focused so much on standardization, effectively making each module a standard operating procedure that can be called within the context of the Recursion Operating System. We use different modules for different tasks. If we wanted to find a novel chemical entity for a known target, it might follow this potential template; again, look at it as if we were looking at a series of function calls in a piece of computer code. If we were looking to find a novel target and then drug it, it might follow a slightly different template.
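The "series of function calls" analogy can be sketched literally. Every function below is an invented stub standing in for an OS module; the names, signatures, and logic are illustrative only, not the actual Recursion OS API.

```python
# Hypothetical sketch: a workflow template for finding a novel chemical
# entity for a known target, written as a series of module calls. All
# functions are toy stubs, not the real operating system.

def phenomic_screen(target, library):
    """Stub: return candidate hit compounds for a target."""
    return [f"{target}-hit-{i}" for i in range(3)]

def validate_hits(hits, layers):
    """Stub: keep hits that pass every orthogonal data layer."""
    return [h for h in hits if all(layers)]  # real logic would test each layer

def optimize_compounds(hits, cycles):
    """Stub: each optimization cycle refines the compound."""
    return [f"{h}-opt{cycles}" for h in hits]

def find_novel_chemistry_for_known_target(target):
    hits = phenomic_screen(target, library="in_house")
    validated = validate_hits(hits, layers=["chemoproteomic", "transcriptomic"])
    return optimize_compounds(validated, cycles=5)

leads = find_novel_chemistry_for_known_target("RBM39")
print(leads)
```

A novel-target workflow would simply compose the modules in a different order, for example starting from a map query rather than a known target, which is the point of the modular design.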
All of these potential templates can be brought together in LOWE. LOWE stands for our Large Language Model Orchestrated Workflow Engine, which puts the power of the Recursion OS into the hands of both internal and external scientists in a very natural way: there is no programming, just linguistic prompts. It can be used to do things like give me a list of targets for a certain indication, or to find a number of targets to drug a certain indication, or to order certain compounds, queue up those experiments on the operating system, collect the experimental data, and advance the program more fully. So let's talk about some of the near-term milestones we have coming. What can you watch for from Recursion?
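Conceptually, an engine like LOWE routes a linguistic prompt to a queue of workflow steps. The toy router below is a stand-in for that idea; the intents, step names, and regex matching are all invented (a real engine would use an LLM, not pattern matching).

```python
# Minimal illustrative sketch of prompt-to-workflow routing, in the spirit
# of an LLM-orchestrated workflow engine. Workflow names and steps are
# invented; this is not LOWE's actual behavior or API.
import re

WORKFLOWS = {
    "list_targets": ["query_biology_map", "rank_by_relevance", "return_list"],
    "order_compounds": ["design_compounds", "place_synthesis_order",
                        "queue_wet_lab_experiments"],
}

def plan(prompt: str) -> list:
    """Toy intent router: map a natural-language prompt to workflow steps."""
    if re.search(r"list .*targets", prompt, re.IGNORECASE):
        return WORKFLOWS["list_targets"]
    if re.search(r"order .*compounds", prompt, re.IGNORECASE):
        return WORKFLOWS["order_compounds"]
    return []

print(plan("Give me a list of targets for a certain indication"))
```

The value of the orchestration layer is that the scientist states the goal and the engine decides which modules to call and in what order.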
Well, expect a number of phase II readouts. We have, again, that first readout from CCM in Q3, next quarter. In Q4, we have a safety and preliminary efficacy readout for NF2. In the first half of next year, we have safety and preliminary efficacy readouts for FAP as well as AXIN1/APC. Also watch, later this year, for a phase II to initiate in our C. difficile infection program. Then watch for additional INDs, first for HR-proficient cancers with RBM39, and then for our program Target Epsilon, a novel target in fibrotic disease. Watch for all of these programs to continue to advance. On the partnership side, watch for potential options related to partnership programs, as well as potential options for some of the map-building initiatives with Roche Genentech.
There's also the potential for additional partnerships, perhaps in large and tractable areas of biology, as well as with other technology partners. Also watch for us to make some of our data and tools available to biopharma commercial customers. And lastly, watch for the Recursion Operating System to continue to evolve. We believe the technology we have built is not complete, and we want to continue to drive toward greater autonomous discovery, with more AI-based agents that can help carry out the selection of compounds, the selection of targets, and the selection of certain experiments to validate a given hypothesis. Finally, we operate from a strong financial position.
At the end of Q1, we had approximately $300 million in cash, and we certainly look forward to driving tremendous value with the potential milestones we have here, with more to come. With that, I'll take a few questions from the audience. Thank you.