Hi, good afternoon, everybody. Welcome to all of you here in London, in the beautiful venue of the Science Museum, and to everyone joining our second AI Day by webcast. We're really excited today to show you some interesting new things. Before that, the formalities. Here are our forward-looking statements, which you can see, and you can refer to those in the presentation if you'd like more details. We do not commit to updating these, and these are current as of today. Let's have a look at the agenda that you can see here. First of all, we will have some upfront sections from Ugur and from Karim, who will talk about how AI is fully integrated into development and our entire business model at BioNTech. We'll move on to some InstaDeep examples where you're going to see lab-validated results and their applications.
We're really excited, and that will show an evolution from what you saw last year. Now I'd like to introduce to the stage Ugur Sahin, our CEO of BioNTech, and he's going to present the opening presentation on advancing a disruptive tech bio company.
Yeah, thank you, Michael. Thanks, everyone. I would like to welcome everyone also on behalf of Karim. I would like to give you the scientific, biological background on why we need AI and what we are using AI for. First of all, it's a really great pleasure to be here in this place. Actually, this place here, the Science Museum, is the place where vials of the first COVID-19 vaccine are stored. On December 8th, Margaret Keenan was the first person on the planet who received an approved COVID-19 vaccine. That vial is right here, close to the lancet from Edward Jenner, who introduced the very first vaccine study worldwide. Today is about AI. This is just showing you that our AI approach is not limited to London. It's a global approach. We have sites where we do AI on multiple continents.
Actually, we have been doing AI since 2019, which was the first time that we met Karim and started to work with InstaDeep. We were doing AI before that, but we didn't call it AI. It was machine learning. What is really new with InstaDeep coming in: while we were developing our tools based on existing technologies, what we can say about InstaDeep is that it is really research on developing completely new technologies. Therefore, we are not only doing research and development in pharmaceuticals, but also research and development for our AI tools. A few words about BioNTech. We are a late-stage clinical company with multiple programs in oncology, which is our core focus. We also have a pipeline in infectious diseases, particularly for diseases with high medical need, for example, TB, malaria, HIV, and others. AI is, by now, really fully integrated into BioNTech.
There are only a few projects where we don't use AI approaches. Most importantly, we are continuing to improve our methods and technologies with AI. Let me give you a little bit of background so that you understand what we are doing and how AI is connected. Our core focus is oncology. Oncology has made a lot of progress in the last 20 years. One of the breakthroughs in oncology was immunotherapy, the use of the patient's immune system to fight cancer. There were a number of breakthroughs that resulted in improved survival of patients, but still, there's a huge medical need. More recently, we see new treatments based on ADCs and bispecific antibodies. Of course, we believe in the future of messenger RNA therapeutics and immunotherapies, and that could provide additional benefit for eliminating tumor cells. Let's start with immune modulators.
The classical immune modulators, the most widely used category, are anti-PD-1 antibodies. Examples are nivolumab and pembrolizumab, which have been used in hundreds of thousands of patients. In the last years, we have been developing bispecific antibodies because we were interested in increasing the fraction of patients who can respond. One of the molecules that came in from a partnership with a Chinese company, Biotheus, is BNT327, pumitamig. This is a highly interesting molecule. It combines two modes of action. It's anti-PD-1, which releases immune cells that are inhibited by the tumor cells and enables them to act and kill tumor cells. On the other side, it inhibits the generation of vasculature through a mechanism that blocks VEGF.
It has a number of additional magic tricks, which result in immune responses and objective responses in cancer patients across multiple cancer entities. This is a really exciting development for the whole field. We call this the bispecific anti-PD-1/VEGF class, which is expected not only to reach the tumor types that are currently addressed with anti-PD-1 treatments, on the left side, but also to go into cancer indications where anti-PD-1 treatments are not yet approved. We realized in the last 18 months that this is really a big opportunity, and it's too big to do alone. Therefore, we decided to go into a partnership and announced one a few months ago with Bristol Myers Squibb. It's a global partnership to develop this class of antibody, pumitamig, in multiple cancer indications.
In the meantime, we have data from more than 1,000 patients in more than 10 indications, giving us direction on which indications could benefit. These will not only be monotherapies, but also combination therapies. We believe that this type of treatment can help us control tumors and, in some patients, also provide lasting clinical benefit, ideally cures. Cancer is very complicated. Most of the patients who have initial control progress over time. The reason is depicted here in a simple cartoon. Cancer evolves from healthy cells by DNA mutations. These DNA mutations accumulate over 5 to 20 years. Because these are all random mutations, the tumor cells generate heterogeneity during this accumulation. Each individual cancer, for this reason, is really individual. No two patients share the same set of mutations. The even bigger problem is that we have intratumoral heterogeneity.
That means every tumor cell carries a different set of mutations, which means that the cancer can evolve over time. It is an evolution not only against the treatment, but also against the immune system. Knowing that, from the very beginning of the 1990s, when tumor immunology really became molecular, we were interested in cancer vaccines, because cancer vaccines come with the promise that we might be able to induce immune responses against multiple epitopes. If the tumor is polyclonal, the idea here is to induce a polyclonal T-cell response that goes in the different directions, so that we can combine multiple antigens. This polyspecific activity is expected to remove the last million tumor cells that remain after treatment, for example, with checkpoint blockade. We have developed and pioneered several approaches to that. Two of them are shown here.
One aims to completely individualize the treatment. It is based on identification of the mutations in individual cancer patients by sequencing. These mutations, because they are recognized by immune responses, are called neoantigens. We assemble a vaccine which is tailored to these mutations. The first description of this approach was in 2011, when we showed the preclinical approach. In the meantime, we have multiple clinical trials running in various indications, including pancreatic cancer. The second approach uses a complementary concept: certain tumors tend to share certain tumor antigens, and we have identified them. For example, in melanoma, we've identified four antigens which cover 100% of patients with melanoma. In lung cancer, we have identified eight antigens covering more than 90% of patients. It's a combination vaccine approach where the vaccine is off the shelf and then used directly for vaccination.
There is no need to generate it de novo. Here is the approach shown in more detail: we take the individual sample from the patient, map the mutations by comparing with normal tissue, do the neoantigen prediction, which is done computationally, then on-demand mRNA manufacturing and transport to the patient. Of course, this is all driven by data and algorithms. The manufacturing is just in time. We can deliver the vaccine in less than eight weeks. Our aim is to be able to deliver a vaccine in less than four weeks. This shows you an example of how this works. The sequencing of the tumor can yield up to thousands of mutations; in some tumors, only dozens. There are tumors with a high number of mutations, and there are tumors with a lower number.
We do the computational ranking of these mutations based on the idea that these epitopes can bind to the human leukocyte antigens, which present them. The way they bind is defined by binding patterns, and this can be computationally calculated. The ranking is based on this calculation and many other features, for example, how strongly the mRNA encoding these mutations is expressed in the tumor, and whether we expect heterogeneity of the gene, that is, the clonal frequency, the fraction of tumor cells in the tumor carrying the mutation. Because of the heterogeneity, it is quite possible that you target a mutation which is present in only 75% of tumor cells; you would then be selecting for the other 25% of tumor cells. This is our ranking algorithm, based on computational approaches. We used the classical approaches.
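To make the idea concrete, here is a minimal, hypothetical sketch of such multi-feature ranking in Python. The feature names, weights, and candidate data are all illustrative assumptions, not BioNTech's actual algorithm, which uses many more features and learned models.

```python
# Toy sketch of multi-feature neoantigen ranking (illustrative only: the
# features, weights, and candidates below are hypothetical).

def rank_neoantigens(candidates):
    """Score each candidate mutation and return them best-first.

    Each candidate combines:
      - hla_binding: predicted HLA binding strength (0..1, higher = stronger)
      - expression:  mRNA expression of the mutated gene (0..1, normalized)
      - clonal_fraction: fraction of tumor cells carrying the mutation (0..1)
    A subclonal mutation (low clonal_fraction) is penalized, since targeting
    it would select for the tumor cells that lack it.
    """
    def score(c):
        return (0.5 * c["hla_binding"]
                + 0.2 * c["expression"]
                + 0.3 * c["clonal_fraction"])
    return sorted(candidates, key=score, reverse=True)

candidates = [
    {"id": "mutA", "hla_binding": 0.90, "expression": 0.8, "clonal_fraction": 1.00},
    {"id": "mutB", "hla_binding": 0.95, "expression": 0.9, "clonal_fraction": 0.25},
    {"id": "mutC", "hla_binding": 0.60, "expression": 0.8, "clonal_fraction": 0.95},
]

ranked = rank_neoantigens(candidates)
print([c["id"] for c in ranked])
```

Note how mutB, despite the strongest predicted binding, ranks last here because it is present in only a quarter of tumor cells, exactly the selection problem described above.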
In the meantime, we worked with InstaDeep to develop deep learning approaches that include a number of additional aspects of the antigens, for example, in which cell compartment they are expressed and which type of molecular patterns they have. Multiple additional features come in. I would like to give you an example of the terminal biological mechanism that we need for this type of approach. It's about the killing of tumor cells by lymphocytes. In nature, this is among the most deeply quality-controlled biological events, because at the end of the day, it is about a cell in the body killing another cell. It needs to be authorized, and the authorization is done by a complex process. This complex process is also misused by cancer cells, which avoid killing by using mechanisms that circumvent it.
If you go deeper into that, one of the key aspects here is the recognition of the tumor cell by T cells. This happens through the T cell receptor. Every T cell has a different T cell receptor, so there is a complexity here. Our biology in humans allows that: we have around 500 million different T cell receptors in our body, a huge library of T cells that can recognize something. Then we have the HLA molecule on the tumor side, which presents an antigen from inside the cell. This mechanism actually evolved to recognize hidden viruses in our body cells. It's an immune attack against infections.
Our HLA diversity in humans has evolved so that viruses cannot circumvent the killing by just avoiding the patterns that are presented on individual HLAs. This makes it extremely difficult. Then you have the peptide, which is a fragment of 8 up to 15 or 16 amino acids, depending on the HLA class. We now have an extremely complex interaction with the T cell receptor, which actually has two chains; we have the HLA, and we have the peptide. Only if the combination of all three works do we get killing. The problem is even bigger because the T cells have to ensure that this interaction does not kill any other cells. That is also quality controlled. This is one of the biggest challenges in AI: to identify the T cell receptors that can recognize an MHC peptide sequence. This is part of our prediction algorithm.
We know the HLA of the patient. We know the mutated epitope. We would like to understand whether there is an immune response. Nature does this extremely well. The COVID-19 pandemic showed us that two different COVID-19-infected patients could develop the same T cell receptors when they have the same HLA. That means this discovery among hundreds of millions of T cell receptors works very well. We have two types of questions. Can we identify T cell receptors that recognize tumor antigens? The second question, of course, is, can we do it better than nature does? We will have, I think, two talks about these problems and how AI is used there. This is complex, and computation is important. The situation is even more complex in cancer. We have several levels of diversity and heterogeneity in cancer.
This is the cancer heterogeneity and the clonal evolution. On the other side, we have the immune system, and the immune system can also evolve. I sometimes compare this to a Go game: the immune system is playing against cancer. The situation is even more complex because other factors also play a role: HLA molecules, the microbiome of the patient, the environmental factors, and so on. The question is, if this is a game, are we going to understand the moves of the cancer cells by reading it? We believe in that. We believe that if we feed enough data into AI, and if we really bring in the biology, we can make the evolution of a cancer cell predictable. If it's predictable, then we can interfere with multi-specific approaches. We are translating that into a collaborative approach.
On the one side, we are generating data from our personalized vaccine studies, preclinical and clinical. On the other side, InstaDeep and colleagues are developing new tools like DeepChain. We are using these tools, of course, not only for optimization of cancer vaccines, but also for optimization of proteins, mRNA structures, and so on. Karim will talk about our in-house supercomputing clusters and how this contributes to obtaining better results. In summary, this slide shows our vision. I think this is more than our vision. It's really the view of a future fully integrated AI tech bio company, which combines a number of capabilities. Our vision for the future is that we can take clinical samples, do personalized omics, understand what is going to happen, and use our pipeline of molecules to come up with a combination treatment consisting of off-the-shelf drugs.
For example, pumitamig or our ADCs plus a personalized vaccine. In principle, we have shown that this is doable, but we need to do it at scale, and we need to do it in an affordable manner. I think this is a good introduction for Karim, who now will come and introduce the capabilities that InstaDeep has built.
Thank you so much, Ugur, for this exciting presentation. I'm Karim Beguir, and I'm the Co-Founder and CEO of InstaDeep, the AI unit of BioNTech Group. It's a very exciting time to be building in AI, and there is a lot to talk about. Very briefly, I will try to give you a sense of our approach, the opportunities that we are developing, and how we work collaboratively with our BioNTech colleagues to get things done, with results as well. I've been working in the field of AI for more than 10 years. If we were to summarize what's been happening since the beginning of the deep learning revolution in 2012, really, it is a triple exponential. You have data growing exponentially. Think, for example, about the cost of whole genome sequencing, which is now around $100, which is absolutely insane. Massive exponential growth of data.
It's also compute, and finally, model innovation. On the compute side of things, it's pretty incredible how predictable things are. This is, for me, one of the most impressive graphs in the history of computing and, more recently, machine learning. Look at it since the 1940s, when everything started with the first computers like Colossus and ENIAC. Despite technology changing so much, at the time it was vacuum tubes, now we're in chips and semiconductors, and in the future maybe quantum computing, everything is so linear in log space. Here, the y-axis increases by a factor of 10 at every gridline. You see how this exponential keeps going. This is also famous as Moore's law, which for computing in general says that for a given budget, you get twice as much compute roughly every two years. In AI, it's actually Moore's law on steroids.
The amount of compute which is deployed in ML workflows doubles every four or five months. This is actually pretty insane. It doesn't stop there. The third point is model innovation because, yes, we have more and more data. Yes, we have more and more compute, or at least more affordable compute. The third point, and perhaps the least understood, is the efficiency of the models themselves. In 2012, in the beginning of the deep learning revolution, you needed to label literally millions of data points to get an algorithm to learn. You don't need to do that anymore with self-supervised learning and recent progress. You can literally feed the entire internet or entire databases and get the system to learn. Incredibly, actually, the efficiency of those models improves every eight months by a factor of two. What do I mean by that?
If you want to get to a certain level of performance, every eight months you need just half the compute to get there. Not only do we have a crazy amount of compute coming in and becoming available, a crazy amount of data, but the efficiency of the models is absolutely incredible. It is a triple exponential. Yet, if you follow what's been going on in terms of progress and the like, it does feel that progress is almost vertical. Sometimes one wonders, is this hype? Is this a bubble? Or is it true that progress in AI is going to be incredible in coming years? I would tell you, like having been training neural nets for now more than 20 years, I think that this is real. We're going to see incredible progress in coming years. Actually, there's something qualitatively different that is happening that didn't happen before.
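The back-of-the-envelope arithmetic behind the "triple exponential" can be sketched as follows, using the doubling times quoted above (deployed AI compute roughly every five months, model efficiency roughly every eight months; the 24-month horizon is just an illustration):

```python
# Illustrative compounding of the doubling times quoted in the talk.

def growth_factor(months, doubling_months):
    """How much a quantity grows in `months` if it doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)

horizon = 24  # months
compute = growth_factor(horizon, 5)      # raw compute deployed in ML workflows
efficiency = growth_factor(horizon, 8)   # compute needed per result halves every 8 months
effective = compute * efficiency         # effective capability multiplier

print(f"compute x{compute:.0f}, efficiency x{efficiency:.0f}, effective x{effective:.0f}")
```

Under these assumptions, two years of progress multiplies raw compute by roughly 28x and model efficiency by 8x, for an effective gain of over 200x, which is why progress feels almost vertical.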
What is that? It is that AI is so competent that it is actually now accelerating itself. If you think of AI as a plane with three engines, and these engines, like we saw, are data, compute, and model innovation, AI is so competent now that it is at the point where it's going to accelerate every one of these drivers and hence push progress much faster and deeper. This new era that Richard Sutton, one of the godfathers of AI, named this year as the era of experience, is taking us to new heights and allowing progress in a way that was simply impossible before. Here, I want to show you a little bit like what happened since the beginning of the deep learning revolution. In the very early days here, you had progress which was coming from games.
You had, with technologies such as reinforcement learning, a score, and you had a system that would learn by maximizing this score. Then we had the big ChatGPT moment. This became the golden age of large language models around 2022. Again, we're coming back to a time now where systems based on trial and error are coming back to become very effective. The reason is AI is so competent now that what used to be possible only for games becomes possible for a larger class of problems. Systems now can actually improve data. They can create synthetic data for the problem at hand and, using reinforcement learning, improve their own answer to those questions. This is what has been driving, for example, progress in reasoning systems. You could see it also, for example, in a lab to optimize a certain protein sequence.
Lab results are understood by AI that's going to use those to further improve its answer. Effectively, we're not limited by data anymore. We still need a lot of data, but systems now can generate synthetic data and take advantage of it and progress. AI now is driving data, which is the first engine. The second engine, like we said, is compute. I don't know if you noticed, but NVIDIA keeps coming with these GPUs much faster than before, roughly now a new generation every year. This is because those hardware systems are actually co-designed with AI. It's not only very clever engineers. It's also AI systems using technologies such as reinforcement learning that are really accelerating progress in hardware. This is the same for Google with its TPU V7 chips and others that are co-designed by AI. You see how AI is boosting its own hardware.
Finally, model innovation itself. One of the most amazing things about this period is that AI actually reached the gold medal at the International Math Olympiad and at the International Olympiad in Informatics this summer. To give you an idea, only four years ago, in 2021, the experts in the field thought it would take another 20 years to get to these results. Yet we are there. Again, technologies such as reinforcement learning are key here because you have a score: you can actually evaluate a chain of reasoning or code. This means that now AI systems can conduct machine learning research, or more precisely, work hand in hand with experts to push progress forward. This is accelerating faster. Again, technologies like reinforcement learning are becoming more and more important.
At InstaDeep, we have anticipated those trends for a long time. I'm very proud to say that we've been active in reinforcement learning research in particular for many years. This work is coming to fruition. Today, I'm very happy to let you know that at the next NeurIPS conference, which will take place in December 2025, and NeurIPS is the largest and most influential machine learning and AI conference in the world, we actually have multiple RL papers accepted, including, for the first time in our history, an oral accept on top of a spotlight accept. Really, congratulations to the team for pushing the envelope in terms of algorithmic innovation in RL. It's really exciting. And it doesn't stop here. What's exciting also is that we've had the most productive 12 months in InstaDeep history.
If we look at where we were at the last AI Day compared to this day, we've had six publications in Nature journals on biology and AI. For me, this is a testament to the quality of the innovation that is taking place between BioNTech and InstaDeep. This is collaborative work between our different teams. Really super exciting, including, for the first time in our history, a cover of Nature Machine Intelligence in June. As you can see, we've been having fun in R&D and innovation. In reality, you need a lot more than that to win in AI today. Like Ugur mentioned, you need a fully integrated approach. We need to be competitive at every step in the process. What does it mean?
If you think about what I was saying earlier about the three engines of AI, the three engines of the AI plane, you kind of see them. It means being excellent in compute and model scaling. You need to train those models at incredibly large sizes. We're talking about hundreds of billions of parameters, up to trillion-parameter levels. You also need the AI innovation, the model innovation, which we discussed. You also need a great data strategy and an ability to use AI to accelerate your data acquisition. If you do these three things, you get to exciting applications. This is exactly what we're going to cover, in this order, starting with compute and model scaling with Alex, who's going to present our latest results.
Please, take it away. Hi everyone. It's a pleasure to be here. I'm Alex. I'm the Head of AI Research at InstaDeep. Echoing what Karim said, we had indeed a very interesting summer, a very exciting one. We've seen that AI has made the headlines with major accomplishments: achieving a gold medal at the International Math Olympiad, winning programming contests with a generally capable model, and even now stepping into the real world with advances in robotics and physical intelligence. In my view, these are not isolated milestones. These are predictable outcomes of the scaling laws. The scaling laws are empirical laws stating that the performance of a modern AI system is a predictable function of the resources spent to train such systems, be it data, time, compute, memory, and so on.
It's not only a single scaling law nowadays, actually. There are the pre-training scaling laws, as we know, but also post-training scaling laws, which really enable agents to interact with an environment, be it simulated or real, and to learn from these interactions to accomplish even greater tasks. We also have the test-time inference scaling laws, which state that you can spend more compute at inference to refine and polish the result the AI system produces. The question for us is, as a company, how do we position ourselves to perform in this new environment? The philosophy of InstaDeep has always been to build an integrated AI ecosystem, starting from the hardware and going up to the orchestration and software. It's our belief that it's only through tight hardware-software integration that we can gain the performance, cost efficiency, and control required to achieve our objective.
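To illustrate what "predictable function of the resources" means in practice, here is a minimal sketch of how a pre-training scaling law is fitted and used. The data and the exponent are synthetic and purely illustrative, not measurements from any real model:

```python
# Minimal sketch of using a scaling law: if loss follows a power law in
# training compute, L(C) = a * C**(-b), a straight-line fit in log-log space
# lets you predict performance before launching the big run.
# (Synthetic data; a=10 and b=0.05 are illustrative, not real measurements.)
import math

a_true, b_true = 10.0, 0.05
compute = [1e18, 1e19, 1e20, 1e21]                 # training FLOPs of small runs
loss = [a_true * c ** (-b_true) for c in compute]  # observed losses (synthetic)

# Least-squares line fit on (log C, log L): log L = log a - b * log C
xs = [math.log(c) for c in compute]
ys = [math.log(v) for v in loss]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b_fit = -slope
a_fit = math.exp(my + b_fit * mx)

# Extrapolate to a run 10x larger than anything measured
pred = a_fit * 1e22 ** (-b_fit)
print(f"fitted b = {b_fit:.3f}, predicted loss at 1e22 FLOPs = {pred:.3f}")
```

The same log-log fitting idea underlies real scaling-law studies, just with many more runs and more careful controls.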
How does it work in practice? Last year, for this reason, we had the pleasure to announce that we built the Kyber cluster, an AI supercomputer made of NVIDIA H100 GPUs. It contains 14 racks, which have been engineered in-house by our bare-metal team to optimize performance for our own AI workflows, for example, scaling large language model training or running simulations for RL training, like Karim mentioned. It brings our total compute capacity to 500 petaflops and is now the major source of compute power for the company. Now that we have this hardware, which is critical for our work, we need to make it very easily accessible to all engineers to empower their work. There as well, we built our own product, our own platform, which is called AIchor.
It's a full product, available to our customers, and it really enables us to run experiments on Kyber very seamlessly. With just a Git-based process, simple Git commits, we can run experiments very easily. That's why our engineers, around 200 to 300 of them, are actually submitting more than 15,000 experiments a month on average in 2025. We also keep our GPUs and hardware very busy, maintaining a very high GPU utilization of 75% on our cluster. The next step, building on top of that, is obviously the ML software stack, which we have to design to squeeze the most performance out of each hardware accelerator. That's why we've been building an entire ML ecosystem that is meant to be very efficient, scalable, and modular, such that we can answer the requirements of the research and development in which we operate.
Let me give you two examples of this in action. The first one is about scaling large language models. LLMs are part of our daily life, and sometimes we train them, quite often actually. Here, we took on the challenge of scaling our Nucleotide Transformer models, which we published in Nature Methods late last year, to a 100-billion-parameter model. The first challenge is that a 100-billion-parameter model does not fit on a single device, on a single GPU. The first thing we have to solve is how do we distribute it? How do we shard it across GPUs? Our answer here is to use fully sharded data parallelism within a single DGX, within eight GPUs, and then horizontally scale that across all the racks of Kyber using data parallelism. Here I just depict three racks, but we have more than that.
If we were to grow our model even more, we could use tensor parallelism, pipeline parallelism, or even sequence parallelism if we want to handle a very long context length. That's the first point. The second one: at the code level, we have to do a lot of optimization as well. We can use advanced CUDA kernels like FlashAttention, we can do mixed-precision quantization, or we can optimize the XLA compiler and use a better network configuration, and so on and so forth. The result is a staggering 66% model FLOP utilization (MFU). By definition, that means we keep our hardware busy at 66% of its theoretical limit. Just to give you a reference point, for the large public run of Llama 3.1, which contains around 400 billion parameters, the MFU was around 40%.
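As a rough sketch of how a number like MFU is computed: the throughput and hardware figures below are hypothetical, chosen only so the arithmetic lands near the quoted 66%; the ~6·N FLOPs-per-trained-token estimate for dense transformers is a common rule of thumb, not InstaDeep's exact accounting.

```python
# Sketch of a model FLOP utilization (MFU) calculation with made-up numbers.

def mfu(params, tokens_per_sec, num_gpus, peak_flops_per_gpu):
    """Achieved training FLOPs as a fraction of the hardware's theoretical peak."""
    achieved = 6 * params * tokens_per_sec  # ~6*N FLOPs per trained token (rule of thumb)
    peak = num_gpus * peak_flops_per_gpu
    return achieved / peak

# Hypothetical 100B-parameter run on 512 GPUs at ~1000 TFLOPs peak each
u = mfu(params=100e9, tokens_per_sec=560_000, num_gpus=512, peak_flops_per_gpu=1e15)
print(f"MFU = {u:.0%}")
```

The point of the metric is that it normalizes away cluster size: a 40% MFU on thousands of GPUs and a 66% MFU on a smaller cluster are directly comparable statements about software efficiency.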
Of course, that's a much larger network, run on thousands of GPUs, which gives you a sense of the meaning of that number and how high it is. We're actually going to talk a lot more about the foundation model we built in the next section, in a few minutes, about our AI innovation. The second example I want to give you is about scientific computing. Traditionally, when scientists are trying to discover a new molecule, a new drug, a chemical, or a material, they start with thousands of candidates. Realistically, only a few of them can be tested in the wet lab. Scientists face not only a discovery problem, but a smart selection problem. The problem is that if you choose the wrong candidates among the many potential ones, you waste time and resources. How can we do a smart selection here?
Luckily for us, most of these properties can actually be accurately estimated using quantum chemistry. The problem is that quantum chemistry is extremely slow. It's accurate, but slow. On the other end of the spectrum, you have classical force fields, which are extremely fast but really prone to errors. How do we handle this? Our objective has been to combine the best of both worlds: quantum-level accuracy, but orders of magnitude faster. Our answer is MLIPs, machine learning interatomic potentials. These are a class of machine learning models that are trained on quantum chemistry data, so very accurate data, but that run much faster. The result is indeed very impressive. In terms of accuracy, we see a near-perfect correlation between MLIP and the reference DFT energy calculations, whereas classical force fields are prone to error, as you can see.
So it's accurate. Second, it's also much, much faster and cheaper, actually up to 10,000x cheaper: for every dollar spent on MLIP, you would have to spend more than $10,000 on classical DFT calculations. It's a huge improvement. In addition, as opposed to classic quantum chemistry methods, which don't scale very well, MLIP does. You can run simulations on tens of thousands, even hundreds of thousands, of atoms very efficiently. We're very excited about this technology. It's early days, but we're excited about the potential because of its applications in so many different domains of interest for us. I invite you to keep an eye on that, and we have a booth downstairs about it.
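The scaling argument can be sketched numerically. The cost constants below are made up; only the shapes, roughly cubic scaling for conventional DFT versus roughly linear for an ML interatomic potential, reflect the point being made:

```python
# Illustrative cost comparison: cubic-scaling quantum chemistry (DFT) versus
# a linear-scaling ML interatomic potential. Constants are hypothetical.

def dft_cost(n_atoms, c=1e-4):
    return c * n_atoms ** 3  # DFT cost grows roughly cubically with system size

def mlip_cost(n_atoms, c=1e-2):
    return c * n_atoms       # MLIP evaluates local, per-atom contributions

for n in (100, 1_000, 10_000):
    speedup = dft_cost(n) / mlip_cost(n)
    print(f"{n:>6} atoms: MLIP ~{speedup:,.0f}x cheaper")
```

With these (invented) constants, the gap is about 10,000x around a thousand atoms and keeps widening quadratically, which is why MLIPs open up system sizes that are simply out of reach for direct quantum chemistry.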
I hope that gives you a sense of what we've been doing at InstaDeep in terms of developing our AI stack, going from the hardware level with Kyber, to the orchestration level with AIchor, our product there, and the machine learning software stack. Now I want to give some space to Bernardo, who is going to describe how we've been using this stack to develop the next generation of foundational genomics models. Bernardo.
Thank you, Alex. Hello everyone. It's a pleasure to be here. I'm Bernardo, a Senior Research Scientist at InstaDeep. It's my pleasure to present our work on AI applied to genomics. Genomics is the study of our genome, our genes, and how they play together in our cells. I want to show you how we are using AI to understand that. The first thing we did, last year, was to publish our foundation model for genomics called the Nucleotide Transformer. Since then, it has become one of the most popular genomics AI models in the field and is used in many papers and to develop many new models.
On our side, we have used Nucleotide Transformer, here on the left, to develop new iterations, fine-tuned versions for different applications, that were published over this year: we used Nucleotide Transformer to annotate the genome at single-nucleotide resolution with SegmentNT, a second model. We built Isoformer, which combines DNA, RNA, and proteins to perform different tasks. We even combined Nucleotide Transformer with a conversational agent, ChatNT, which made the cover of Nature Machine Intelligence. All this is published, but we are already working beyond it. If I put the current models in the field into perspective: nowadays we have models that learn from genomes. Nucleotide Transformer is an example, Evo as well. They are trained only on genomes and then fine-tuned on different downstream tasks.
On the other side, we have models that learn from functional data, like Borzoi or AlphaGenome from Google. Today we are very proud to announce the release of our new Nucleotide Transformer version, NT v3, where we unify both paradigms into a single model that learns from genomes, in this case more than 150,000 species' genomes, but at the same time is post-trained on thousands of functional datasets from many different experiments across different organisms. What is NT v3? NT v3 combines a full set of capabilities: it is multi-species, but also multimodal, going from genomes to functional tracks and genome annotation, all at once. It goes from human genomics to plants and metagenomics. It's now capable of processing sequences of 1 million nucleotides, the longest that exists nowadays. It's also generative.
It can design DNA sequences with de novo properties, and I will show you some validations in the lab as well. We built a suite of models, from a small, very affordable 10-million-parameter model up to a 4-billion-parameter model, and it's also designed for efficiency, even with this long context and model size. Now we'll dive into the details of NT v3, starting with the main pre-training phase. We take NT v3 and pre-train it on more than 150,000 species' genomes. That's about 8 trillion nucleotides. We do it in different phases, from shorter to longer sequence lengths, to cover the whole tree of life, from very small virus and plasmid sequences to the human genome, with sequences of almost a million nucleotides. All this runs through 15 trillion tokens, the longest pre-training existing in genomics. We do this using a masked language modeling objective, where you perturb the sequence.
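As a toy illustration for readers, the masked-language-modeling setup on DNA looks roughly like this. This is not the real training code; the trivial stand-in "model" below just guesses the majority nucleotide:

```python
import random
from collections import Counter

random.seed(0)

# A toy DNA sequence
seq = [random.choice("ACGT") for _ in range(1000)]

# Mask a fraction of the nucleotides
mask_rate = 0.15
positions = [i for i in range(len(seq)) if random.random() < mask_rate]
masked = list(seq)
for i in positions:
    masked[i] = "[MASK]"

# The training objective: reconstruct seq[i] at every masked position.
# A trivial stand-in "model" that always guesses the majority nucleotide:
majority = Counter(seq).most_common(1)[0][0]
accuracy = sum(seq[i] == majority for i in positions) / len(positions)
```

A real transformer is trained so that its predictions at the masked positions become far better than this baseline, which is what drives the loss curves described next.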
For example, you mask 15% of the nucleotides and ask the model to reconstruct them. If you do this over and over again, the training loss, the error on this objective, starts going down. The model starts learning the objective. By doing this at different model scales, we can really see the scaling laws of AI in action. Our smallest model gets this performance, and the bigger the model, up to 4 billion parameters, the better it gets. We now have these models of different sizes and efficiencies that you can use for various applications. Starting with some inference times, just to show you how efficient our model is: this is the current set of models in the field.
When you compare efficiency in terms of inference time across different sequence lengths, up to 1 million nucleotides, you can see that they all suffer, and it's very hard to scale these models to long sequences. That's a common issue in the field. NT v3 was designed for efficiency on this problem. With our three NT v3 models, you can go up to 1-million-nucleotide sequences with a minimal loss in inference time. It's very, very affordable and possible to use for downstream tasks at this scale. We tested NT v3 on a first set of around 44 long-range tasks that go from gene expression to chromatin accessibility to genome annotation across various human tissues.
This is just to show you a rather busy plot of all the tasks on which we compared NT v3 against other competitive models. If I summarize all this information and group it by quantitative and classification tasks, we can observe that our models are better than the competitive models, particularly our small model of just 10 million parameters, which is already very efficient. That's the first main message: a very good, small foundation model that is very easy to use. If you scale the model size, you can see that you get better performance on both types of tasks. Larger models, better performance. That's on a set of downstream tasks that are already useful for people.
We wanted to take a step further and bring all this functional data, genomic tracks and genome annotation, into an additional post-training phase. We take our NT v3 model and post-train it on genome annotation and experimental genomic-track data. This means that, for a set of species, we take all the introns, exons, splice sites, all these elements that matter in the genome, and we use NT v3 to predict them from the sequence. At the same time, NT v3 needs to predict all this experimental data, around 17,000 experiments from 16 animal and six plant species, at single-nucleotide resolution. These are example profiles that NT v3 needs to predict, all with sequences up to 1 million nucleotides long.
We do the post-training on all this data, and then we can show you how NT v3 performs on genome annotation and experimental genomic tracks. We start with genomic tracks. Just to give you an idea of what the actual predictions look like: this is a piece of our genome with two different genes, a 1-million-nucleotide window. Here I'm showing experimental data from K562 leukemia cells. At the top, you have the experimental data, for example from RNA-seq, DNA-seq, and other assays. At the bottom, you see the NT v3 predictions. In one go, NT v3 can predict, for a 1-million-nucleotide sequence, single-nucleotide profiles that match the experimental data very well. You see that with NT v3, you can predict and recapitulate these assays. This is an example for two genes.
If we now look across the genome and compare with the state-of-the-art model, the Borzoi model, across these different experimental readouts in human and mouse, we are showing an improvement over the state of the art across all of them. We are outperforming the current models on this single-nucleotide prediction of experimental data from human and mouse cells. That's on genomic tracks. We can also evaluate our model on genome annotation. Again, this is a busy plot, but that's what our genome looks like. We have a 1-million-nucleotide window with many genes at the top, and our model has to predict all these different elements: where the gene is, the introns, the exons, the splice sites, and all these elements have different resolutions. I'm again showing you the actual annotation alongside the predictions of NT v3.
If we zoom in on a gene, to make it easier, we can now see the pattern of the gene more clearly, with all the exons and introns in these lines. We can see that NT v3 predicts that it is indeed a gene, the locations of all the introns and all the exons, and even the splice sites, which are just one nucleotide out of this 1-million context window. The same for the UTR regions. Very rich predictions from NT v3; they look like the actual annotation. We can again summarize the performance and compare with the state-of-the-art model, SegmentNT: the percentage improvement across all these different elements, the 14 elements on which we trained NT v3, again showing that we outperform the current state of the art on gene finding, on regulatory elements like promoters and enhancers, and also on splice sites.
We take the pre-training, learning from genomes; we take the post-training, learning from functional data; and we outperform the state-of-the-art models on both. We can now take the model even further. Instead of just being predictive, like previous models such as the previous version of NT and Isoformer, we want to bring this model into the generative space as well. Nowadays we have models like Evo that are generative, but we don't have models that do both. NT v3 is the first model that can do both in one go. NT v3 learns these predictive tasks and functional data, but can also do de novo and conditional sequence generation. That's thanks to the masked discrete diffusion framework that we implemented in NT v3, where you can guide NT v3 to generate sequences with a given property.
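For readers, the guided masked-diffusion loop can be caricatured in a few lines. This is purely illustrative: a made-up GC-content "property" stands in for the real learned conditionals of the model.

```python
import random

random.seed(1)
L = 60  # sequence length

def generate(gc_bias, steps=8):
    """Start fully masked; unmask a few positions per step.

    gc_bias plays the role of property guidance (hypothetical): the
    probability of sampling G/C at each position.
    """
    seq = [None] * L          # None == still masked
    order = list(range(L))
    random.shuffle(order)
    per_step = max(1, L // steps)
    while order:
        for i in order[:per_step]:
            seq[i] = random.choice("GC" if random.random() < gc_bias else "AT")
        order = order[per_step:]
    return "".join(seq)

# "Prompting" for low vs. high GC content
low = generate(gc_bias=0.2)
high = generate(gc_bias=0.8)
```

The real model samples each unmasked position from learned, context-dependent distributions; only the start-masked, fill-in-gradually, steer-toward-a-property shape of the procedure is shown here.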
We are unifying representation capabilities and these fine-tuning approaches with generative capabilities. I want to demonstrate this using an example that we actually took into the lab and validated. We collaborated with researchers from the IMP institute in Vienna to design enhancers that are promoter-specific: they activate specific genes. Enhancers are sequence elements that modulate the expression of genes. They can be very useful for gene therapy, to activate genes in different cell types. We wanted to design enhancers that are specific for promoters, but that are also active at different levels. We took NT v3, made it generative with this masked diffusion approach, generated enhancers for the different tasks I will show you next, and validated them in the lab through reporter assays. The first experiment was to prompt NT v3 to design enhancers with different activity.
You take a gene of interest and you want to design an enhancer that activates the gene at low, medium, or high levels. We trained NT v3 to do that, generated a few sequences in the computer, and sent them to the lab, where they synthesized the sequences and introduced them into cells in a reporter assay. In this plot, I'm showing the experimental results. In gray are the native enhancers from the cells. You see that you have enhancers that activate the gene at low, medium, and high levels. I'm very happy to say that when we tested the generated NT v3 enhancers, we observed the same kind of phenomenon. Our enhancers prompted for low activity were indeed lowly active; they activated the gene less. We could also design enhancers that activated the gene even more strongly than the native enhancers.
This was a success in terms of generating enhancers that activate genes at different levels, again validated in the lab. That was the first experiment. The second one was to design enhancers that activate specific genes, specific promoters. You prompt NT v3, for example, with high activity on one promoter and low activity on the other. We tested in the lab the activity of the two promoters with the same enhancer. Here I will show you the fold change between the prompted high-activity promoter and the low-activity promoter. You want high fold changes, so high specificity. We tested two different promoters. The first is the DSCP gene. Compared with the state-of-the-art generative approach using DeepSTARR, we observe stronger specificity for the DSCP gene, and an even stronger difference for the RPS12 gene.
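For readers, the specificity metric here is just a ratio of the two reporter readouts. A minimal sketch, with hypothetical assay numbers:

```python
# Hypothetical reporter-assay activities for one designed enhancer,
# measured against the two promoters (numbers are made up).
activity_prompted_high = 12.0   # promoter the enhancer was prompted to activate
activity_prompted_low = 1.5     # off-target promoter

# Specificity = fold change between the two readouts; higher is better
fold_change = activity_prompted_high / activity_prompted_low
```

A fold change near 1 would mean the enhancer activates both promoters equally, i.e., no specificity.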
This shows that our models can design enhancers that are highly specific to particular genes. In gene therapy, for example, this can be very promising. These were two experiments validated in the lab. I just want to come back to the whole presentation and the different key points that I mentioned today. NT v3 can be used to predict experimental data, what we call genomic tracks, from different cells; think gene expression, chromatin accessibility, et cetera. It can be used to predict the annotation of genomes, for example genes and splice sites, and can be applied across different species. By predicting all these properties, we can now interrogate NT v3 to predict the impact of variants on all these different properties. You can even take this further and generate sequences with specific properties, like the enhancers in this case.
I am very, very happy to present this milestone, NT v3, here today. With this, thanks a lot. Thank you so much, Karim.
Thank you so much, Bernardo. I really want to congratulate you, Thomas, and the entire NT team. NT v3 is a breakthrough. It's so extraordinary to see the team build the largest context window in genomics today, state-of-the-art performance, and an order of magnitude faster inference than anybody else in the field, all this at very reasonable budgets if we compare to frontier labs. It's really a testimony to the incredible innovation happening at InstaDeep and BioNTech. It doesn't stop here. We've just shown you state-of-the-art, lab-validated results in genomics. We are also very active in the protein space, and Bora will now introduce our latest cutting-edge results in protein design. Bora.
Perfect. Thank you very much, Karim. Hi, everyone. My name is Bora. I'm a Research Scientist here at InstaDeep. Today, I'd like to take a little bit of time to talk to you about our use of GenAI for protein and specifically antibody engineering. I'd like to start by taking a few seconds just to set the scene. When we are normally designing a protein, we're not just designing for one property. We're actually optimizing multiple properties all at once. The solution essentially needs to satisfy multiple constraints. The traditional way to approach this would be to develop n models for n different tasks and then apply one after the other. The problem with this is that it's very, very inflexible.
If the task at some point should change, maybe your internal pipeline, the actual experimental pipeline changes, then you need to go all the way back to scratch, develop new models, curate new data, and so on and so forth. We want to flip this on its head a little bit. What we envision is essentially just one big model that has been trained with as much of the data of interest as possible and is aware of all of these things. Essentially, it's learned a very rich joint distribution over all of the different attributes that we care about. That means that at inference time, the scientist using this model and interacting with it can prompt the model specifically with only the things that they care about. One model essentially becomes all of these previously mentioned models.
Another advantage here is that because you're training the model with lots of data, the model can also learn correlations that were previously invisible. That drives up performance. We spent a lot of time thinking about what sort of model, architecture, and ML paradigm is the right thing to go with here. We ended up using Bayesian flow networks. These are very well suited to the different types of data we encounter in scientific settings. We first started by publishing a proof-of-concept paper where we introduced our models ProtBFN and AbBFN. These are sequence-only models. We showed that, compared to leading autoregressive models, BERT-style transformers, and diffusion models, they outperform them in terms of sequence naturalness, diversity, and the other things that we care about. Today, I'd like to take this a little step further and introduce AbBFN2.
AbBFN2 is our first truly multimodal antibody design model. It allows a scientist essentially to flexibly interact with the model, design antibodies for any task that they're interested in, and optimize them on multiple fronts. When I say antibodies, what I'm really referring to in this case is the FV region. That is made up of these two chains, the heavy chain and the light chain. It's actually part of the larger molecule. The reason why we focus on these FV regions is because essentially in the past years, we've seen a massive, massive expansion in the different formats of antibody-based therapeutics. You've got your kind of standard IgG molecules, but also antibody drug conjugates, bispecifics, slightly more esoteric, novel versions of bispecifics or multi-specifics, and so on and so forth.
The one thing that's common to all of these formats is that the key recognition of the antigen happens via an FV. That's why we need to model this, and model it very, very accurately. The problem is made even more complex because FVs are highly diverse. A very conservative estimate is that there are more than 10^16 possible naive antibody sequences, as we call them, which makes this a massive needle-in-a-haystack problem. The issue is also that antibodies are weird molecules. Normally, a protein is expressed from one single gene, whereas for an antibody, five different random genes are essentially spliced together to produce the molecule. The biophysics of the molecules are also very interesting. That means that your haystack is now huge, but it's also multidimensional.
You really need fine-grained control over the generative process to actually pick something out from here that works for your purposes. That's what AbBFN2 does. I'm not going to bore you with the details of these things, but this is essentially 45+ different modalities or attributes of an antibody that the model includes explicitly. Any design task that we can express in terms of these modalities, the model can tackle. If we don't care about one property at one time, it doesn't matter. We just ignore it and focus on the other ones. This includes stuff like the genetics of the antibody, the biophysics of the antibody, but also the sequence. We're constantly developing new capabilities. We now can do per-residue energetics to stabilize an antibody. We also look at things like germline families and also genetic information at the residue level.
We're also working on including structure of both the antibody and the antigen, as well as quantum accuracy energetics. A couple of results here. The first thing that we do is essentially use the model to label known sequences. That is, I have an antibody sequence. I want you to tell me everything there is about the sequence. I want you to label it very accurately. Here we've tried 23 different tasks, and we find that AbBFN2 outperforms every other baseline that we've tested on all of these tasks, sometimes by a very large margin. This is very nice because it essentially means that the model has really learned the relationship between sequence and metadata or attributes. It also means that practically the model is essentially a one-stop labeling tool.
Rather than using five or ten different tools, all of which have different software requirements, you can just put your sequence through AbBFN2 and get all of the information about it that you care about. It also means that we can tackle the inverse problem: I have a specific requirement, and I want to design an antibody that satisfies it. As an example, I've chosen to show you some stabilization results. Stabilization of an antibody here refers to the interface of the heavy and the light chain, where they bind to each other. This is really important, both in the clinic and in nature. If an antibody is very stably bound to its paired chain, it's more stable overall, which means it's easier to express in large quantities, which brings down costs.
It also means that it's easier to store, and it's just generally something that we're interested in. This is also specifically very important in the case of bispecific antibodies, because there you really need fine-grained control over which chains will pair up with each other. Last year, we were able to recapitulate natural interface stability: the interface stabilities that you would expect to see in natural immune repertoires, sequences that come from actual human immune systems. This year, we've pushed this even further, and we can now set the target energy arbitrarily. We can tune the stability of a given heavy-light chain pair. Another thing that we're interested in is multi-parameter optimization. This is where you have five or ten different properties that you all want optimized at once.
As I said, traditionally, you would use five or ten different tools, one after the other. The problem is that these tools are unaware of each other. They might undo each other's effects, so to speak, and also, they will introduce more mutations than are strictly necessary. In our case, we make use of AbBFN2's capability to understand all of the attributes all at once. We also make use of inference time compute scaling. We tell the model, here is the starting sequence, here are the five things that I want you to optimize, so bring into those blue regions. Essentially, we allow the model to think about its response, edit it here and there, and make changes progressively. We see really, really nice results with this. When we look at all of the antibodies that we've tested here, we have an 80% success rate.
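The think-edit-check loop described here can be sketched as a toy hill climb. The two "properties" below are hypothetical stand-ins, not AbBFN2's real attributes, and the greedy loop is only a caricature of inference-time refinement:

```python
import random

random.seed(0)
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 amino acids

def properties(seq):
    # Hypothetical stand-ins for the model's predicted attributes
    return {
        "charge": sum(c in "KR" for c in seq) - sum(c in "DE" for c in seq),
        "hydrophobic": sum(c in "AILMFVW" for c in seq),
    }

def distance(seq, targets):
    # How far each property is from its target range, summed
    props = properties(seq)
    return sum(
        max(lo - props[k], 0) + max(props[k] - hi, 0)
        for k, (lo, hi) in targets.items()
    )

def optimize(seq, targets, steps=2000):
    """Greedy single-mutation editing: keep a mutation only if it moves
    the combined properties closer to all target ranges at once."""
    seq = list(seq)
    best = distance("".join(seq), targets)
    n_mut = 0
    for _ in range(steps):
        if best == 0:
            break
        trial = seq[:]
        trial[random.randrange(len(seq))] = random.choice(ALPHABET)
        d = distance("".join(trial), targets)
        if d < best:
            seq, best, n_mut = trial, d, n_mut + 1
    return "".join(seq), n_mut

start = "A" * 20
targets = {"charge": (2, 5), "hydrophobic": (3, 8)}  # hypothetical ranges
optimized, n_mut = optimize(start, targets)
```

Because every accepted mutation must improve the combined objective, mutations tend to be "shared" across objectives, which mirrors the point made above about needing few extra mutations per added objective.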
If we actually look only at the antibodies that you would take further during preclinical development in the first place, the tractable ones, the success rate shoots up to more than 90%. The very interesting result in this case is that the number of mutations for one objective, for instance, is 46.6. This is roughly in line with experimental approaches. When we add four extra objectives to optimize for, we actually only need ten more mutations. The model is really aware that if it makes this change, it satisfies multiple things at once, so it's the best one to choose. Part of this is also sequence humanization, essentially reducing the risk of an adverse immune response to a sequence.
Traditionally, again, with a purely experimental approach, this is often done in a kind of trial and error way. You take your starting sequence, you introduce a few mutations, you check that everything still works, you do that again. If something breaks, you revert back to a previous state, do that again over and over until you essentially find your idealized candidate. This can take a very long time, but you might also at some point during this process realize, oh, this antibody was never going to be optimized. What we want to do is essentially integrate models like AbBFN2 into the experimental workflow. Rather than having this iterative approach, we essentially use AbBFN2 to optimize the immunogenicity risk. This takes 20 minutes. Afterwards, you can still do all of the things that you were going to do, including affinity optimization.
This really is as easy as I make it sound, because we've also ensured that the model is usable. We've packaged the model. It's now available on DeepChain, and we've made sure that certain workflows that people might be interested in are easily accessible.
In this case, for instance, we can do conditional generation, where I have certain attributes that I want in an antibody. I could say, for instance: I have this specific CDRH3 loop length in mind, I have the light chain sequence already, and I have most of the heavy chain. I want you to just generate the rest of the heavy chain for me. Generate me a library that I can then take forward. Alternatively, for the humanization workflow, we've packaged this as well. In this case, all we need to do is enter the sequences that we're interested in, set how many times we want the model to iterate on these sequences, and then press go.
To save us the time here, I've actually pre-run one of these humanization experiments, and you can see here that the input sequence is given, and you can see as the model works its magic, changes are made progressively, and over time, the humanness increases. We can also then, for instance, scroll down and check that the sequence still folds up in the same way, so nothing has been disrupted. This is really just to make life easier for the bench scientist using the model. With that, I'll take you back to the slides because we've actually tested these things in the lab. In silico results are well and good, but you always need to demonstrate that these things work. In our case, we've taken four antibodies. These are clinical stage antibodies against four diverse targets, and these are antibodies that have actually undergone a humanization procedure experimentally.
We've also done this with AbBFN2 and tested that they still bind. In all of these cases, the antibodies still bind with good affinity, but what's really remarkable is that in most of them, we actually need far fewer mutations, which leaves you much more space to do further optimization according to your needs, whatever those may be. This is really, really exciting. We've done the work on a computer, essentially, and we can show that it works in the lab. With that, I just want to pull back and say that the aim of the model is really to integrate into pre-existing workflows. No one should have to change their experimental workflows to fit the way a model works; rather, the model should be able to fit your needs.
This is really possible with AbBFN2's, as we like to call it, condition-anywhere, generate-anywhere paradigm. With that, I'd like to thank you all for listening and hand back to Karim.
Thanks, Bora. It's really exciting to see the progress on our Bayesian flow network models. As you can see, one of the differences from last year is that this time we have lab-validated results. You saw that for our Nucleotide Transformer, and you're now seeing it for our generative protein models. We are really focused on having an impact. Where are we now in this presentation? We've passed the halfway point. As you have seen, we've been looking at compute, with our Kyber cluster results on scientific computing, and we've looked at algorithmic innovation. Now we're going to get closer and closer to applications. A specific point, which is extremely important, is working hand in hand with our BioNTech colleagues on the data front, making sure we can extract as much insight as possible from the data.
In this context, I'm very happy to introduce Nicolás, the Head of our BioNTech AI Team, as well as Youssef, to tell us more about the work we're doing in data.
Hello. Pleasure to be here. My name is Nicolás. I am the Head of the BioNTech AI Team at InstaDeep. Hi, Youssef.
Hi, I'm Youssef, and I'm a Machine Learning Engineer at InstaDeep.
Basically, the BioNTech AI strategy is quite simple. As Ugur mentioned, it's driven by data. There is always potential for continuous improvement of our algorithms as we generate more and more data; we will show that in the context of the personalized vaccines. This applies all across the company. We are also aiming to learn as much as possible from the tumor. This is where the information is, and this is where we need to develop algorithms to leverage this information as much as possible for the design of effective vaccines. We would like to walk you through two examples of how we are designing AI algorithms and tools to actually learn from the tumor and learn from the data itself, one in the sequence space and one in the image space. First, let's talk about the sequence space.
For that, I want to introduce you to the concept of the dark proteome. The dark proteome encompasses uncharacterized proteins from hidden translation products, beyond the canonical, known proteins that come from traditional protein-coding genes. There is a whole new world of proteins or peptides that are not born the same way. They come from aberrant splicing events, gene fusions, long non-coding RNA sequences, or non-canonical open reading frames. How can we look at this? We wish we had a sort of lantern to illuminate the dark proteome. For this, we developed InstaNovo, a tool that does de novo, library-free peptide sequencing. I will tell you why this is very important. Sequencing peptides is very complex; it's not as simple as sequencing DNA.
You need to chop your peptide into pieces, into fragments, and then accelerate those fragments in a magnetic field. These fragments have a mass and a charge, so they follow a trajectory. We end up with a spectrum like the one you see here, the MS2 spectrum. In traditional mass spectrometry, you have a reference library, where you really know what you are looking for. For canonical human proteins, that is easy. For de novo, non-canonical peptides from the dark proteome, that's a bit more complex. Once you have the library, you do a database search: you try to match this spectrum against your library to finally get, in this case, your known dark tumor antigens. What InstaNovo solves is the problem of needing this library, which we don't really have in the context of these non-canonical peptides.
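The database-search step being contrasted here can be sketched as follows. This is a toy with simplified b-ion fragment masses and a hypothetical three-peptide "library"; InstaNovo itself instead predicts the sequence directly from the spectrum, with no library at all:

```python
# Simplified monoisotopic residue masses (a tiny subset, for illustration)
RESIDUE_MASS = {"G": 57.021, "A": 71.037, "S": 87.032, "P": 97.053, "V": 99.068}
PROTON = 1.007

def b_ions(peptide):
    """Theoretical b-ion fragment masses: cumulative prefix mass + proton."""
    masses, total = [], 0.0
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        masses.append(total + PROTON)
    return masses

def score(observed_peaks, peptide, tol=0.02):
    """Number of theoretical fragments explained by an observed peak."""
    return sum(
        any(abs(peak - m) <= tol for peak in observed_peaks)
        for m in b_ions(peptide)
    )

library = ["GASP", "VAGS", "PSAV"]        # the reference "library"
observed = [58.028, 129.065, 216.097]     # hypothetical MS2 peaks
best_match = max(library, key=lambda pep: score(observed, pep))
```

If the true peptide is not in the library, no candidate scores well, which is exactly the limitation for dark-proteome peptides described above.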
Another interesting thing is that these peptides can be very cancer-specific, which makes them great for target discovery, new targets, or biomarkers for cancer. Ugur said that, in the end, cancer fighting is cells: your immune system fighting your cancer cells. You want to kill the ones that are cancerous, so your target needs to be cancer-specific. Just to give you an idea of how we are using InstaNovo: here we see a table with tumor and normal identifications. We find a few peptides where the number of tumor identifications is much larger. These peptides come from InstaNovo's output. It has already shown this potential in detecting tumor-specific epitopes from these undocumented open reading frames.
InstaNovo has been published in Nature Machine Intelligence, and we made it available for the whole community to use and try. It has also been covered by Science magazine in an article on next-generation de novo peptide sequencing. This is work that has been done in collaboration with Professor Tim Jenkins at DTU. We are extending this collaboration to introduce InstaNovo V2, an even larger model trained on 63 million labeled spectra, where you see the increase in peptide-spectrum matches. It has higher accuracy, a 10%-15% increase on the data sets we have been testing it on. We are very excited to apply it in BioNTech for the discovery of new cancer-specific targets and biomarkers.
With this, I would like to leave the place to Youssef to show us a bit of how we are trying to improve our digital pathology algorithms.
Thanks. Thank you, Nicolás. Hi, everyone. Last year, we showed our AI-assisted annotation tool and how we increased the efficiency of pathologists fivefold. However, five-times-faster pathologists are still not enough, because we have thousands of whole slide images to annotate. The question we had to answer is: how can we reduce the pathologists' annotation effort while ensuring the best model performance? The answer is data. In computer vision, when your data is unlabeled, you usually see it as different points, like you see here. What we usually do is take random points from the data to use for model training. This works when you have a lot of data, thousands or millions of data points you can label.
When you have few data points and actually want to reduce the pathologists' effort, and you plot your data, for example in a t-SNE graph like this one, a real t-SNE graph of a data set, you will see that your data points do not cover all of the patterns. Here, each cluster is actually a different pattern in your data set. You would be missing the highlighted patterns here, for example. When you test your model afterwards, you cannot be sure you will get good results on these patterns, because the model didn't see them. What you actually want is to cover all of your patterns without having to label a lot of data. For that, we took the leading open-source software for data curation and histopathology visualization.
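As an aside for readers, the coverage idea here, picking one representative per cluster rather than random points, can be sketched like this. Illustrative only, using plain k-means on synthetic embeddings rather than real foundation-model features:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "embeddings" with three well-separated patterns
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

def kmeans(X, k, iters=20):
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return labels, centroids

labels, centroids = kmeans(X, k=3)

# Pick the point closest to each centroid as the sample to annotate,
# so every pattern is covered (instead of labeling random points)
picks = [int(np.argmin(np.linalg.norm(X - c, axis=1))) for c in centroids]
```

With random sampling, a small annotation budget can miss whole clusters; selecting per cluster guarantees each pattern contributes at least one labeled example.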
We built our own internal product on top of that, which helped us to explore, understand, and work with our histopathology data. Here, I will show a demo of that. What you are seeing here are the real clusters of data; this is the CRC-100K data set, for example. When you look at one of the clusters, here, for example, I guess it's the tumor, you will see the same pattern there. When you go to the other side, this one, I think it's the adipose, or fat cells. You see a totally different pattern. For this data set, we have the ground-truth labels. If we visualize the labels here, you will see that the foundation model is actually doing really well in clustering the data set. You can see that each different cluster has a specific color.
For example, the yellow one is debris and the green one is the tumor. It's doing even better than that, because for the tumor, for example, you see that you have a lot of different clusters. If you take this part of the tumor here, a specific pattern, and you take another part here, it will give you a totally different pattern within the tumor. We have even subclasses for each class. It doesn't only work on these patches; we also made it work on whole-slide images. You can take a cohort, for example, for the MSI/MSS task. You can see all your whole-slide images. You can also see their embeddings and their t-SNE graph. Here, we fine-tuned the model a little on the task itself.
When we visualize the labels, you can see that most of the MSI-high slides are grouped together, with the MSI-low and the MSS as the other groups. You can also inspect your data to find the outliers and the most unique examples. We can visualize the uniqueness here: the brighter the point, the more unique it is. Here, for example, if we take this point and investigate it, let me open this one. We can also investigate the whole-slide images inside the app. When we zoom in, we find it is the most unique because it is out of focus: the focus is actually on the marker made by the pathologist, not on the cells themselves. That's how we can find the outliers or the wrong data.
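A uniqueness score like the one shown can be approximated as distance to nearest neighbors in embedding space. A minimal sketch with synthetic embeddings (not the production scoring; the out-of-focus slide simply ends up far from everything else):

```python
import numpy as np

def uniqueness_scores(embeddings, k=5):
    """Score each slide by mean distance to its k nearest neighbors in
    embedding space; out-of-focus or otherwise odd slides score highest."""
    d = np.linalg.norm(embeddings[:, None, :] - embeddings[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # ignore self-distance
    return np.sort(d, axis=1)[:, :k].mean(axis=1)

rng = np.random.default_rng(0)
X = rng.normal(0.0, 0.2, size=(100, 8))         # 100 pretend slide embeddings
X[0] += 5.0                                     # one artificial outlier
scores = uniqueness_scores(X)
print(int(np.argmax(scores)))                   # → 0: the outlier stands out
```

Brightness in the viewer would then just be this score normalized to [0, 1].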
Going back to the presentation, another feature: you can view your whole-slide images and zoom down to the cellular level to investigate them. We also built a nice module where you can test different AI agents from different providers. For example, here we are testing MedGemma, developed by Google DeepMind. We want to see its answer to the question. You can select a region and then ask the agent. For example, here we are asking whether it can confirm the presence of invasive colorectal cancer in the image. Here, you get the response: yes, it confirms that. You can also give it a try at the booth after the presentation.
Thank you very much. You see how we are empowering digital pathologists at BioNTech with these tools. Yes, you are more than welcome to give it a try downstairs soon. Now we hand over to Karim for more applications of AI at BioNTech.
Thank you, guys. Thanks, Nicolás, and thanks, Youssef. It was really exciting to see the progress we're making in improving both the quality and the quantity of the data we have. To summarize: if you remember, at the beginning we said we have three engines powering the AI plane. The first is compute, which we saw with Alex. We looked at AI innovation with Bernardo and Bora. Finally, we covered the data front with Nicolás, Youssef, and the BioNTech AI team. This is all very nice, you could tell. What can we do with all this? What is really exciting about having all those capabilities under the same roof at InstaDeep and BioNTech is that we can now start to tackle truly hard biotech problems.
Today, we're going to show you our first results in terms of applications, starting with nanoparticle design with Lexi and Cheng.
Hi, everyone. My name is Lexi. I am a Scientist at BioNTech, and I've been working together with Cheng for the last year. One thing that we're really interested in is how to develop the best vaccine. To do this, we first look at what our immune system is trained to respond to. Oftentimes, that is viruses and bacteria. These viruses are large, and they have a highly repetitive surface. Sometimes that surface is symmetrical. We can also look at what some historically successful vaccines have looked like. They have actually taken advantage of and harnessed this capability of having something that is large, has a repetitive surface, and is symmetrical. Some examples include the hepatitis B vaccine against the hepatitis B virus, the human papillomavirus vaccine that helps protect against cancer, and, more recently, a malaria vaccine.
All of these really harness what our immune system is trained to respond to. They have an antigen on their surface in this large, repetitive manner. Together with InstaDeep, we would like to be able to do this from scratch using AI-assisted de novo protein design. That's not the only thing we want to do with this innovation; we also want to marry it with the power of mRNA technology, which has been so successful for many vaccines. Now, what does this look like practically? We would like to deliver mRNA and let the cell build our nanoparticles from scratch. This begins with a single protein component that must first find its friends and velcro, in a really oriented way, to two other copies of itself to form a trimer.
Once they have found these friends, they need to continue to assemble, with up to 20 of these trimers coming together into beautiful repetitive arrays to form a nanoparticle vaccine. This nanoparticle vaccine will eventually carry antigens of interest that we want to tailor to our specific vaccine of choice. What can this look like? We want to be able to design not just one of these; ideally, we would have a library of these tools, tailored and fine-tuned to the application at hand. Here is just an image showing how many of these nanoparticle designs we want to be able to build and bring to life.
To really drive home how complex a process we are attempting here: what we are really asking is to build a protein from scratch that we can launch from mRNA, and have that protein interact at the molecular level not just with a couple of other proteins, but come together to form up to a 60-mer in this beautiful, amazing nanoparticle array. To walk you through some of the details, I'm going to hand it over to Cheng so that she can tell you about the amazing advances the team has made.
Thank you, Lexi. Hello, everyone. I'm Cheng, a Research Engineer at InstaDeep. Now let's see how we can build a nanoparticle step by step. As you can see in the video, it's like building a house. We start by designing some small building blocks. In our case, they are the trimers, each an assembly of three identical proteins. Using generative AI models, we can design thousands of de novo trimers, as you can see here, all with different sizes and shapes. These trimers form the basic building blocks of our nanoparticles. Now that we've built our building blocks, how do we construct a nanoparticle exactly? Just as houses have their architecture, nanoparticles have their symmetries.
As you can see here on the left, they can form a tetrahedron, which consists of four trimers, or, in the middle, an octahedron, which consists of eight trimers. On the right, you can see the biggest one, an icosahedron, consisting of 20 trimers. All of the previously generated building blocks can be computationally assembled into these various user-defined shapes. This leads to thousands of symmetric nanoparticle assemblies. Until now, we've only designed the 3D structure of the nanoparticles. To make a house habitable, you need to add cement to consolidate the structure. In protein design, we need to design the amino acid sequences that make the protein functional and actually form the desired shape. Here, we use AI models to generate hundreds of amino acid sequences per nanoparticle, each intended to fold into the desired structure.
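The chain arithmetic behind these symmetries is worth making explicit; the icosahedral case gives the 60-mer Lexi mentioned:

```python
# Each building block is a trimer (3 identical chains); the chosen symmetry
# fixes how many trimers assemble, and hence the final oligomeric state.
symmetries = {"tetrahedron": 4, "octahedron": 8, "icosahedron": 20}
for shape, n_trimers in symmetries.items():
    print(f"{shape}: {n_trimers} trimers x 3 chains = {3 * n_trimers}-mer")
# → tetrahedron: 12-mer, octahedron: 24-mer, icosahedron: 60-mer
```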
Now we've generated hundreds of thousands of designs. It's extremely challenging for these small protein pieces to find each other and assemble exactly as we want. To confirm this, we need laboratory testing, which is usually time-consuming and limited by capacity. The question is: how can we select the most promising candidates so that we can test them more efficiently and achieve a higher success rate? Here comes InstaDeep's solution, DeepChain Folding Studio. It integrates state-of-the-art protein folding models and allows large-scale screening within a short amount of time. Just to give you an idea, we can screen 10,000 designs within one day. Now Lexi will show you how these narrowed-down, high-quality designs perform in in vitro testing.
Thanks, Cheng. This is the moment of truth for a biologist: to go into the lab and see whether we have actually designed these proteins to form the structures we want. What I have the pleasure of sharing with you today is that yes, we can do this. We can build these nanoparticles, as you can see from the models at the top of the screen, in a variety of shapes and sizes. We can go into the lab and use an electron microscope to see that yes, we are able to build these nanoparticles just as Cheng and her team designed them. We didn't stop there. What we're really interested in is functionalizing these nanoparticles by placing antigens of interest onto their surface.
We took it a step further, and we can also show that when we place antigens on the surface of these nanoparticles, they still come together structurally as designed and intended, as again shown by these electron micrographs. This is really an amazing feat of AI-assisted de novo protein design and structural biology coming together for enhanced vaccines. Thank you so much, and I'll hand it back to Karim.
Thanks a lot, Lexi and Cheng. I don't know about you, but for me, it is really magic to think that you can design a protein sequence purely with AI and have it self-assemble into a trimer, then self-assemble again into larger motifs, and then potentially use this as a scaffold on which to put antigens and trigger immune responses. That's a significant challenge that we managed to overcome in this project. The applications don't stop here. For our last but not least presentation, we're going to show you amazing work on the other side of developing an immune response: could you actually design particular TCRs for a given antigen target? This is what Mike and Antoine are going to tell us about.
Great. Thank you, Karim. My name is Mike. Nice to meet you all. I'll be presenting today with my InstaDeep colleague, Antoine, on our work on T cell receptors, also known as TCRs, and specifically how to make them into the strongest binders possible, something we think is critical to unlocking their full therapeutic potential. Why focus on TCRs? One reason is that TCRs can unlock antigens that are otherwise not accessible to conventional antibody-based therapies, such as ADCs. The reason is that antibodies need to target things on the cell membrane. The limiting factor is that membrane targets, by and large, are not cleanly tumor-specific. This means there is some residual expression in normal tissues, which limits the dose. TCRs, on the other hand, recognize antigen in a completely different way. This is something Ugur actually mentioned earlier.
There is a process called MHC presentation: proteins inside cells, at the end of their life cycle, are digested into peptides and sent to the cell surface on a molecule called MHC. That's what TCRs can recognize. What's essential here is that T cells can see basically the whole proteome, not just the membrane proteins on the cell surface. Because of this, TCRs unlock some of the highest-quality cancer antigens we know of: oncoviruses, cancer mutations (neoantigens), as well as genes that are expressed due to dysregulated gene expression in cancer. The other reason we really like T cells and TCRs is that we believe they are likely critical to getting durable responses in cancer.
Probably the best example of this is checkpoint blockade, where we now have data showing how durable these responses can be. This is data from nivolumab in non-small cell lung cancer, showing that five years out, there is this tremendous divide between patients who got nivolumab versus chemotherapy. More recently, we have data from TCR-T, a cell therapy where patients' cells are engineered to express a cancer-specific TCR. The data we've seen so far show that these can also produce very durable responses; this is a TCR against a cancer antigen called PRAME. Our thesis is that to get the most durable effect with cancer therapy, we likely want to bring T cells into the fight. There is a challenge with T cells, though, which is that their natural binding affinity to their targets is actually quite weak: it's in the micromolar range.
This is OK for their day job, which is going after viruses and bacteria, which are very highly expressed. When we want to go into cancer, where the antigens are typically weakly or variably expressed, we need these to be very strong binders. To be in the TCR-T cell space, the cell therapy space, we probably need our binders to be nanomolar binders. To get that, we either need to be very lucky and find the very rare natural T cells that can bind at that level, or we need to do some sequence engineering to make them stronger binders. If we want an off-the-shelf biologic and to avoid cell therapy, we probably need an even stronger binder, something in the picomolar range.
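These thresholds translate directly into fold-changes of the dissociation constant Kd, where smaller Kd means stronger binding. A quick sanity check of the quoted numbers:

```python
# Typical Kd regimes quoted in the talk (in molar units).
KD_NATURAL_TCR = 1e-6   # micromolar: natural TCR
KD_TCR_T       = 1e-9   # nanomolar: needed for TCR-T cell therapy
KD_SOLUBLE     = 1e-12  # picomolar: needed for an off-the-shelf soluble TCR

fold_tcr_t   = KD_NATURAL_TCR / KD_TCR_T    # thousand-fold improvement
fold_soluble = KD_NATURAL_TCR / KD_SOLUBLE  # million-fold improvement
print(f"TCR-T:       ~{fold_tcr_t:,.0f}x stronger binding")
print(f"soluble TCR: ~{fold_soluble:,.0f}x stronger binding")
```

The million-fold figure for a soluble TCR is exactly the increase Mike refers to next.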
That is a million-fold increase in binding over what you would typically see with a natural TCR. That is a huge increase; you're never going to get there with a natural TCR. It's probably going to take 10-15 mutations. That is a serious engineering problem to solve. One thing we've realized after several runs is that we need a really strong computational process. The standard approach to this problem is something called phage display, a fully experimental process that randomly explores the sequence space. It's done for each of the six CDR loops, the complementarity-determining loops of the TCR. At the end of the day, this will typically explore about a billion sequence variants, which sounds great.
However, the true space, which as I said is about 10-15 point mutations away from the natural TCR, is about 10 to the 32. That is a huge number. Even with phage display, we're just scratching the surface of all those variations. To find a TCR that is developable, binds strongly, and binds specifically, you'd have to be quite lucky with a random exploration of the sequence space. What we've developed is a new approach: we replaced phage display with something AI-guided and rational. It chooses variants in this huge space of 10 to the 32, but in a way that we think is much more effective. Because of this, we are having success in finding TCRs that check all these boxes. The computation is key.
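The 10-to-the-32 figure can be reproduced with back-of-the-envelope combinatorics, assuming on the order of 60 mutable residues across the six CDR loops (the 60 is an illustrative assumption, not a number from the talk):

```python
from math import comb

cdr_positions = 60   # assumed mutable residues across the six CDR loops
mutations = 15       # upper end of the quoted 10-15 point mutations
substitutions = 19   # alternative amino acids per mutated position

# Choose which positions mutate, then which amino acid each becomes.
space = comb(cdr_positions, mutations) * substitutions ** mutations
print(f"variant space: ~10^{len(str(space)) - 1}")        # → ~10^32

phage_library = 10**9                     # variants a phage display explores
print(f"phage display samples ~1 in 10^{len(str(space // phage_library)) - 1}")
```

Under these assumptions, a billion-variant library touches roughly one in 10^23 of the possible designs, which is why a rational, AI-guided choice of variants matters.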
After having done this multiple times, what we realize is that having a solid understanding of the peptide-MHC-TCR structure, which varies from target to target, is critical. Antoine will talk about our advances on that difficult problem.
Thanks, Mike. Let's dive a little into the structures of TCR-pMHC complexes. The reason we are interested in these structures is that we want to understand the physical interactions between the TCR on one side and the pMHC on the other. When we talk about TCR-pMHC structures, we often mention the CDR loops. There are six CDR loops, three on the alpha chain and three on the beta chain. These loops are highly flexible regions of the TCR that come into contact with the pMHC. If you look at this left graph, where we have represented 12 structures aligned on the same pMHC, you can see that the loops tend to cluster into the same regions. That tells you that, overall, the docking mode is conserved.
However, if you want to know the exact shape and position of each loop, there is huge diversity. That's really the key problem when it comes to TCR-pMHC structure prediction. We have a good example here, where we took the CDR2-beta and the MHC of two different complexes. These segments of the structure are actually fully determined by the genome, so you would expect that if they share the same sequence, they would have the same interactions. It turns out that's not the case. This is why we really need a very accurate TCR-pMHC structure model if we want to understand these interactions. Let's talk a bit about how you can model this in silico. There have been a lot of improvements in the field over the last few years, and a lot of exciting work in the community.
Building on this amazing work, we decided to build our own model. When we benchmark these models, we are really interested in the accuracy of the CDR loops I've just mentioned. Here, we've benchmarked the models on a set of completely unseen targets, and you can see that our model performs better than our competitors. Our competitors are generative models, which means a common strategy to boost their performance is simply to sample many more structures per target. You can see that even if you do this, they still don't match the performance of our internal model. You may wonder how we leverage this structure in our pipeline to design these TCRs. We start from the natural TCR, which has a low binding affinity. We obtain the structure, and we use two different AI algorithms.
The first is the variant sampler, which proposes candidate mutations. The second is the affinity predictor, which ranks all of these mutations and helps us select them. We go to the lab, make experimental measurements of the binding affinity, and repeat this process three times until we reach the desired TCR binding affinity. The nice thing is that we don't need to test thousands of mutations; we can restrict ourselves to a few hundred. On the right side, you have an example of a very successful campaign. On this graph, we are plotting the dissociation constant: the lower the value, the stronger the binding affinity. Initially, we start with the wild-type TCR in the micromolar range. After just one round, we enter the nanomolar range.
This unlocks the first therapeutic modality, which is called TCR-T. We continue for two more rounds, and at round three, we reach the picomolar range, which unlocks the second therapeutic modality, a soluble TCR. The nice thing about this pipeline is that we are able to repeat the process. We repeated it on four different targets and achieved an average binding affinity enhancement of 50,000-fold. Now I'm going to show you something even nicer: in vivo results in an animal model. The experiment is actually quite simple. You take the subject and implant the tumor. Every day, you measure the volume of the tumor and inject the treatment. If the treatment works, the tumor should not grow. If it doesn't work, the tumor grows. We've tested this on two different cancer targets. We have three curves here.
The gray one is our first control: what happens if we don't inject any treatment. The red one is our second control: what happens if you inject the wild-type TCR, without any affinity enhancement. The last one, the yellow one, shows tumor control with our enhanced TCR. You can see the results are quite striking. We are quite happy with this very robust pipeline we've built. We hope we can move forward and, in the long run, make very good progress toward eliciting a durable immune response for patients affected by cancer. Thank you.
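The round-based loop described above (sample variants, rank with a predictor, measure in the lab, repeat) can be sketched with a toy stand-in in which the "lab measurement" rewards matches to a hidden ideal sequence. Everything here is illustrative; the real sampler and predictor are learned models, not random mutation and string matching:

```python
import random

AA = "ACDEFGHIKLMNPQRSTVWY"
TARGET = "ACDEFGHIKL"            # hidden 'ideal' binder for this toy example

def measure_kd(seq):
    """Toy lab measurement: Kd shrinks 10x per position matching TARGET."""
    matches = sum(a == b for a, b in zip(seq, TARGET))
    return 1e-6 * 10.0 ** (-matches)   # starts in the micromolar range

def propose_variants(seq, n, rng):
    """Toy variant sampler: single point mutations of the current best."""
    variants = []
    for _ in range(n):
        i = rng.randrange(len(seq))
        variants.append(seq[:i] + rng.choice(AA) + seq[i + 1:])
    return variants

def affinity_maturation(seq, rounds=3, per_round=500, seed=0):
    """Keep the strongest binder found in each round, as in the campaign."""
    rng = random.Random(seed)
    best, best_kd = seq, measure_kd(seq)
    for _ in range(rounds):
        for v in propose_variants(best, per_round, rng):
            kd = measure_kd(v)          # stand-in for predictor + lab step
            if kd < best_kd:
                best, best_kd = v, kd
    return best, best_kd

_, kd = affinity_maturation("W" * 10)
print(f"Kd after 3 rounds: {kd:.0e} M")  # improves from 1e-6 toward 1e-9
```

Each round in this toy can fix at most one position, so three rounds move the Kd about three orders of magnitude, loosely mirroring the micromolar-to-nanomolar trajectory on the slide.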
Thank you, guys. Really very exciting results. I'd like to congratulate the joint teams at BioNTech and InstaDeep. As Ugur mentioned, this wouldn't be possible without a lot of collaboration. These two examples really show you the power of combining AI expertise and compute at scale with lab experiment capabilities, and also the significant biotechnology expertise of our colleagues at BioNTech. This is really integral to getting state-of-the-art results like we've shown you today. Really, congrats to the two teams. We're going to have a Q&A with Ugur to make this part a bit more interactive. Ugur, if you'd like to join us.
Any questions from the audience?
Questions, don't hesitate. Yeah, we have one here.
Hi. I am Francisco from Genomics England. Thank you for the exciting session. I have a question about your approach to understanding tumor biology. We heard excellent insights and applications based on understanding the tumor genome and proteome. I wonder if you can tell us about your strategy to also incorporate the interactions between the tumor and its microenvironment, and how that can lead to new treatments.
Yeah, this is an excellent question. Of course, there is much more information in the tumor than the genomic information. We also do transcriptomic analysis, which gives us an understanding, for example, of whether T cells infiltrate the tumor and the activation status of those T cells. We can also decipher more or less all the types of cells that are infiltrating the tumor. In cancer immunotherapy, there is at the moment this simple categorization of tumors into PD-L1-positive tumors, PD-L1-high and PD-L1-low. At the moment, the pharmaceutical industry is running on just a single parameter. We see that the information in the tumor is much, much richer. We can also see evolutionary processes happening in the tumor. For example, we have found tumors where beta-2 microglobulin, the key molecule for presentation of these epitopes, is lost.
There is much more information, and this will come over time. We will make use of this information to see how the battle between the tumor and the immune system is going.
Hello. I have a question from the webcast here. This is from Jane Han from TD Cowen. Where are you mining data from to train your models? Secondarily, how do you incorporate data generated in-house at BioNTech, either clinically or from clinical trials, to help train and improve the models?
Perhaps I can say a few words on the data mining part, and Ugur can comment on the clinical side. What we've been trying to do, and in some cases managed to do quite well, is collect all the available open-source data we can get our hands on. This is what we did, for example, for the Nucleotide Transformer series and NT v3: we reached 15 trillion nucleotides in total to train on. That's one part. Then we work collaboratively with all the different BioNTech teams to add specific data to that mix. I also want to mention that data is not all of the same type. We start with pre-training at very large scale. Then there is the value of data that is specific to a given experiment; think, for example, of what we have shown on T cell receptor affinity maturation.
Lab data developed in collaboration with the different teams is essential here, and it makes a huge difference in terms of getting results. So: get as much data as possible externally, and add specific but very high-quality data internally. We're also working more and more with the clinical teams to unlock these capabilities. I don't know if you wanted to say a few words on that, Ugur.
Yeah. Indeed, with regard to clinical data, we are not yet in the space of big data; we are in the space of deep data. By deep data, I mean the really multidimensional information we can get for patients, including, for example, the images from histology. Even with a few hundred patients, and without using sophisticated AI systems, just more or less unbiased statistical testing, we are able to see amazing correlations between survival and biomarker data.
Ugur, you told me sometimes even looking at data manually with your expertise, you could see patterns.
Yeah. A good friend of mine uses the term AI for "actual intelligence." We can really benefit from that until we have much more data; then actual intelligence can become artificial intelligence. At the end of the day, the learning systems are really based on our human knowledge of where we need to pay attention. Later on, the power of AI systems is to go beyond that and see patterns that we as humans can't see.
Yeah. We actually see an opportunity to learn directly from the expertise of the BioNTech experts. If you think about processes like RLHF, reinforcement learning from human feedback, it's really about that: modern systems could learn by interacting directly with the different scientists. This is one of the projects we're working on. There is a lot more to uncover.
Hi there. Michael Pye from Baillie Gifford. It sounds like a lot of the advances you've made in models, such as NT v3 and the antibody discovery and design model, could, intuitively to me as a non-specialist, be very valuable outside BioNTech. How do you strike a balance between advancing the field and, candidly, monetizing the work you've done in this space?
I would say, first, we try to bring all this innovation to our DeepChain platform; you've seen some live demos of it today. In terms of priorities, our number one priority is clearly to work with our BioNTech colleagues to progress the different projects we have in oncology. Biology is extremely vast, so there are lots of opportunities to develop collaborations with external partners when these make sense for the BioNTech Group, which is very often the case; not everybody is focused on the same problems. Like you said, those models are quite generic: if you understand antibody structure and different properties, that has broad application, and I would say even broader for the Nucleotide Transformer. To give you an example, we've built partnerships even in plant genomics. Yes, there's broad appeal.
Our first mission is to support our BioNTech colleagues, and when there is a chance to build win-win partnerships, we will take it.
Thank you very much. A very quick follow-on, if I may, for Ugur. Can you help us understand what you can do with nanoparticle mRNA that you cannot do with today's mRNA constructs?
I think one of the key aspects is the duration of B cell responses. We know that mRNA can induce really high antibody titers, but we also know that those titers drop with the half-life of antibodies, which is in the range of 21-30 days. With nanoparticles, we hope to see more stable antibody titers, because these nanoparticles remain in the body for quite a long time.
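The effect of that half-life on titers can be illustrated with simple exponential decay, using the quoted 21-30 day range (a sketch of the arithmetic only, not a pharmacokinetic model):

```python
def remaining_fraction(days, half_life_days):
    """Fraction of peak antibody titer left after exponential decay."""
    return 0.5 ** (days / half_life_days)

for half_life in (21, 30):                     # quoted antibody half-life range
    left = remaining_fraction(180, half_life)  # six months after the peak
    print(f"half-life {half_life}d: {left:.2%} of peak titer remains")
```

After six months, well under 2% of the peak titer remains at either half-life, which is why a longer-lived nanoparticle scaffold that keeps stimulating B cells is attractive.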
Thank you very much.
Maybe another one from the webcast then. Agentic systems are the new frontier for AI models, such as Gemini and ChatGPT, just to name a couple. When do you think bio-AI agents will become viable and/or useful to us?
I mean, they're already useful. As you've seen, we're very excited about the applications, and some of these are already coming to fruition. It is true that we're going to see much more in the coming years. Perhaps a limitation of current systems is that models like ChatGPT, Gemini, Grok, and others are really focused on learning and training on the internet as a whole. When they understand biology, they understand it from reading articles or web pages; most of the time, they do not understand biological sequences themselves. When you bring in deep understanding of a biological sequence at the nucleotide level, you see real magic being uncovered. We had that example with our ChatNT work, which made the cover of Nature Machine Intelligence.
That work combined an expert nucleotide biological sequence model with a general-purpose language model, and it showed a lot of promise. I think that's the frontier, and in this particular case, the team was able to push it forward. We're going to see a lot more of that. The future of agentic AI is a system that really understands biological data in multiple modalities, but can also read scientific literature in real time to provide perspective and almost generate novel ideas. I remember, Ugur, you gave us this thought a few years ago, and it is starting to come true: the systems are becoming capable of formulating scientific hypotheses. We see this becoming more and more frequent. If I had to name the year when this really starts to play out at scale, I would say probably 2026, so next year.
Great. Maybe one final question from the webcast. Given the broad potential for AI, both at BioNTech and in general, which technologies, modalities, and applications do you think BioNTech specifically will prioritize for deploying its AI tools and capabilities?
The way we are developing BioNTech is really about solving one big problem: how to improve cures for cancer. This is not a single application. As you have seen today, it is really a series of modular tasks that, combined, provide powerful capabilities to develop novel antibodies, better ADCs, better mRNA therapeutics, better vaccines. In this setting, as I think one of the presenters said, having a system that is universally aware of all these goals has advantages compared to a very specialized model. We have to see how this evolves, but we are very confident that this holistic approach of understanding immunity, understanding cancer, and supporting the development of therapeutics at a broader scale is the way future pharmaceutical companies should be built.
I think this was our last question. Thanks again for everybody who attended, either in person or online. It was a pleasure. Stay tuned for more progress. Thanks a lot.