All right. Good morning, everyone, and thank you for joining, and welcome to our webinar this morning. My name is Graciela Pieras. I am Senior Technology Strategy Manager at Thermo Fisher Scientific, and I will be hosting the webinar today. We're very excited to talk today about key driver identification: achieving consistent performance in bioprocessing through predictive modeling.
We are live here, and you will be able to ask questions throughout the presentation by clicking on the Q and A link at the bottom right of your screen. So please enter your questions throughout the presentation, and at the end, we will be answering them. So let's get started. Next slide.
So at Thermo Fisher Scientific, we take pride in our mission of enabling our customers to make the world healthier, cleaner and safer. Next. And we're committed to the advancement of science by offering products and solutions that enable customers to push the boundaries of innovation through our deep customer focus, unmatched portfolio, industry-leading scale and our depth of capabilities. So it is my great pleasure today to introduce our speaker, Dr. Neel Sengupta, who is a staff scientist at Thermo Fisher Scientific.
Neel completed his PhD in Chemical Engineering at Purdue University, where he studied the role of cellular metabolism in protein expression and production in CHO cells. He also holds a master's degree from IIT Bombay, where he worked on applications of mathematical modeling to cell signaling pathways. He has been leading research to understand the impact of media and media components, such as supplements, on biological systems through predictive modeling, and he leveraged this experience and knowledge to develop the key driver identification approach, of which he will give you an overview today.
So it's my great pleasure to introduce Neel. Over to you, Neel.
Thanks, Graciela, for such a nice introduction. So welcome, everyone, to the talk today. The agenda for today: first, we'll go over bioprocessing drivers and sources of variability, with a particular focus on media and supplements. Next, I will introduce what key drivers are and our unique approach, the KDI or key driver identification approach, to identify such sources of variability in media and supplements. Next, we will go over the mathematical modeling mechanisms that we use to identify these key drivers.
And finally, we will see some case studies where we have successfully applied this approach to achieve consistency in bioproduction and improve customer processes, and we'll end with some conclusions. So moving on to bioprocessing drivers and sources of variability. Broadly speaking, the things which impact bioproduction can be divided into 3 categories. The first is the cell line itself: there can be specialized expression systems or glycoengineering, which can impact both your titers and protein quality.
The other factors are process parameters such as pH, DO, and cell culture conditions, which will also greatly impact your bioproduction outcome. And finally, one of the things which is very critical for successful bioproduction is the cell culture media and supplements. As we all know, suboptimal formulations can impact product quality. Moreover, sometimes media can be cell line specific as well, so there is not one universal media that will work across the board.
So you might have special media for different cell scenarios. Our focus in the talk today will be on cell culture media and supplements and how a deep understanding through our KDI approach can be leveraged for various favorable outcomes. Before we move forward, I want to introduce the evolution of culture media and supplements over the years. Pre-1990s, in mammalian cell systems, serum supplementation was still being used. It's still being used today, but mostly for vaccine applications.
But as we all know, serum has some issues with lot-to-lot variability, supply issues, risk of BSE, etcetera. And back then, supplements such as peptones, which are complex hydrolysates such as yeast extracts, soy peptones, or other casein or animal-origin peptones, were mostly being used for microbial cell culture. In the 1990s, there was a push to move away from serum for mammalian applications, especially CHO cell antibody applications. Many of the blockbuster drugs on the market today were very successful because they were using peptones in their formulations. The peptones were instrumental in providing a rapid solution to delivering high titers.
In the mid-2000s, the focus was on consistency. Peptones, though they have a lot of advantages, can lead to some lot-to-lot variability, so there was a push for chemically defined media. Today, the focus is on product quality as well. There is also a rekindled interest in peptones as supplements, particularly in the biosimilars market, because peptones can deliver a rapid solution.
CD media is also very useful, and what people have realized is that chemically defined might not be chemically pure, so CD media have their own unique challenges as well. So in the space today, both peptones and chemically defined media and supplements are being used to get to your targets, and they all have their unique challenges, which we are addressing in the talk today. Before I move forward, I just wanted to focus on a basic understanding of why media and supplements could be so important.
Here's an example showing a CHO DHFR cell line, which was evaluated in 3 chemically defined media. The bars here show the growth and the red dots are the production. So this is the same cell line, same process, and just changing out the media formulations can have a big impact on your bioproduction outcomes. Here, Media 3 is showing the highest production and highest growth. But looking further into the protein quality, what was found is that Media 3, which is shown here on the plot on the right, has the lowest galactosylation.
So in certain scenarios, yes, you may be getting the highest production, but is that suitable for your needs? The main take-home point of this slide is that just by changing the formulation of the media, you can have a very drastic difference in outcomes. This shows the power of the media to get to your desired outcome. Going back to what could be potential sources of variability in media, and in particular chemically defined media: one source of variability which people have reported in the literature is raw material purity. So again, going back to the concept that chemically defined might not be chemically pure.
It's well known that manganese can be a trace contaminant in various vitamins, amino acids, or even other trace metal salts. Here's a case where a customer was using a peptone-containing media, and we evaluated this particular Thermo Fisher peptone for manganese content. These are different lots of this particular peptone, and you can see that the levels of manganese in this peptone were very small, about 0.2 ppm, and the variation was also not that much: 0.2 ppm in one instance, up to about 0.3 ppm in another.
So is that enough to cause variation? When we compared this peptone to a commonly used vitamin present in the base media, what we found was that this particular vitamin was bringing orders of magnitude more manganese into the formulation. This is an interesting scenario: we often might think that the peptone can be a source of variability, but we have to take a holistic view of the supplement in combination with the base media. In certain instances, impurities in the basal media, which is chemically defined, can overshadow the natural variation between lots of peptones.
So it's important to keep an eye on that. And to further add on to that, each process is unique, so it's important to find the source of variability for each particular process. If you're using a peptone, is the variability coming from the peptone, or is it from the base media components? That holistic approach has to be taken.
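To make that arithmetic concrete, here is a minimal sketch with purely hypothetical inclusion rates and impurity levels (none of these numbers are measured values from the study) of how each raw material's trace manganese contribution to a formulation could be tallied:

```python
# Hypothetical mass balance for trace manganese in a final media formulation.
# Contribution (ug/L) = inclusion rate (g/L) x impurity level (ug Mn per g material).
raw_materials = {
    # name: (inclusion g/L, Mn impurity ug/g) -- illustrative numbers only
    "peptone":   (5.0, 0.25),    # ~0.2-0.3 ppm Mn, as in the lot data shown
    "vitamin X": (0.05, 500.0),  # a vitamin carrying far more Mn per gram
}

for name, (inclusion_g_per_l, mn_ug_per_g) in raw_materials.items():
    contribution_ug_per_l = inclusion_g_per_l * mn_ug_per_g
    print(f"{name}: {contribution_ug_per_l:.2f} ug Mn per liter of media")
# Even at a tiny inclusion rate, the vitamin can dominate the total Mn load,
# overshadowing lot-to-lot variation in the peptone.
```

Even a raw material with a much higher impurity level per gram matters only in proportion to how much of it goes into the formulation, which is why the whole formulation has to be viewed together.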
For those who are using peptones, just to introduce what peptones are: peptones are digests of protein sources of animal origin or animal-origin-free materials such as yeast extracts, soy peptones, and so on. They offer many advantages in the development of a bioprocess. As I've mentioned before, many of the blockbuster drugs use peptone-based processes, and a lot of new processes are still using peptones because they have the advantages of enhanced production, protein quality, and better viability. But one disadvantage is that peptones are derived from materials of biological origin, and they can show some inherent biological variability.
In our experience, this biological variation can range between 10% and 20%, and that might or might not impact a customer's process. Here at Thermo Fisher, we have strict quality controls for critical manufacturing steps, which are the key to limiting the variability in peptones. What I'm showing here is a plot of the digestion pattern of a Thermo Fisher peptone, a molecular weight profile in kilodaltons, where each bar is a different lot of the same peptone.
By having strict quality controls, we can get a very consistent outcome from a peptone digestion process. Going further into a case study, consider whether a peptone in combination with a base media will always cause variation. Let's take an example here where we assume this component A is a critical driver for some processes, and assume the critical range is that it has to be above 20 ppm. What I'm showing here is different lots of the peptone, which have some variation in this particular component A.
You can see it varies from about 12 ppm all the way to 20 ppm. So these are very tight ranges to begin with, about a 10 ppm spread, but is that enough to always cause variation? The answer is: it depends. Let's take one process example where a customer is using a base media which also has this component A, at 5 ppm. In this scenario, the natural variation coming from the peptone, in combination with the base media, might have an impact on your performance, especially if it's a critical factor for your process.
The other scenario could be where a second customer or second process has a base media which also contains this component A, but at 100 ppm. In this scenario, the natural variation of component A would not have any impact on the process, because the base media would overshadow the variation from the peptone. So one critical thing to keep in mind is, again, each process is unique, and any impact of lot-to-lot variation, either from the peptone or from the CD media, will depend on the characteristics of that particular process. We have to evaluate everything with a holistic approach. In this scenario, for process 1, component A would be something we might call a key driver for that particular process.
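As a hedged illustration of why the same peptone variation matters in one process and not the other, this sketch computes the relative swing of component A in the final media for both scenarios; the 1:1 inclusion assumption and the exact lot values are simplifications of the figures quoted above:

```python
# Relative swing of component A in final media for two hypothetical processes.
peptone_lots_ppm = [12.0, 15.0, 20.0]   # component A across peptone lots
peptone_inclusion = 1.0                 # assume the peptone contributes its ppm 1:1

for base_media_ppm in (5.0, 100.0):     # process 1 vs process 2 base media levels
    totals = [base_media_ppm + peptone_inclusion * lot for lot in peptone_lots_ppm]
    swing = (max(totals) - min(totals)) / min(totals) * 100
    print(f"base {base_media_ppm:>5.0f} ppm -> total A {min(totals):.0f}-{max(totals):.0f} ppm "
          f"({swing:.0f}% relative swing)")
# base 5 ppm:   17-25 ppm total, ~47% swing -> may cross a critical threshold
# base 100 ppm: 112-120 ppm total, ~7% swing -> peptone variation is washed out
```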
So moving on to what a key driver is and our unique KDI approach for identifying these key drivers. A key driver is a media component which has a strong positive or negative influence on performance; there is an optimal range to achieve a target performance, and variation relative to this optimal range causes variability in your cell culture performance. For example, take a component A, which has a variation as shown here on the x axis. Going back to the earlier example of the second scenario with component A, we can see that this variation is not having an impact on the yield.
This yield could be product quality, it could be titer, it could be whatever bioproduction outcome the customer is interested in. In this scenario, component A is not a key driver. On the other hand, look at the component B example here, where the variation in this component's range is causing a drastic impact on the yield. Moreover, there is a narrow range, shown by this gray bar, which component B must be in to achieve the target performance. In this scenario, component B would be a critical driver, or key driver, component.
So what is our key driver identification approach? It's a holistic approach where we leverage analytical data and, through our proprietary biostatistical models, identify a few key factors out of the many in your media or supplements which are driving your bioproduction outcomes. The reason we call this holistic is that it's not only leveraging chemistry and mathematical models; we also apply biological knowledge of the system to discern what the potential key drivers for your process are. And what are the advantages? Once we know which components are driving your process, we can leverage that to achieve consistency.
We can improve an existing process, and we can even use this knowledge in scenarios where we are doing de novo media and supplement development for various projects. So now that we have seen the approach, how do we identify the key drivers? In general, proprietary, customizable biostatistical models are the workhorses of the KDI program, and it's a phase-gated approach where we often work in collaboration with customers to deliver these favorable outcomes. The first step is the data generation phase, where the customer has performance data on some lots of media or supplements and we generate analytical characterization on those same lots.
The next phase, Phase 1, is where we start the modeling process with model development and discovery. Here, using our initial data analysis and our mathematical modeling frameworks, we start developing mathematical models which tie the performance and chemistry data together. The outputs of this first phase are a list of potential key drivers identified from the many media components, along with initial predictive models. The next phase is where we start experimentation in collaboration with the customer: we create pilot materials in which some of these potential key drivers are enhanced, and once they are experimentally tested, that gives further confirmation of which are the strongest out of the potential list identified.
It further helps with readjusting the models, and the outputs are confirmation of key drivers and updated models. Phase 3 is model finalization and validation, where we challenge the model with unseen lots and ask it to predict the outcome. Sometimes we check the validation, and it depends on the customer whether they want to go through this, but regardless, there is a validated predictive model which can be used for further optimization or screening. And finally, implementation: once we know what's driving your process, through micro-addition strategies or otherwise adjusted component concentrations, we can achieve the optimal concentrations of these key drivers in the media and supplements to drive very good bioproduction outcomes.
And at a minimum, using the predictive models, we can also do raw material screening to, again, maintain bioproduction consistency. Going back to the first step, data generation, and especially the analytical data generation, which is done by Thermo Fisher: we have a very broad, very powerful analytical group here, and for many of these scenarios, what we focus on is small molecule analysis. What I'm showing here is different lots of media or peptones being characterized using various small molecule analyses.
These can cover amino acids, vitamins, nucleosides, polyamines, carbohydrates, inorganic elements, total carbohydrate quantification, and so on. Once we generate this large data set, what's important to understand is that we still have 100-plus analytical characteristics defining these media or peptone lots, but we don't know which are the key drivers for the customer's process. In the past, the simplest analysis would be looking at one variable at a time, which is not always useful for finding complex correlations, because in our experience, often 2 or 3 factors can be acting together, and there are complex underlying interactions between these factors as well. So we have developed proprietary mathematical modeling approaches to whittle these 100-plus variables down to a critical few that can be leveraged to get the desired bioproduction outcome.
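To see why one-variable-at-a-time analysis can miss such interactions, here is a toy example on synthetic data (not from any real study): the yield is driven entirely by the interaction of two components, so neither component correlates with yield on its own:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
a = rng.uniform(0, 1, n)   # component A level across simulated lots
b = rng.uniform(0, 1, n)   # component B level
# Yield driven purely by the centered A x B interaction, plus noise:
yield_ = (a - 0.5) * (b - 0.5) + rng.normal(0, 0.03, n)

print("corr(yield, A)           :", round(np.corrcoef(yield_, a)[0, 1], 2))
print("corr(yield, B)           :", round(np.corrcoef(yield_, b)[0, 1], 2))
print("corr(yield, interaction) :", round(np.corrcoef(yield_, (a - 0.5) * (b - 0.5))[0, 1], 2))
# One-variable-at-a-time correlations come out near zero, while the joint
# interaction term correlates strongly -- only a multivariate model sees it.
```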
So moving on to how we implement these modeling strategies for identification of these key drivers. The first step, as I mentioned before, is that we generate analytical data on multiple media or peptone lots, and we have some yield or product quality data from the customer on those same lots. Usually there can be 10 to 15 lots, or slightly fewer to begin with, depending on the scenario. The next step is that we build multiple competing biostatistical models. The unique thing about these models is that, through our codes, we mimic biological behavior, and I will go into slightly more detail in the next couple of slides.
The other unique thing is that, through our proprietary codes, starting from a large data set, we can reduce the models to the potential top drivers: from 100-ish down to maybe 10. These models are also predictive in nature. As with any modeling activity, this can be a little bit iterative in the beginning; depending on the scenario, there can be some rounds of further experimentation and data generation to make these models more robust.
Once these initial models are built, we challenge them for both predictability and accuracy. Predictability means the models get challenged with blinded data sets and we ask them to predict; if there is agreement, that model can be carried forward. The model undergoes some evolution, because growing data sets can drive evolution in the mathematical structures as well as in the model parameters. Essentially, that means which components the model assigns more importance to.
Once this exercise is completed, we end up with a potential key drivers list, going from 100-ish down to about 10 or 12, and we also get the preliminary models, which can be further evaluated with experimentation. As I mentioned before, one of the key unique things is the biomimetic nature of these models: we have used unique modeling strategies to mimic biological behavior. At the simplest, you might have something like an additive model, which you will often find in standard statistical software.
But through our proprietary codes, we can also mimic other biological behaviors, as shown here: enzyme kinetics, sigmoidal, and other switch-type responses, and there are more complex strategies to define more complex biological interactions. Essentially, what these equations are doing is tying together the relationship between the performance and the analytical chemistry data. Once we build these initial models, each of these models is subjected to variable reduction: each model starts with the full 100-plus variables, and through iterative steps, at each step a statistical test is done to evaluate the significance of all coefficients in the model, and we eliminate one variable at a time. This is repeated in a loop in the code, and the process stops when all remaining variables are significant, leaving a smaller list of significant variables.
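The actual Thermo Fisher codes are proprietary, but a minimal sketch of the general pattern, assuming ordinary least squares from statsmodels and illustrative response transforms whose names and defaults are my own, might look like this:

```python
import numpy as np
import statsmodels.api as sm

def backward_eliminate(X, y, names, alpha=0.05):
    """Refit repeatedly, dropping the least significant variable each pass,
    until every remaining coefficient is significant at the alpha level."""
    keep = list(range(X.shape[1]))
    while keep:
        fit = sm.OLS(y, sm.add_constant(X[:, keep])).fit()
        pvals = fit.pvalues[1:]            # p-values, skipping the intercept
        worst = int(np.argmax(pvals))
        if pvals[worst] <= alpha:          # all survivors significant: stop
            return [names[i] for i in keep], fit
        keep.pop(worst)                    # eliminate one variable, loop again
    return [], None

# Candidate biology-mimicking transforms that could be applied per analyte
# before fitting, so the linear machinery can capture nonlinear responses:
def michaelis_menten(c, km=1.0):
    return c / (km + c)                          # saturating, enzyme-kinetics-like

def switch(c, k=10.0, c0=0.5):
    return 1.0 / (1.0 + np.exp(-k * (c - c0)))   # sigmoidal, switch-type

def semilog(c):
    return np.log1p(c)                           # diminishing-returns response
```

In practice, the columns of X would hold transformed analyte levels for the 100-plus measured characteristics, and the loop would return the handful that survive.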
So from 100, we end up with 10, and this again differentiates our models: we have the capability of going from very large data sets to a critical few, which can be worked on further. Here's an example where we are comparing 2 competing models. What's shown here is the simulated yield from each model. For both model 1 and model 2, the R-squared values, which are a measure of goodness of fit, show very good fits. But then we challenge these models with blinded predictions, shown on the plots on your right: the blue bars are the experimental yields and the red bars are the blind predictions from the model, meaning the model was not built using these data sets.
What we found was that the additive model predictions were not in good alignment with the experimental outcome. In this scenario, the semi-log model predictions were better: even though the fits were comparable, the semi-log model was explaining the biological behavior better for this particular project.
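A rough sketch of that kind of comparison, assuming a handful of lots held out as blinded tests and synthetic data standing in for the real measurements:

```python
import numpy as np

rng = np.random.default_rng(1)
conc = rng.uniform(0.1, 10.0, 30)                 # key driver level across 30 lots
y = 5.0 * np.log(conc) + rng.normal(0, 0.5, 30)   # assume the true response is semi-log

train_x, test_x = conc[:24], conc[24:]            # hold the last 6 lots out, blinded
train_y, test_y = y[:24], y[24:]

def blinded_rmse(transform):
    # Fit an intercept + transformed-concentration model on training lots only,
    # then measure the prediction error on the blinded lots.
    X = np.column_stack([np.ones_like(train_x), transform(train_x)])
    coef, *_ = np.linalg.lstsq(X, train_y, rcond=None)
    Xt = np.column_stack([np.ones_like(test_x), transform(test_x)])
    return np.sqrt(np.mean((Xt @ coef - test_y) ** 2))

for name, tf in [("additive (linear)", lambda c: c), ("semi-log", np.log)]:
    print(f"{name}: blinded RMSE = {blinded_rmse(tf):.2f}")
# Both models can fit the training lots acceptably, but the semi-log form
# predicts the unseen lots far better -- the basis for carrying it forward.
```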
So just to summarize the overall key driver identification approach: we start with large data sets, where we are trying to identify components that show an impact on performance across lots of media. We look at the components together, and the goal is to reduce the list to a few key drivers using our proprietary biostatistical models; here we again start with 100-plus. The next stage is where we differentiate between parameters that cause the variability and those that merely correlate with it, through experimentation. That is Phase 2: having started with 100 and arrived at a potential list of 10, we go into experimentation to whittle these 10 drivers down further.
The final outcome is that we reduce the set to about 2 to 3 critical drivers, we determine each key driver's optimal range to achieve the target performance, and we also end up with a predictive model, which is very useful in any scenario. Some examples: for copper in a particular process, the range was defined to be 1 ppm to 2.5 ppm; a vitamin had to be in a range from 0.5 grams per 100 grams (0.5%) to 1%. You can see how tight these ranges are, and these ranges can have a big impact on a customer's process.
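At its simplest, once those ranges are locked down, incoming-lot screening can be a range check on the measured chemistry. A hypothetical sketch follows; the component names and ranges echo the examples just given but are illustrative only:

```python
# Hypothetical key driver ranges from a finalized KDI study (illustrative only).
key_driver_ranges = {
    "copper_ppm":  (1.0, 2.5),
    "vitamin_pct": (0.5, 1.0),   # grams per 100 grams
}

def screen_lot(lot_chemistry: dict) -> bool:
    """Accept a raw-material lot only if every key driver is inside its range."""
    for component, (lo, hi) in key_driver_ranges.items():
        value = lot_chemistry.get(component)
        if value is None or not (lo <= value <= hi):
            print(f"REJECT: {component}={value} outside [{lo}, {hi}]")
            return False
    print("ACCEPT: all key drivers within range")
    return True

screen_lot({"copper_ppm": 1.8, "vitamin_pct": 0.7})   # -> ACCEPT
screen_lot({"copper_ppm": 0.4, "vitamin_pct": 0.7})   # -> REJECT
```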
Next, I will go over some case studies where we have successfully implemented this approach for a favorable outcome. The first study is for a mammalian system using a peptone-containing process. The process was using an animal-free peptone to produce a monoclonal antibody therapeutic, and the goal was to identify specific drivers in the peptone to enhance the yield and reduce the production variability. Using the framework highlighted in the previous sections, applying all the chemistry and the modeling, we were able to hone in on 2 potential key drivers, key drivers 1 and 2, for this particular customer scenario.
Additionally, I want to highlight that what we also found was that this particular behavior was quite nonlinear. You can see this green plot explaining the behavior for key drivers 1 and 2, and the surface plot below shows a simulated response where key drivers 1 and 2 were acting together, and both have to be at optimal levels to give very good production. This again highlights not only the interaction between the two drivers, but also the nonlinear response we were observing. Once these potential drivers were identified, the next step was experimentation, where we created pilot materials in which key drivers 1 and 2 were enhanced, and they were experimentally tested.
In the plot here on the right, the blue bars are the outcomes on the base material, and the red bars are the outcomes from the enhanced materials, where we supplemented with key driver 1, and for lot 2, with key driver 2. Both supplementations showed an improvement in performance: we were able to increase the yield by 40% to 50%, hence confirming these were the key drivers. This outcome was also used to finalize the predictive model. In the third phase, the model was verified further as a screening tool in lot selection.
So the finalized, locked model was used for selection of lots. In the plot in the middle, the green bars are the blinded predictions from the model, the red bars are the experimental outcomes, and the red dotted line is the yield desired by the customer. Using this modeling strategy, we were able to screen lots which would be suitable for the customer, and we got a very good success rate. Essentially, prior to implementing the KDI approach, if a lot was randomly selected, we would get about a 40% to 50% success rate.
But using this knowledge and the KDI modeling approach, we were successful 100% of the time. That was a big improvement, and not only that, the customer was seeing consistent performance. Moving on to the next example: again, this is a peptone example, but it shows an example of negative drivers.
Just to remind everyone, drivers can be positive or negative. The background is that this process was using 2 peptones in a pair, and the goal was again to identify specific drivers of the peptone pairs to enhance yield and reduce production variability. What we found was that certain cations (cobalt, nickel, copper) were negative for production, and the strategy was to screen peptone pairs. Using this modeling strategy, a random selection of lots would have led to about 20% to 25% success, whereas model-based selection led to a 100% success rate.
In the plot below, where we show peptone pair performance, the red line is the customer's expected performance and the blue is the performance we got from the selected pairs, which were chosen using this modeling approach. You can see that we again got very good success and a consistent process for the customer using our knowledge as well as the KDI approach. The final example I will go over is for chemically defined media; the previous two examples were for peptone-containing media. What I'm showing here is an in-house screen of chemically defined media.
We have a 42-media in-house chemically defined library, which was screened with a CHO DHFR line. What's shown here: the green bars are the protein quality, represented by percentage total galactosylation, and the red bars are the production values. As expected, and as we have covered before, changing the formulation can lead to various bioproduction outcomes, and selecting your optimal media will require consideration of both the production and the galactosylation profile. This was an earlier-stage scenario, so it just highlights that you have to look into both production and galactosylation to decide which media is best for you during the screening process itself.
Although we understand that different media will have different outcomes, we also saw one interesting trend: as the production was increasing, the corresponding total galactosylation was decreasing. What we still didn't know was what's driving this behavior. So we applied the modeling approach to this screening data set to better understand the impact of the various media components contained in these 42 media formulations on production and galactosylation. What I'm showing here is that, using our modeling approach, we were able to classify each media component, where each blue dot represents a unique component found in the media, say an amino acid, a trace metal or a vitamin, into 4 different quadrants.
The first quadrant here contains components which are positively correlated to both production and galactosylation, and this second quadrant is where components have a positive impact on production but a negative impact on galactosylation. As you can see, this explains the previous data set well: there are a lot of components which are positive for production but decrease the galactosylation, which is what we were seeing in that screen.
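A minimal sketch of this kind of quadrant classification, assuming simple per-component correlations against the two responses and synthetic data standing in for the real 42-media screen:

```python
import numpy as np

rng = np.random.default_rng(2)
n_media, n_components = 42, 20
levels = rng.uniform(0, 1, (n_media, n_components))   # component levels per medium

# Synthetic stand-ins for the two measured responses across the 42 media:
production = levels @ rng.normal(0, 1, n_components) + rng.normal(0, 0.3, n_media)
galactosylation = levels @ rng.normal(0, 1, n_components) + rng.normal(0, 0.3, n_media)

quadrant_labels = {(True, True):   "Q1: +production, +galactosylation",
                   (True, False):  "Q2: +production, -galactosylation",
                   (False, False): "Q3: -production, -galactosylation",
                   (False, True):  "Q4: -production, +galactosylation"}

for j in range(n_components):
    r_prod = np.corrcoef(levels[:, j], production)[0, 1]
    r_gal = np.corrcoef(levels[:, j], galactosylation)[0, 1]
    print(f"component {j:2d}: {quadrant_labels[(r_prod > 0, r_gal > 0)]}")
# Group 1-style picks would be components that boost production while
# showing little or no negative correlation with galactosylation.
```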
As a proof of concept, we selected the Group 1 components, which had a positive impact on production but little negative impact on galactosylation. The question was: if we test those Group 1 components, can we enhance the production in a particular scenario without having a negative impact on the galactosylation? We tested these Group 1 components on condition 42, which was a low-production condition, the idea being to increase the production without a drastic negative impact on galactosylation. What I'm showing here is the impact of adding these Group 1 components, which were added as a bolus: the bars represent the viable cell densities and the diamonds are the production.
As you can see, by supplementing these Group 1 components into condition 42, we saw a 40% increase in production, and further evaluation of the galactosylation data showed that there was minimal impact on galactosylation. This highlights that these types of modeling approaches can also be used for media development and applied to chemically defined media scenarios, where knowing what's driving your process can help with further optimization. So, in conclusion: variability can be caused by very small changes in specific components in peptones, or by impurities in chemically defined media. We have developed a unique approach to elucidate the performance drivers from these large data sets through our mathematical modeling approaches.
With an understanding of the drivers for your process, media can be leveraged to achieve your desired bioproduction goals, and as we have shown in these case studies, once you know and understand your media, we can help with achieving a consistent process or even further optimizing it. With that, I would like to acknowledge our various teams: the math modeling and product support team, the chemistry team, who were instrumental in generating these large data sets, and our cell culture team, which was also helpful in generating some of the growth studies shown here. With that, I will hand over to Graciela for questions and the conclusion of the presentation.
Thank you so much, Neel, for this very detailed and very interesting presentation on our key driver identification approach. This approach is part of our bioprocessing analytics capabilities for early-stage solutions, which allow us to provide media development and analytics as well as upstream and downstream process development, cell line development, and media manufacturing capabilities. We will now open up for questions. If you have a question, please type it into the Q and A, which is at the bottom right corner of your screen, and I will read some of the questions out here.
So Neel, there is a question in already for you. The question is: how do you identify the key drivers? Is it from a high throughput screen on well plates? And I think you should be able to see the question as well, Neel.
So, okay, I'm not seeing the question. Can you repeat it?

Yes, of course.
So: how do you identify the key drivers? Is it from a high throughput screen on well plates?
So, the key drivers are primarily identified through our mathematical modeling strategy. Essentially, as in the examples I went over, let's say you have 10 or 15 media lots, or screening data. If you know the composition of your media, or, in scenarios where we are using peptone media, we analytically characterize those media or supplements, then that becomes a larger data set. That gets fed into the model, where we start with everything that defines your process, and using our modeling strategy, we correlate that characterization to the performance. So the analytical data is correlated to something like an ELISA titer across different conditions.
It then runs through the model, which whittles down, out of, let's say, 100 things in your media, the 4 or 5 things that are driving your process.
Great. Thank you, Neel. We have more questions coming in, and you should be able to see them under your Q and A, but I will read them out. One of our attendees says thank you for the great talk. The question is: do you build models for each component separately, or do you utilize multiple linear regressions, etcetera?
So let's say we are taking production as the outcome. There are built-in codes where we are not using plain linear regression; it's something else. What it does is look at all the variables together, build these initial mathematical frameworks, and then go through the variable reduction process within the code itself, where it will whittle the, let's say, 100 factors down to 10. It's as in the slide where I showed that we start with all the variables in the model and then go through iterative reduction.
And you do look at all the components in the media together?
Yes. Yes. Because the assumption is that everything is important, and then, as the model and the outcome dictate, it will use that data set to hone in on what's most important.
Great. Thank you, Neel. We have another question: what software do you use to generate the model?
So we have built our frameworks in MATLAB, because what we need is the customizable nature of these models. Standard software will have some capabilities similar to this, but you won't be able to get down to the few drivers, and the other unique thing is that we have the biology-mimicking equations in there. So we have written these proprietary codes in MATLAB.
Thank you, Neel. Another question about the modeling: can the mathematical modeling predict beyond the typical or measured concentrations of the key drivers?
That's a great question. When we do Phase 2, we sometimes deliberately enhance the drivers outside what the model has seen, and that's the fine-tuning. But as with any model fitting process, the model will depend on the data sets it sees. So yes, I would say within a 30% to 40% range it's pretty robust, but going beyond that might be a little bit challenging for the models.
Thank you. Great. Another question about the types of components that we analyze: is it the typical amino acids and vitamins, or do you look beyond that? Where do you focus, and what components do you look at in the medium to tease out these key drivers?
Again, that's a great question. For chemically defined media, we have a great analytical team here: we look at amino acids, vitamins, nucleosides, trace metals, anions, and they also have other small molecule analyses. For peptone-containing media, some of these will be common, but further additions can include carbohydrate profiles, peptide profiles, fatty acids, and polyamines.
So there's a lot of small molecule analysis that's done.
Great. We have a lot more questions, so great input here from the audience. Another one: what criteria do you use for factor exclusion, adjusted R-squared, Mallows' Cp?
So I cannot divulge the exact details, but essentially what I can say is that what I'm checking is the model fit for a particular analyte: whether that coefficient is statistically significant or not when I'm doing a leave-one-variable-out analysis.