Praxis Precision Medicines, Inc. (PRAX)

Fireside Chat

Nov 24, 2025

Doug Fowle
Senior Analyst, HC Wainwright

Afternoon, everybody. Thank you for joining us. I'm Doug Fowle, Senior Analyst at H.C. Wainwright. We are thrilled today about what I think is a truly unique event. We are joined by Professor Chuck McCulloch, who is a Professor of Biostatistics at UCSF, a real expert in the use of mixed models, and the author of what I think many consider the definitive textbook on the subject. We are also joined by Marcio Souza, who I think most of you are familiar with, who is the CEO of Praxis. We are focusing today on the Essential Three program.

I think this is a little bit of an unconventional format, but the reason we're doing it this way is not only can we get the perspectives and insights from Chuck on some of the questions that have been coming up around Essential Three, but also sort of accelerate the cycle time for addressing questions by having Marcio here. I also want to make it clear upfront that Chuck is not affiliated with Praxis in any way. He is an independent consultant. My hope is that Chuck will provide some insights that will help everybody feel better educated on the program, as well as ask some tough questions on which Marcio can provide some insight.

Chuck, I think as a starting point, I think it would be really helpful if you just provided sort of a two-minute refresher on why you use an MMRM model and its value for this type of clinical study. Ultimately, do you think this was the right model for them to use? Maybe if you just have an overall impression of the data as you've seen it.

Chuck McCulloch
Professor of Biostatistics, UCSF

Certainly. I mean, this is longitudinal data. We collect data repeatedly over time on the same people. From a statistical point of view, that's called correlated data, because of course, data within the same person is similar over time, more similar to their own data than to other people's data. That requires a statistical modeling method that can accommodate this repeated measures correlated data. You also need to be able to have an analysis method that allows for flexible modeling. For example, you want to adjust for baseline ADLs. In the models that Praxis used, they also adjusted for things like a family history of tremor. You have to have a flexible modeling framework. That basically boils you down to two choices. There are mixed model repeated measures analyses, MMRMs, and what are called generalized estimating equations. Name's not so important.

It's just another method for flexible modeling of correlated data. A practical reality of any clinical study is there's going to be missing data. There's going to be dropout, and that needs to be accommodated. Mixed models are well known to produce more reliable results with missing data compared to these generalized estimating equation approaches. In this study, there was appreciable dropout leading to missing data. We didn't get to measure the activities of daily living for every person for every week. That gives a clear preference to mixed models in this context. If you had come to me completely independently and said, "Help me write a statistical analysis plan," this was almost certainly the path down which I would have led. The starting point is using an analysis method that I would deem most appropriate.
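To make the modeling setup concrete, here is a minimal sketch, not Praxis's actual model, of fitting a longitudinal treatment comparison in Python with statsmodels. It uses a random-intercept mixed model as a simplified stand-in for a full MMRM (which would typically specify an unstructured within-patient covariance), and every variable name and number below is hypothetical.

```python
# Sketch: repeated ADL-change scores per patient, adjusted for baseline
# severity and a family-history covariate, with within-person correlation
# captured by a per-subject random intercept. Synthetic data throughout.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n, weeks = 80, [2, 4, 8]
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n), len(weeks)),
    "week": np.tile(weeks, n),
    "treated": np.repeat(rng.integers(0, 2, n), len(weeks)),
    "baseline_adl": np.repeat(rng.normal(14, 2.4, n), len(weeks)),
    "fam_history": np.repeat(rng.integers(0, 2, n), len(weeks)),
})
u = np.repeat(rng.normal(0, 1.0, n), len(weeks))  # shared per-person effect
df["adl_change"] = (-0.5 * df["treated"] * df["week"] / 8
                    + 0.1 * df["fam_history"] + u
                    + rng.normal(0, 1.0, len(df)))

# Random-intercept fit; a production MMRM would instead treat week as
# categorical with an unstructured covariance across visits.
model = smf.mixedlm("adl_change ~ week * treated + baseline_adl + fam_history",
                    df, groups=df["subject"])
print(model.fit().summary())
```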

My general reaction is I start looking at the results. The results are highly statistically significant. If I back out p-values from the confidence intervals, the p-values are really tiny. Very strong statistical evidence. This should give robustness against any violations of assumptions in the analysis methods. All that is good. There are some concerns that we'll talk about, I think, a little bit later that would have been concerns in my mind as well and that need to be addressed by analysis of how robust these results are. Those are my sort of initial impressions.
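For anyone who wants to replicate the back-calculation Chuck mentions, here is a small sketch under a normal approximation. The estimate and interval are hypothetical placeholders, not numbers Praxis reported.

```python
# Back out a two-sided p-value from a point estimate and its reported
# 95% confidence interval, assuming the estimate is approximately normal.
from scipy.stats import norm

def p_from_ci(estimate, lo, hi, level=0.95):
    z_crit = norm.ppf(0.5 + level / 2)   # 1.96 for a 95% interval
    se = (hi - lo) / (2 * z_crit)        # implied standard error
    z = estimate / se                    # Wald statistic against zero
    return 2 * norm.sf(abs(z))           # two-sided p-value

# Hypothetical treatment contrast of -2.0 points, 95% CI (-2.8, -1.2):
print(p_from_ci(-2.0, -2.8, -1.2))
```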

Doug Fowle
Senior Analyst, HC Wainwright

Okay. Marcio, with that as a backdrop, maybe just provide an overview of the Essential Three program and sort of the key considerations in the design following the Essential One study results.

Marcio Souza
CEO, Praxis

Yeah. I think I'm going to try to build a little bit on what Chuck just mentioned, right? From the very beginning of this program, from the very first interaction with the FDA we had, there was no real other choice of model, no other choice of methods in the discussion. I think that if you go across the board and look into neurology in general, but even outside of neurology, longitudinal data is almost always analyzed using a mixed model. That may be important because I think there were some questions about when this came into play. Even on the phase II, the phase IIb, that was always the case, right? I believe, and correct me here, Doug, if that's not where your question's coming from, when you look into the actual hypothesis that was generated to design Essential Three, right?

We didn't come up with Essential Three, with the phase III program, out of thin air. There was a prior study. There was a previous study that generated the hypothesis here. There were a number of key elements there. The first one was the population, right? We always define a population first. It's like, why are we studying this? Very similarly to what we're doing in Essential Three, the idea was to use a population that was reasonably severe, right? When you look into the baseline for those patients, that's pretty high in terms of the severity. It's equally, actually, a little bit more severe, arguably, on Essential Three, but very similar. That was important. We were not treating people without significant impact on their daily living.

The second part that was incredibly important is the FDA, actually, they were ahead of us on that. In a sense, they insisted that we use the ADL modified in a way that they requested, right? That's why we call it the mADL. It's the modification on the scoring after the data is assessed. And that was a request by the agency. Actually, when you talk to physicians, they normally quote the ADL because that's what they assess, but the actual measure is slightly smaller numerically and therefore harder to reach statistical significance with, as we said before. With that knowledge in mind, the program was created. You've got to remember as well that when you go back to Essential One, the result on the mADL, while it was not the primary at that point in time, was positive.

Actually, the p-value was lower than the 5% threshold that had been defined for that as well. Based on that, the Essential Three program was created. Now, we knew at that point in time we needed more patients to actually get the certainty to be higher. That is how the study was created in general. It has been very consistent, if I may, throughout the program.

Doug Fowle
Senior Analyst, HC Wainwright

I guess just really quickly, the overall structure of Essential One, because there were two studies, right? Study One, the parallel design, as well as Study Two. On Study One, you did change the primary endpoint in the study midway through. Maybe just talk quickly through the process of what led you to do that. That was a decision you made after the interim analysis, but you did not make the change until September. What took so long?

Marcio Souza
CEO, Praxis

Sure, sure. I'm glad you asked those questions in sequence, right? If you look back to Essential One, that was an eight-week study. When we conducted the IDMC, conducted the interim analysis, and we decided to continue the study, we had to, I'm going to say, slow down. Slow down by asking questions like, "What do we know about the program? How do we know the drug to be effective? How fast?" All these questions. For how long do we have certainty on the estimations for the projection of the primary endpoints, right? Reasonable, I believe, questions to be asked. There was one thing we knew: eight weeks was where we had based the study. Day 56, number one. There was something that was quite important as well when you talk about the design of Essential Three.

We made the decision, I want to argue, a very high-bar decision, a very complex one, to randomize patients to Study One, to the parallel group, or to Study Two, to the stable responder randomized withdrawal. That is incredibly difficult to do, number one. A lot of people do not do it because you raise the bar in terms of enrolling two studies at the same time. One of the things that came from that decision was that the studies were identical for the first eight weeks. Identical in every possible way. The patient pool, because they are randomized based on the same stratification factors that Chuck just mentioned, as well as covariates for the baseline. The assessments were blinded to study personnel, to patients, all the way. That equality, if I may, ended at week eight.

After week eight, the responders on Study Two got randomized to stay on drug or placebo. We changed the assessment after that. In order for us to actually even further refine the estimation, we went back and said, "Where does that end?" Week eight. It took a while, of course. One would always go back. This is a high-stakes decision. We should not take it in a rush. We took our time to think about that, to simulate, and so on. It takes time to think. It takes time to simulate, and takes time to write the change and to submit to the agency. That is why it is done this way. In the grand scheme of things, I do not actually believe that is a lot of time.

Doug Fowle
Senior Analyst, HC Wainwright

Marcio, just to confirm, right, you did submit that ahead of locking the database for the study.

Marcio Souza
CEO, Praxis

Oh, absolutely. It would not have been a valid change if it had not been made without knowledge, right? There was no knowledge of the allocation on the study, at the group level or the individual level, any of that, before the change was memorialized, number one, and implemented, because we implemented the change. We amended the protocol. We amended the SAP. We submitted to the IND. We sent a letter to the agency, so on and so forth. Everything was done way before the database was actually locked.

Doug Fowle
Senior Analyst, HC Wainwright

I think, Marcio, it's important to remember that changing the endpoint mid-study is not without significant precedent, right? I mean, we even saw it recently with donanemab's application where, in fact, you actually had the agency disagree, but ultimately it was not deemed consequential to the assessment of efficacy. I want to come back to Chuck in a second, but Marcio, I know you've provided a lot of detail. You said that it was actually highly significant. What was the actual p-value at day 56 on the primary endpoint?

Marcio Souza
CEO, Praxis

Yeah. I think that the reason why I didn't show this before, I'll tell you right now, but it gets to a point where it's like, "Why are we even going there?" Right? I described this before as silly, right? There are certain things that we do that get silly. But the actual p-value was on the order of 10 to the minus six. If you think about it, after the decimal point there were five zeros and then the first nonzero digit. I think it's hard to believe that anyone would consider that even close on the primary test. Again, you should ask Chuck what he thinks.

Doug Fowle
Senior Analyst, HC Wainwright

Marcio, you have said that you were positive on the original endpoint, right? As much as you had made the change, ultimately, in some ways, it was much ado about nothing. I mean, care to provide some color on how successful it was on that endpoint as well?

Marcio Souza
CEO, Praxis

Yeah. So it's worth saying it was not only at the original endpoint, right? There was the day 84, assessed as the average contrast between day 77 and day 84. And that p-value was on the order of 10 to the minus three. But it was at every time point, each one of the time points assessed, including day 14, right? So day 14, day 28, day 49, 56, 63, 77, and 84. All of them were significant. I think it matters, right? We're basically saying at no point in time was there a weird fluctuation here where you lost significance. Not that it would be a problem. We've seen in several trials before that there is some fluctuation that did not impair their ability to get approved. I think in this case, it's quite important to mention that as well.

Doug Fowle
Senior Analyst, HC Wainwright

Chuck, when you hear those degrees of sort of statistical significance, I'm curious, how does that influence your sort of initial assessment?

Chuck McCulloch
Professor of Biostatistics, UCSF

Yeah. Let me back up just a second. Whenever there are sort of last-minute changes in an analysis plan, especially the primary outcome, it sends up signals to me that I better pay attention. I better scrutinize this a little more carefully. The same considerations that you brought up come to my mind. Was this decision made before database lock? Yes. That's important because things haven't been unblinded yet. You're not making these decisions based on what you know the results are going to be. That's number one. I'm less swayed by the necessity of having equal time periods in the two arms of the overarching studies. You've got the data for both. You can analyze them using only the consistent data if you want to.

It's a question of what you declare to be the primary endpoint. Okay. It's been changed. Okay. Stepping back, in the bigger scheme of things, this is a pretty minor change. It's not like you changed from one metric to another metric. All you did was you shifted the time at which the primary assessment was going to be declared. I've seen people in clinicaltrials.gov just say, "The primary outcome is MADL." They don't even specify the time in clinicaltrials.gov. It's a relatively minor change. Okay. Still, there's a little suspicion here. There was a last-minute change in the primary outcome. Now I turn to the sort of issues that Marcio was talking about. In kind of worst-case scenario, let's imagine a conceptual trial where you declared those endpoints, 84 and 56, to be coprimary. What would you have done?

A relatively draconian adjustment would be to do a Bonferroni adjustment. That means instead of testing at 0.05, you test at 0.025. That lets you look at both endpoints and take either one that's statistically significant and declare success. These p-values are tiny, as Marcio was saying. Again, I back-calculated them from the confidence intervals that are public information. They are very, very small. That still leaves me pretty convinced because even if I say, "Okay, I'm going to adjust for coprimary outcomes," I've still got statistically significant results.
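The arithmetic of that worst-case adjustment is simple enough to show in a few lines; the p-values below are hypothetical placeholders at the magnitudes quoted earlier in the call (10^-6 and 10^-3), not the actual study values.

```python
# Bonferroni for two coprimary endpoints: split alpha evenly, declare
# success if either endpoint clears the adjusted threshold.
alpha, m = 0.05, 2
threshold = alpha / m                      # 0.025 per endpoint
for name, p in [("day 56", 1e-6), ("day 84", 1e-3)]:
    print(f"{name}: p={p:.0e}, significant at {threshold}: {p < threshold}")
```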

Doug Fowle
Senior Analyst, HC Wainwright

Chuck, that's really helpful. I think one of the things that you talked about, right, when you talked about the value of an MMRM, is missing data. That has become a focal point for investors as they think about the various sensitivity analyses the company's presented. Can you just give us a brief tutorial on missing data and how an MMRM model handles it and help us understand what missing at random and missing not at random actually mean?

Chuck McCulloch
Professor of Biostatistics, UCSF

Sure. Again, I think we're transitioning to a different topic because I do see these as sort of separate issues, the choice of the primary outcome versus how you deal with missing data. Let me talk a little bit about mixed models and missing data. As I said earlier, mixed models are clearly preferred in situations where there's missing data. It's almost always the case in any clinical study. There is some. That's often the reason that people are guided to use these. Why is it that people like them? It's because analysis of data using a mixed model approach without any formal consideration of missing data, so just sort of pretending that, "Okay, the data is unbalanced, but there's no real bias to why we had missing data for some people," under certain circumstances still gives valid results.

That's the key reason people like that. Okay. What are these certain assumptions? These assumptions are technically known as missing at random. Now, that's a terrible term. I wish that whoever made that term popular would be strung up, because if you try and parse it as plain English, it leads you to the wrong conclusion. I prefer to think of it as missingness that's predictable from the observed data. In this case, that would include anything in the model, like family history of tremor, as well as any previously recorded value of ADL on that person. That then extends the protection fairly widely.

Missing not at random, that's the more problematic case because the mixed models don't necessarily protect you there, means the missingness is dependent upon other things, like the value of ADL if we got to see it, which, of course, we didn't. When the data are missing not at random, then a mixed model analysis can give you results that deviate systematically from an analysis that you would get if you suddenly magically had access to all the missing data. That is the legitimate concern. Again, it's not just a concern in this particular study. That's a concern with any study that has missing data, which, again, is virtually any clinical longitudinal study.
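A toy simulation can make the distinction tangible. Under assumptions invented here (two correlated visits, dropout at the second), an analysis that conditions on the observed data, which is roughly what a likelihood-based mixed model does, recovers the truth when the missingness depends only on the earlier observed score (MAR), but not when it depends on the unseen score itself (MNAR).

```python
# MAR vs. MNAR toy example: week-8 scores are hidden either based on the
# observed week-4 score (MAR) or on the week-8 score itself (MNAR).
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
w4 = rng.normal(0, 1, n)
w8 = 0.8 * w4 + rng.normal(0, 0.6, n)   # true week-8 mean is 0

def model_based_mean(observed):
    # Fit w8 ~ w4 on completers, predict for everyone; this mimics how
    # a likelihood-based model borrows strength from observed data.
    slope, intercept = np.polyfit(w4[observed], w8[observed], 1)
    return (intercept + slope * w4).mean()

mar = w4 < 0.5    # observed iff the *seen* week-4 value is low: MAR
mnar = w8 < 0.5   # observed iff the *unseen* week-8 value is low: MNAR
print("truth:", round(w8.mean(), 3))
print("MAR estimate: ", round(model_based_mean(mar), 3))   # close to truth
print("MNAR estimate:", round(model_based_mean(mnar), 3))  # biased
```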

Doug Fowle
Senior Analyst, HC Wainwright

I want to clarify, because I think this has been sort of a misconception amongst some: a patient dropping out because they're doing poorly on an observed basis is not a violation of MAR, right, missing at random? And an imbalance in discontinuation between the active and placebo arms is also not automatically a violation of MAR?

Chuck McCulloch
Professor of Biostatistics, UCSF

Right. When people are dropping out because of what you've seen, say, a patient's not getting better, and they decide to drop out of the study, that's predictable from their pattern of ADL measurements up until that point. That could well be missing at random. You're right. The mere fact that the discontinuation rates or dropout rates are different between the arms is not necessarily an indication that the missing at random assumption is violated. I'm a senior statistician for a recently completed randomized trial treating depression. One of the arms was an antidepressant drug. Our endpoint was declared to be depression after the end of treatment. We had, of course, intermediate measures of depression. People taking these pills got immediately better and discontinued. That was completely predictable from these early measurements of depression.

That led to a big difference in discontinuation rates, even though the MAR missing at random assumptions are still quite plausible.

Doug Fowle
Senior Analyst, HC Wainwright

Is it fair to say that the model Praxis used assumes MAR, but we do or should consider the possibility that the mechanism of missingness is missing not at random, and we need to stress test the data for that possibility as well?

Chuck McCulloch
Professor of Biostatistics, UCSF

Yeah. I'm not so sure I'd say it assumes that. Again, I'd go back to under missing at random assumptions, it still gives valid results. It's not guaranteed to give valid results when it's missing not at random. It's an unfortunate fact, maybe not too surprising, but you can't tell if a data process is missing at random or missing not at random by looking at the data because it depends on things you didn't get to see and assumptions about things you didn't get to see. It's almost always a good idea to stress test these missing data assumptions by doing sensitivity analyses.

Doug Fowle
Senior Analyst, HC Wainwright

What would be the way that you assess the plausibility of MAR? What are some of the questions that we should be thinking about here?

Chuck McCulloch
Professor of Biostatistics, UCSF

Yeah. I go through this process and try and think about it in a sort of twofold way. First, what is the situation? Unfortunately, this isn't knowable from the data. It depends strongly on the clinical context and what you know about what's likely to cause missing data. First, I try and think through, is the scenario likely to generate data that's missing at random, or is it likely to generate missing not at random? I mean, just to give an example, in a recent study that I completed, we had a scale we were using. It was not validated nor proven to be useful in the population we were studying until the study had already started.

The first few people, the first 100 people out of many hundreds we recruited, we were not able to use this scale because it had not yet been validated. After the first 100, we decided it was the better scale. We made the hard decision to replace the one we were using. We now introduced missing data for the first 100 people recruited into this trial. You know from the context that there is a strong argument that that is missing at random because the fact that it was missing was just related to the fact we recruited them earlier. I mean, you can imagine things where it is missing not at random, but they are pretty implausible. There, the situation gives you sort of confidence that it is missing at random.

On the other hand, and probably more relevant to today's discussion, when dropout is related to adverse events and causes discontinuation in the active comparator arm, we have more suspicion that things are likely to be missing not at random. That is where it is even more important to stress test by doing sensitivity analyses.

Doug Fowle
Senior Analyst, HC Wainwright

Marcio, how did you go about addressing these issues?

Marcio Souza
CEO, Praxis

Yeah, absolutely. The first thing is what we hypothesized before, right? The first question should always be, what is the primary model? We talked about that already. Then, how do we stress test the model, right? If you consider that things might not be missing at random, how do you stress test? The pre-specified sensitivity analysis, which, by the way, was the same one on Essential One, the previous study, was the one in every statistical analysis plan here. It's actually a commonly used one for MMRMs, the tipping point. The principle there is that you keep removing benefit, or adding a penalty, whichever way you want to think about it, until there is a point at which the result would tip, right, would become non-significant.

The second judgment you have to make there is, what is clinically plausible? Because you can't interpret just a mathematical change. You have to interpret whether it is clinically plausible as well. When you put the two together, what was proposed and memorialized in the statistical analysis plan was a tipping point analysis. It started at half a point of penalty and went to two and a half points. That was the maximum in the actual analysis plan. Subsequent to that, as you know, it did not tip, did not even come close to actually becoming non-significant at that point. The tipping point is actually much larger than that, if you care to know. It is much, much larger than two and a half points, which is already pretty absurd because the patients don't get worse and so on in this case.

The other question that we asked ourselves, and you've seen this in our disclosures, is, for these patients we didn't have information on, and I think Chuck just talked about it, we can only hypothesize things. You can ask, what could be a reasonable replacement for their values, right? That's all we are doing here: replacing the things we don't know. We said, okay, placebo would be a reasonable replacement for that. A reference, right, would be reasonable. We've done a different method that was very clearly not pre-specified. It's very commonly done, but it was not pre-specified. The only pre-specified one was the tipping point, which we can already check the box on. We've done that.

We tested, although the methods are very similar when you complete the data, using the MMRM, feeding the MMRM again now that the data is no longer missing, right? You just fit the model there, and an ANCOVA as well, in very similar terms. As you've seen, once again, one could actually expect highly significant results on those sensitivities. You're stress testing the model. You made the example of donanemab. I'm going to bring that back because I actually think it's quite important. Chuck was mentioning recent studies as well, right? That is relatively recent.

As you know, there was not only a change of endpoints, but the agency actually commented quite eloquently that it doesn't really matter because the other endpoint was actually positive as well, kind of similar to the changing time points for us. Quite interestingly, they actually requested a tipping point analysis. The agency had asked Eli Lilly, which is the sponsor there, to conduct one, and it actually tipped at the first level. The first level was very low, by the way. That did not at all preclude the approval of the drug. It was just stress testing, I'm going to call it stress testing, of how reliable the endpoint was. It happens to be that on that study, the discontinuation rate was pretty high as well. It is not unlike the situation we're dealing with.

It is not only that there is plenty of regulatory precedent, number one, very recent as well, same personnel in the division for that matter. We also tested with other methods, and those methods are also resulting in very robust results at the end.

Doug Fowle
Senior Analyst, HC Wainwright

Chuck, we did get a question from somebody watching who did want to clarify, and you alluded to it: if you do have discontinuations because of adverse events, how does that impact our assumption of MAR?

Chuck McCulloch
Professor of Biostatistics, UCSF

Yeah. When you have discontinuation due to adverse events that you think are treatment-related, so people are discontinuing the drug, it's unrealistic to expect that they're going to follow the same trajectory. Let's say we have a couple of values of ADL for them already, and then they have an adverse event, and they discontinue. It's probably unreasonable to assume that they're going to follow the trajectory that they were on while they were on drug when they suddenly discontinued. This is the place for what Marcio was talking about. There are a couple of widely accepted methods for doing these MNAR, missing not at random, analyses, one of which is tipping point, and the other is return to reference, where you say, I think they're going to switch to some other profile.

A reasonable profile might say, okay, I think they're going to switch to look like a placebo patient that otherwise had sort of similar characteristics to this person who started in the active comparator arm. That is a situation where I do think stress testing is more indicated, and a missing not at random mechanism is more likely. Again, we can never know for certain, but the clinical context here would suggest it.

Doug Fowle
Senior Analyst, HC Wainwright

Chuck, based on what you've heard so far in your assessment of the data, do you think MAR is still plausible?

Chuck McCulloch
Professor of Biostatistics, UCSF

It depends on what type of patients we're talking about. If we're talking about somebody who's got quite a few measurements of ADL leading up to the primary endpoint in 56 days and is still on drug, probably quite plausible. If we're talking about somebody who dropped out very early, we have no follow-up measurements, or somebody who discontinued drug, I think less likely.

Doug Fowle
Senior Analyst, HC Wainwright

Marcio, I think you sort of referenced it, and you did talk about the tipping point analysis that you conducted. Maybe, Chuck, it might be helpful for you to just walk through how a tipping point analysis works. Marcio said, I think, that they got to half a standard deviation, two and a half points. What is your initial impression of that, right? I think the p-value at half a standard deviation was 0.026. Correct me if I'm wrong, Marcio.

Chuck McCulloch
Professor of Biostatistics, UCSF

No, you're right. Yeah. Okay, let me back up and talk a little bit about how a tipping point analysis works. Again, this is in the context of data that's missing not at random. We have to make certain assumptions about how much worse, in this case, an active comparator patient would do when the data are missing. We're basically saying, okay, usually you start from, okay, here's what we'd expect their trajectory to be ordinarily. Now that they have generated missing data, we're going to take that expected trajectory and make it worse by some amount. Typically, these are called a delta adjustment. We make a little drop. We say, okay, maybe we expected that patient to have improved by three points on the ADL scale. Now we're going to decrement their improvement.

They didn't improve by three points. If we have a delta of one, we're going to say, okay, we're going to just make them say they only had an improvement of two. That would be like a delta of one. We apply this value of delta, and we check to see whether or not it overturns the results, typically asking, are the results still statistically significant after I've applied this little delta? Typically, you march and you increase the delta. I would nitpick with Marcio's approach. I mean, the tipping point analysis should keep going until you tip it over and no longer get statistically significant results. They stopped at about a half of a standard deviation. I'll come back and talk about that in just a second. That's how the tipping point analysis works.

In their case, they went up to a half a standard deviation, still had statistically significant results. Basically, they had not yet reached the tipping point for this analysis. There have been some questions about, is a half a standard deviation too much, too little? A half a standard deviation is a moderate-sized effect. A two and a half point change on the scale is right around what a clinically important difference is. Even though, again, I would nitpick and say they should have gone farther until they actually tipped the analysis to be not statistically significant, they did go up to a pretty big effect size and did not see a tipping of the analysis towards not statistically significant.
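Mechanically, the march Chuck describes fits in a short loop. This is only a sketch: a two-sample t-test and single-value imputation stand in for the real multiple-imputation MMRM machinery, and every number is hypothetical.

```python
# Delta-adjustment tipping point: penalize imputed improvements for
# active-arm dropouts by an increasing delta until significance is lost.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2)
placebo = rng.normal(1.7, 2.4, 80)       # observed improvements, placebo
active_obs = rng.normal(3.5, 2.4, 64)    # completers on drug
n_dropout, expected = 16, 3.5            # active-arm dropouts to impute

for delta in np.arange(0.0, 4.01, 0.5):
    imputed = np.full(n_dropout, expected - delta)  # decremented imputation
    active = np.concatenate([active_obs, imputed])
    p = ttest_ind(active, placebo).pvalue
    print(f"delta={delta:3.1f}  p={p:.2e}  significant={p < 0.05}")
```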

Doug Fowle
Senior Analyst, HC Wainwright

Marcio, a couple of questions. First, I want to—somebody pointed out I misspoke. I think I dropped a zero.

Marcio Souza
CEO, Praxis

You did drop a zero. I was going to correct you.

Doug Fowle
Senior Analyst, HC Wainwright

Yeah. At two and a half, you were 0.0026. Sorry.

Marcio Souza
CEO, Praxis

There's an extra zero there, but that's the story of this program, having an extra zero, so.

Doug Fowle
Senior Analyst, HC Wainwright

There are too many zeros. How did you settle on two and a half points as half a standard deviation? What was the rationale or derivation of that? I mean, you haven't necessarily talked about or fully disclosed all the standard deviations on the primary endpoint. I think we had the baseline at 2.4, but what was that? And at what point, to Chuck's question, and I think that's a good one, at what point did you lose significance?

Marcio Souza
CEO, Praxis

Yeah. Number one, I agree, right, if you were to reasonably hypothesize that you're going to keep going. But remember, you control studies. I know we are going through this MAR and MNAR and scrutinizing the other endpoints, but you control the study at the 5% alpha level. In a sense, you live or die on that, right? That's how you declare the study successful or not. I think everything we are saying right now is, in a sense, gravy, the way I look at it. Should we have, in our craziest dreams, imagined we'd go beyond two and a half points of penalty on the sensitivity? No. Did the FDA comment when we submitted two and a half as the maximum? No. They did not criticize that either. Now, in retrospect, would I have pre-specified to keep going? Yeah, absolutely. There's nothing wrong with that, right?

We're talking about a 0.0026 p-value at two and a half. Of course, the number is much bigger. I would kind of leave it, in a sense, at that. It's like we normally would say you have to cross the 5%. We're at 0.26%. You've got to imagine that the number is significantly bigger there. Do we need to be bigger is the question. The answer is no. Are we bigger? The answer is yes. Are we bigger by a lot? Yes. That should kind of close a chapter, right? This is a sensitivity. It's not even the primary. I think if the primary was 0.0026, we would be happy. We're talking about a sensitivity with a penalty here. I talked about silliness. This is one of those things that got a little silly.

Doug Fowle
Senior Analyst, HC Wainwright

Chuck, maybe you could comment just from your perspective, when you look at a study like this, how standard is a half standard deviation penalty for a tipping analysis, right? How robust is that? Or would you have said, gee, maybe it should have been 0.75 standard deviations or a full standard deviation? I mean, I think to Marcio's point, they came up with a number, and they could clearly go past it. To people who are saying, oh, gee, why didn't they go further? From standard practice or from your perspective, how robust is half a standard deviation?

Chuck McCulloch
Professor of Biostatistics, UCSF

Yeah. So again, to be nitpicky, I would have preferred to just see a formal tipping analysis where you keep going until you tip. Then you have the thing in hand. You say, look, I did not tip until one whole standard deviation or something. I mean, that is how these sensitivity analyses work. That said, a half a standard deviation is pretty well accepted as a moderate effect size. You are saying that you are going to decrement the active comparator arm by a moderate-sized effect. I am also convinced by the very small p-value there because I know, just as Marcio said, that means you can go further and not flip it over. I feel like I am more nitpicking than voicing strong concerns. Yeah, I looked at it. I said, why didn't they keep going?

Okay, the p-value is 0.0026, and a half a standard deviation is a moderate effect size. I am not overly concerned with this.

Marcio Souza
CEO, Praxis

One could jump ahead there, right? And kind of, again, as we are in active discussions with the FDA, I'm going to be a little bit careful as well. Does it tip at three? No. Does it tip at three and a half? No. Does it tip at four? No. At one point, it gets to the point that, as I said, it's silly, right? Because now we're penalizing the entire study by an amount that is not reasonable in a highly heterogeneous patient population, in one where you know placebo does not do much. You've got to put the clinical context on this specific analysis as well.

Doug Fowle
Senior Analyst, HC Wainwright

Marcio, to your point, if it's holding at four points, you're taking patients to well below even the placebo response at that level, right?

Marcio Souza
CEO, Praxis

Now you're damaging them. Is that reasonable to actually say that? I think that that's where it becomes unreasonable.

Doug Fowle
Senior Analyst, HC Wainwright

Marcio, the analyses have all been on the MITT population. To be in that, you needed to have a post-baseline assessment. I know you had FDA agreement and alignment that this was the primary analysis population. We have gotten questions in terms of the ITT population as well. Can you provide some perspective on sort of how that impacted your analysis?

Marcio Souza
CEO, Praxis

Yeah, no, absolutely. I would maybe separate two things here, right? One is what the ITT would do to the actual results. That is nothing, because there is no post-baseline data; the MMRM would just drop those patients. That is probably not where your question is coming from. Your question is, what happened to these patients that are not available? I think that is where using the jump to reference here is important, right? You're saying, okay, for those patients as well, not only the other ones, we're going to replace their values. We're going to treat them as if they're not responders. We know that's not the case, right? We know that a very large proportion of patients respond. We actually penalize them quite aggressively. We show that data for the jump to reference.

I'm going to throw another one in here, since this is a rather technical call: we did do another sensitivity analysis that is not in our deck, where we completely replaced their numbers with a zero change. Basically, a baseline carried forward, right? We're saying there is no change. These patients are a straight line, whereas placebo arguably has small changes in the direction of effect. That is also positive. If there is any concern about what sensitivity does to the primary, there should not be. Again, when you're talking about a 10 to the minus six primary, when you're talking about every time point being positive, it is just very, very hard to actually overturn the result by going through these different imputations, if you will.

Doug Fowle
Senior Analyst, HC Wainwright

Chuck, and how do you think about the reasonableness, if you will, of considering patients that dropped out before day 14 as missing at random since we do not have sort of an observed outcome for these patients?

Chuck McCulloch
Professor of Biostatistics, UCSF

Yeah. As I said before, unfortunately, we can never know for certain whether things are missing at random or missing not at random just from the observed data. We sometimes get clues. Again, as I was talking about earlier, if we've got a sequence of values and then we know that a person had an adverse event and discontinued drug, we've got sort of clinical expectation that they're going to change somehow. These are patients for whom we have virtually no information, not even clues as to what we should be assuming. This is a situation where if you wanted to go to an ITT, again, the primary is modified intention to treat. I don't get too concerned when there's an accepted modified intention to treat as the pre-planned analysis. It's always good to think about intention to treat.

This is a situation where, yes, you would definitely want to be doing some stress testing and by using one of the methods of assessing sensitivity to missing not at random data just to see what happens.

Doug Fowle
Senior Analyst, HC Wainwright

Marcio referenced jump to reference and that being sort of one of the analyses that they did. Chuck, maybe provide sort of a quick refresher or sort of tutorial, if you will, on how that works and how that adds to the robustness of our picture.

Chuck McCulloch
Professor of Biostatistics, UCSF

Yeah. It is another well-accepted method of trying out a missing not at random mechanism. Again, it is the one that in many situations corresponds to what is clinically appropriate. It is just saying that if I have got a drug and I think that as soon as somebody discontinues it, they will look like a placebo patient, and I am going to choose the placebo pattern of data as my reference group, I am assuming that person immediately goes to look like a control patient. I am assuming that is the reference group that was used in this situation. That is a pretty conservative approach, saying I am assuming that instantaneously from the beginning, this person looked exactly like a placebo patient. You are immediately diluting the active comparator effect about as much as possible by assuming it looks exactly like placebo.
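In code, the idea reduces to where the imputed values come from. A faithful implementation imputes from the fitted reference-arm distribution conditional on each patient's observed history; the sketch below, with invented numbers, simply draws dropouts' outcomes from the placebo arm's empirical values to show the direction of the conservatism.

```python
# Jump to reference, crudely: active-arm dropouts inherit placebo-like
# outcomes, diluting the treatment effect toward the null.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)
placebo = rng.normal(1.7, 2.4, 80)       # hypothetical improvements
active_obs = rng.normal(3.5, 2.4, 64)
n_dropout = 16

j2r = rng.choice(placebo, size=n_dropout, replace=True)  # placebo-like
active = np.concatenate([active_obs, j2r])
print("jump-to-reference p-value:", ttest_ind(active, placebo).pvalue)
```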

Doug Fowle
Senior Analyst, HC Wainwright

Marcio, I know we have a question that came in that I think might be helpful. How do you think the secondary endpoints help inform us on the overall picture? I would be curious to hear Chuck's perspective on that as well.

Marcio Souza
CEO, Praxis

Yeah. Maybe first on the statistics, how they were treated, right? The primary had to be positive, and then the secondaries were sequentially tested. I think it's incredibly important. I'm glad you asked that, right? There are a number of things we assess here as the secondaries, and they're all positive. They're all very small p-values. One thing we haven't said publicly before: they're also positive at every single time point. Consider the way we structured the secondary endpoints, right? The first one, the primary, is a clinical outcome assessment, the mADL, where the clinician assesses the patients. Then we ask, what happens on the overall effect? What happens on the entire trajectory of these patients, not only at one time point, but on the entire trajectory? That is on the order of 10 to the minus seven, the p-value, right?

Overall, in this study, they're doing really well. We ask the question of how patients see their health improving from the beginning of the study, using the PGI, and how the clinicians see the severity and the change. If you look across studies in general, all those things do not tend to be positive throughout on studies that are not really giving a benefit, because they are not assessing the same thing, right? We consider, yes, we have been talking about sensitivity on the primary, but then go and ask the question, what happens to the overall health of these patients? It is incredibly strong as well, each one of those. I personally think it is always very good to see secondary endpoints supporting the overall effects of the drug and not being conflicting, right?

Because sometimes we see conflicting secondary endpoints in other studies. That was not the case here at all. They're all showing benefits.
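The sequential testing Marcio mentions at the top of that answer is a fixed-sequence (gatekeeping) procedure, sketched below with hypothetical endpoint names and p-values: each endpoint is tested at the full alpha in a pre-specified order, and the gate closes at the first failure.

```python
# Fixed-sequence gatekeeping: test in order at full alpha; once an
# endpoint misses, everything after it is formally untested.
def fixed_sequence(ordered_endpoints, alpha=0.05):
    wins = []
    for name, p in ordered_endpoints:
        if p >= alpha:
            break            # gate closes; later endpoints get no claim
        wins.append(name)
    return wins

print(fixed_sequence([("primary mADL day 56", 1e-6),
                      ("overall trajectory", 1e-7),
                      ("PGI", 0.003),
                      ("CGI", 0.004)]))
```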

Doug Fowle
Senior Analyst, HC Wainwright

Chuck, I guess the question to you, how do you sort of look at secondary endpoints sort of influencing your overall assessment of the data?

Chuck McCulloch
Professor of Biostatistics, UCSF

Yeah. I often work outside the regulatory environment. It's especially important there. But even in the regulatory environment, I don't really have much to add over Marcio. I mean, I do exactly the same thing. I look at the secondary outcomes, especially ones that I expect to be highly correlated with the primary outcome. I start seeing it as a red flag when they disagree, especially if they go in the wrong direction, which I've sometimes seen happen. Again, when everything lines up, effect sizes are in the right direction, especially when everything is statistically significant, I think that is pretty strong support for the primary analysis.

Doug Fowle
Senior Analyst, HC Wainwright

Marcio, I guess you did reference the jump to reference as being successful, and obviously at a robust level. I mean, how much further past placebo were you able to take patients before the model breaks?

Marcio Souza
CEO, Praxis

Yeah. You can go back. As I already said, you can replace their changes with zero for the ones that discontinued before day 14, right? Meaning you replace completely. You remove the effect by, in that case, 1.7 points on average, because that's what placebo was. It's still highly significant. You can make patients worse than they were at baseline and still be significant from the missing data perspective. I think someone would have to be institutionalized to think this drug makes patients worse. Therefore, that should be a chapter that is closed, right, in terms of how robust the primary is on this analysis.

Doug Fowle
Senior Analyst, HC Wainwright

Okay. Marcio, I think I just want to, before we talk about sort of some of the integrated data, I thought it might be helpful to just sort of do a quick summary of kind of what we've covered so far today. I have actually been working during the call to sort of pull together a little bit of a matrix. If you give me a second.

Marcio Souza
CEO, Praxis

Okay. I don't like surprises, but go for it.

Doug Fowle
Senior Analyst, HC Wainwright

Just as we go through, I thought we would just hit them: the primary endpoint, as you talked about, Marcio, was significant. When we did the all-time-points MMRM analysis, we're still significant. PGI, CGI, secondary endpoints, as observed, as you just noted, significant. You go into a sensitivity analysis using missing not at random with an imputation of greater than 2.5, we're still okay. Jump to reference placebo, we're still okay. I think you referenced maybe an ANCOVA, jump to placebo, we're still okay. As well as going back earlier to the average of day 77/84, right, the initial MMRM analysis, we are still there. Just sort of sticking to that.

Chuck, and we do not need to hold it up for another second, but when you come back to the main thing, when you look at that matrix, what is your response or perspective when you think about a program that has that body of work?

Chuck McCulloch
Professor of Biostatistics, UCSF

Yeah. I kind of distinguish the two issues. One is choice of the primary endpoint, which I find to be relatively minor. The strength of evidence is there, even if I go co-primary endpoints. The key to dealing with missing data is doing lots of reasonable sensitivity analyses. I am pleased that there are multiple ways in which the sensitivity analyses were approached for missing data.

Doug Fowle
Senior Analyst, HC Wainwright

I want to turn, as we're getting close to the hour, to the integrated analysis, and we've gotten some questions from the audience, as you can imagine.

Marcio Souza
CEO, Praxis

I can.

Doug Fowle
Senior Analyst, HC Wainwright

Marcio, we have some alternative hypotheses, right, in the additional integrated effectiveness analyses, I think hypothesis three as well as four. Maybe just quickly provide some perspective on how you think that will inform the agency's view. Chuck, maybe you could provide some perspective on how you look at these types of analyses.

Marcio Souza
CEO, Praxis

Yeah. I never do this, but I'm going to start with the problem with comparing things like this, right? Normally what we hear is there was no control on the second study, they were not run concurrently, they didn't use the same covariates, blah, blah, blah. That's not the case here, right? These two studies were literally stratified on the same parameters. From the beginning, we wanted to ask a question: how consistent is the arm on study two with the one on study one, since patients were unaware, right? They couldn't know which study they were in. Of course, we're not looking for exactly the same, because by definition, exactly the same is not something we'd see. It's incredibly consistent, incredibly consistent. You saw the integrated analysis.

What I can tell you is what is super interesting in this program: placebo on Essential One and placebo on Essential Three are very, very similar, number one. Drug on the first arm of study one for Essential Three and on the run-in period for study two are incredibly consistent. We asked a slightly different question. It was actually another brilliant statistician who suggested it, I wish I could take credit for it: why don't you formally compare placebo on study one to drug on study two, since patients were unaware, right? That is one of the hypotheses, hypothesis four. All of that is significant. Of course, when everything is significant, when you put them together, they get even more significant. We are talking about a small p-value. That p-value is insanely small, right?

The consistency of the effects, I think it's important, number one, as we're making decisions on putting patients on drug clinically. Secondly, you see again and again in FDA reviews how they talk about consistency of effect being important. Love to hear Chuck's perspective on that.

Chuck McCulloch
Professor of Biostatistics, UCSF

Yeah, I think you're right. Anytime you look at different ways to address the same question and you see results that are not similar, that sends up alarm bells. Again, of course, you're right that if you're finding statistically significant results with each individual comparison, when you combine them and the results are consistent in estimated size, it's just going to get more statistically significant. I'm much more suspicious when it looks like people are rescuing a bunch of non-statistically significant results by combining. That's when red flags start going up in my mind.

Doug Fowle
Senior Analyst, HC Wainwright

What you would mean by that is, by expanding or adding the populations, you sort of start to overpower your analysis for an arguably non-clinically meaningful effect?

Chuck McCulloch
Professor of Biostatistics, UCSF

No, no. I mean, oh, I did this one study, and I got a p-value of 0.06, darn it, did not quite meet the threshold. And I got a similar size effect in this other part of our overarching program. And it had a p-value of 0.08. So because they are consistent, I am going to put them together, and then magically, I get a p-value of 0.03.
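For concreteness, the pattern Chuck is wary of is easy to reproduce with a standard combination method. Using his hypothetical 0.06 and 0.08 and Stouffer's method (one common choice; he did not name a specific one):

```python
# Combine two-sided p-values via Stouffer's z-score method, assuming
# the effects point in the same direction.
from math import sqrt
from scipy.stats import norm

def stouffer(pvals):
    z = sum(norm.isf(p / 2) for p in pvals) / sqrt(len(pvals))
    return 2 * norm.sf(z)

print(stouffer([0.06, 0.08]))   # ~0.01: "magically" significant
```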

Marcio Souza
CEO, Praxis

Which is not at all what happened here, right? Here you have positive studies. Of course, you put them together, just trying to be intellectually honest. When you get a p-value on the order of 10 to the minus 12, of course, if everything is positive and you continue to put them together, they're going to get smaller and smaller. That is just logic in general. We do get very, very consistent results, which tells me the true benefit is being assessed in the individual studies and in the combined study. That's important. That's the number one reason why you have to submit an integrated summary of efficacy to the FDA in an NDA, to assess whether or not they are similar. Here we are, in a sense, in a controlled way, getting that effect.

Doug Fowle
Senior Analyst, HC Wainwright

Chuck, a question from the audience, coming back to the question of dropouts, right, especially due to AEs, and when you have an imbalance, which we saw in this study: does it ever reach a point where you lose trust in the sensitivity analysis or even the tipping point analysis?

Chuck McCulloch
Professor of Biostatistics, UCSF

I think it's not that I lose faith at a certain missing data rate. The way that I think about it is it puts more emphasis on how appropriate you think the modeling of the missing not at random is. Again, that's why you have to go to pretty high extremes in order to really stress test the system because the results then are very much model-dependent, dependent on the model you're hypothesizing for the missing not at random data. That's why I think it's important to do things like assume that the drug has no benefit or maybe even push it as far as a slight detriment just to make sure that things still hold.

Doug Fowle
Senior Analyst, HC Wainwright

Your point would be that's where things like the jump to reference analysis becomes more important.

Chuck McCulloch
Professor of Biostatistics, UCSF

Right. Pushing the tipping point all the way out, if it's at some ridiculous level that it requires to tip it over, you feel much more confident that even if some of the assumptions in your modeling weren't quite correct, you're still getting very strong and convincing results.

Doug Fowle
Senior Analyst, HC Wainwright

Chuck, one thing that I meant to ask, so I'm going to ask now, and I think some people looking at this have struggled with, is the fact that the company had a futility analysis back in March. They've had difficulty wrapping their heads around the fact that we would have a futility analysis in March, yet when it came time to finally read out the full study, not only do we have a positive study, but we have one that is overwhelmingly statistically significant. Maybe just walk through from your perspective, does that raise red flags for you in any way, or just what are the real possibilities or probabilities that that would actually occur?

Chuck McCulloch
Professor of Biostatistics, UCSF

Yeah. Yes, the things that would cause me to sort of scrutinize a study like this more carefully were, as we've already talked about, change in outcome, relatively high rates of missing data, and sort of more of a curiosity, a futility analysis that suggested that there might be a reason to stop the study. I don't know the details of how the futility analysis was conducted, but typically, you project ahead and you ask, what's the probability of a positive, i.e., a statistically significant result once I've concluded with all the data collection? There are certain assumptions that have to go into that model.

If the data that is collected after the interim, say the interim analysis is done at the halfway point and then you have another full half of the data to collect, is more optimistic than the data that you used to project ahead, of course, that projected probability of futility, of not finding a statistically significant result, can be off. The proof is in the pudding in my mind. We do not need to go back and say, what was the probability back then, given that we have convincing results now? I have seen things like this happen in the past; things that only happen 5% or 10% of the time happen 5% or 10% of the time. It can happen.
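The projection Chuck describes is essentially a conditional power calculation. Here is a minimal sketch under the "current trend" assumption, with a hypothetical interim z-statistic and information fraction:

```python
# Conditional power: probability the final z-statistic clears the
# critical value, given the interim value, assuming the currently
# estimated effect persists for the remaining information.
from math import sqrt
from scipy.stats import norm

def conditional_power(z_interim, info_frac, alpha=0.05):
    z_crit = norm.ppf(1 - alpha / 2)       # two-sided critical value
    drift = z_interim / sqrt(info_frac)    # effect implied by the interim
    mean_final = sqrt(info_frac) * z_interim + (1 - info_frac) * drift
    sd_final = sqrt(1 - info_frac)
    return norm.sf((z_crit - mean_final) / sd_final)

# A lukewarm halfway interim still leaves a nontrivial chance of a
# significant finish; a futility flag is probabilistic, not a verdict.
print(conditional_power(z_interim=0.8, info_frac=0.5))  # ~0.12
```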

Doug Fowle
Senior Analyst, HC Wainwright

Is there any way that the futility analysis could have informed the company's decision to change the endpoint? Does that raise questions for you?

Chuck McCulloch
Professor of Biostatistics, UCSF

Right. Again, I turn back to the issue that it's a multiple testing issue. What were the options at the time? You did a futility analysis. If I put myself in Marcio's place, it's like, what could I do that's legitimate to help make this study successful in the end? Changing little things like tiny tweaks on the analysis strategy is probably not going to make much of a difference. For example, if I thought that the drug was going to have a much more immediate impact early on, that might suggest I should move the time point earlier. Okay. Now if I'm going to critique this, I'm in the multiple testing arena. What's the possible benefit that might be gained by the company?

Again, even if I go with thinking of these things as two coprimary endpoints, that's the capitalization on the multiple testing issue that I might have gained by knowing about this futility analysis. The penalties that I would apply would account for that sort of a discrepancy.

Doug Fowle
Senior Analyst, HC Wainwright

Marcio, in hindsight, what was the futility threshold that you set? Have you, in hindsight, recognized what perhaps was flawed about that assumption?

Marcio Souza
CEO, Praxis

We make it very easy, right? Sometimes we don't like the outcome, and then we say the decision was wrong. Except decisions are made a priori, right? Then we judge the outcome. Here it's like, can you go back and say, if we hadn't changed anything, it would be positive, oh, therefore, we made the wrong call? No, I think the call we made was scientifically sound, just like Chuck just mentioned, asking what we know about the drug. And we know it acts pretty fast. We had very high concentrations here, as we expected, right? And a minor change, I completely agree with that, would be a time point assessment. Now, when you go back and recalculate, knowing the results, what is the probability of being futile? It's actually not that small.

Most people think, when you hear, oh, that is a futility recommendation, that you are at a 0.001% probability of being successful. That's not the case, right? You're actually making an assumption at that point in time. It happened. I'm glad we didn't stop. I'm glad we were pretty much fully enrolled in the study at that point in time, which allowed us to finish. I don't think sitting here and saying what if is actually very helpful. At the end of the day, the study is positive.

Doug Fowle
Senior Analyst, HC Wainwright

I think we're out of time. I mean, one final question, Marcio, maybe. Did you perform sensitivities, including things like jump to reference, which is arguably the most conservative, for things like the original primary endpoint?

Marcio Souza
CEO, Praxis

Yeah. Yeah, as you can imagine, we stress tested that as well, right? We looked at other things. Things that Chuck mentioned: we removed all the covariates and ran it. We added each one of them and ran it. Things where you could ask, is there anything that could break this, which for us was important. Are we being misled by the results? The answer is no. This is just a very strong result. To be honest, I'll end with this, Chuck. I know we all care about the markets, definitely our clients do, and I appreciate our investors, but there is nothing for these patients out there. Ultimately, that's why we are developing this. This is a very strong drug that is going to give a lot of relief to a lot of people.

We are just happy to be in this position to have this discussion with the FDA.

Doug Fowle
Senior Analyst, HC Wainwright

Okay. With that, Chuck, if you can give me one or two minutes, just an overview of everything that you learned today. You gave an assessment early on; you felt the data set was strong. We've learned some more things from Marcio in the course of the call, which I don't think have necessarily been that dramatic, but they add to the body of knowledge that we have. When you're leaving this call, how do you come out feeling about the robustness of this data set?

Chuck McCulloch
Professor of Biostatistics, UCSF

Yeah. To both summarize and update slightly, strong analysis strategy. I would not have suggested anything different. I do not fault them at all on the analysis strategy. There were some technical details we have not talked about on this call that were chosen that seemed to be pretty conservative and would lead to robust analyses. I have absolutely no problem with the analysis strategy. The potential minor red flags, of course, are change in the outcome, the relatively high rate of missing data, tipping point analysis that was not continued all the way to the end, and the sort of curiosity about the futility analysis. I feel pretty reassured about all those. They did a large number of sensitivity analyses, some preplanned, some not preplanned.

I think that's important because, again, you don't want to be depending on one single model for data that's missing not at random, again, as brought up by one of the participants in this call, especially when the rates are high. It is important to try different mechanisms. I'm also convinced by the fact that even when you stress test the system, the resulting p-values are relatively small.

Doug Fowle
Senior Analyst, HC Wainwright

Okay. Chuck, that was really helpful. Marcio, thank you very much for taking the time and accommodating us and being willing to sort of come under the gun both from myself as well as Chuck. With that, we'll let everybody go back to their day.

Marcio Souza
CEO, Praxis

Sounds good. Thanks. Very nice meeting you, Chuck, and thanks, Adel, for everything.
