Good afternoon, everyone. I'm Andrew Han, an editor at GenomeWeb, and I'll be your moderator today. Today's webinar is titled, "Scaling PacBio Sequencing with Automated Library Prep for Core Labs," and is sponsored by Volta Labs. Our speakers today are Maya Fridrikh, PacBio Sequencing Manager at the Center for Advanced Genomics Technology of the Icahn School of Medicine at Mount Sinai, and Greg Young, Senior Product Manager at Pacific Biosciences. You may type in a question at any time during the webinar. You can do this through the Q&A panel, which appears on the right side of the webinar presentation. If you look to the bottom tray of your window, there are a series of widgets to enhance your webinar experience. Please take a moment to fill out our survey questions at any time by clicking on the green survey icon at the bottom of your screen.
With that, I'll turn it over to Greg Young. Hi, Greg, you're muted. If you could, please unmute yourself. I'm sorry, we can't hear you. Please hold on while we address this technical difficulty.
Okay, Andrew, can you hear me now?
Yes. Great. Thank you.
I don't know what happened. I think the browser just needed to be refreshed; I could hear everybody. Okay, sorry about that. So as I was saying, I'm a product manager here at PacBio, and I've been with PacBio for about seven years. I manage the whole genome sequencing application, and I really focus on the workflows, which is why I'm excited to be here today. I appreciate the invitation from Volta, and I'm really excited that PacBio is one of the first applications on the new Callisto system. I want to kick things off by giving the folks in the audience who may not be familiar with PacBio a quick introduction.
Our mission is to enable the promise of genomics to better human health by creating the world's most advanced sequencing technologies. One of those technologies is HiFi sequencing, a long-read sequencing technology. As such, it provides a more complete view of biology: it gives you complete and accurate genomes, the ability to sequence full-length RNA isoforms, and the ability to directly detect base modifications that lead to epigenetic insights. There are four key attributes of HiFi sequencing that I want to focus on. The first attribute is its mappability.
Because HiFi reads are long reads, they can be mapped to the more difficult and challenging parts of the genome, and they can span highly homologous segments where short reads just can't be mapped confidently, as shown in this example for STRC. The second key attribute is accuracy. HiFi reads have a median read quality of Q30 or higher, putting them on par with many short-read technologies. And this unique combination of mappability and accuracy means that HiFi reads give you better variant detection performance and power when looking genome-wide. The third key attribute is the ability to phase. HiFi reads span variants and regions of heterozygosity, and this gives you the power to phase the genome, or large segments of it, into the maternal and paternal haplotypes, so you can really view the genome as it exists: as two distinct haplotypes. The fourth attribute I wanted to focus on is methylation: the ability to simultaneously and directly detect 5-methylcytosine without the need for any additional sample prep or sequencing experiments. Now, preparing your sample for HiFi sequencing is a simple five-step process that can be completed in a single day. The first step is to take your high-molecular-weight DNA and shear it down into smaller fragments.
But since we want to retain larger fragments than you would for short-read sequencing, we generally apply a mechanical DNA shearing approach rather than the enzymatic approach used in a lot of short-read workflows. Once the DNA is fragmented, we bring it through a DNA repair and A-tailing step to prepare those fragments for ligation with our SMRTbell adapters. Once the ligation reaction is complete, we clean things up with a nuclease treatment, which digests away any unligated fragments or fragments that don't have a SMRTbell adapter on both ends. And then we finish with a cleanup step that also helps deplete the library of very short fragments.
Again, for those of you who might be new to PacBio, let me briefly touch on what exactly a HiFi read is. The library prep process I just described creates a circular molecule, because we have hairpin SMRTbell adapters on both ends of those fragments. This enables our sequencing polymerase to make multiple passes over the forward and reverse strands. Each of those passes is aligned as a subread, and that alignment is used to generate a highly accurate consensus sequence. It's that consensus sequence that we call the HiFi read. On average, it has an accuracy of 99.9%.
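The 99.9% accuracy figure and the Q30 figure quoted throughout this webinar are the same claim expressed two ways, via the standard Phred quality scale; a minimal sketch (the helper name is just for illustration):

```python
import math

def accuracy_to_phred(accuracy: float) -> float:
    """Convert read accuracy to a Phred quality score: Q = -10 * log10(error rate)."""
    return -10 * math.log10(1 - accuracy)

# A 99.9%-accurate HiFi consensus read corresponds to Q30.
print(round(accuracy_to_phred(0.999)))  # 30
```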
Now, when sequencing on the Revio system, the insert size, that is, the fragment size of your library, really determines your sequencing yield, the potential output you can achieve. For the whole genome sequencing application, we recommend targeting an average library size somewhere between 15 and 20 kilobases. This allows you to achieve 90 gigabases of total HiFi data on the Revio system, with 90% of those bases at a predicted quality score of Q30 or higher. And again, the 5mC information just comes along for the ride, for free. So long-read preps really don't need to be difficult and don't need to require lots of labor.
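As a rough back-of-the-envelope check (the ~3.1 Gb human genome size is my assumption, not a figure from the talk), 90 gigabases of HiFi data works out to roughly 29x average coverage:

```python
def expected_coverage(yield_gb: float, genome_gb: float = 3.1) -> float:
    """Average depth of coverage = total sequenced bases / genome size."""
    return yield_gb / genome_gb

# 90 Gb of HiFi data over a ~3.1 Gb human genome is roughly 29x.
print(round(expected_coverage(90), 1))  # 29.0
```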
Shown here is an example of a high-quality sample prepared at PacBio using a fully automated workflow. This library produced over 120 gigabases of HiFi data, with 92% of those bases having a quality score of Q30 or higher, and a read length N50 of 17,000 base pairs. Now, this wasn't generated on the Callisto system, and I don't want to confuse anybody, but what I really want to emphasize is that it's possible to get these kinds of results using an automated workflow. You don't need to compromise on performance, and long-read sequencing can be automated just like short-read preps.
Now, at PacBio, we're continuously working with our ecosystem partners on solutions for every step of the workflow, from sample to answer. Volta has been a great partner, and I'm really excited to see what our customers can do with this unique sample prep solution. And with that, I'll hand it off to Maya to discuss her experience with the Callisto system and the results her team has been able to achieve. All right, take it away.
Hi, everyone, I'm Maya. Thank you, Andrew and Greg, for the intros. I'm the Long Read Manager at the Center for Advanced Genomics Technology at the Icahn School of Medicine at Mount Sinai. First, I'll give a quick description of the CAGT and our current operations, including some of our sequencing technologies and the challenges we face with scaling. Then I'll go into two experiments that we recently ran using Volta's Callisto platform and why Callisto ends up being a good solution for addressing some of our lab's scaling bottlenecks. The CAGT functions as the genomics core of the Icahn School of Medicine, as well as its own sort of research facility. Aside from some of the high-throughput, fee-for-service sequencing that we do, we also collaborate with companies like Volta on early access programs and early development instrument testing.
PIs will put us on grants, and we also put some of our resources into developing bespoke sequencing solutions for our users. In the left panel, I pulled some numbers from 2023. Last year, we collaborated with over 100 internal PIs and more than 30 external organizations, including both academic institutions and companies, and we processed over 14,000 samples across about 600 projects. This includes a huge variety of sample types, so we're not just processing human samples; we're processing a lot of plants, microbes, et cetera, across the full spectrum of preps that our facility offers in all of our pipelines. And these numbers are constantly increasing, so year over year we're scaling not only the number of samples we can process but also the staff that our center employs.
Currently, we're loosely divided into about six subteams: a microarray team, short- and long-read sequencing teams, single-cell and spatial technologies teams, and a bioinformatics group. All these teams are held to the same standards of excellence. We need to make sure that all of our users are getting comparable data, so we have to stay consistent between both samples and batches. We need to make sure that data is delivered in a timely fashion, which also requires seamless library prep. And we need to stay at the cutting edge of the genomics field to make sure we can provide our users with the latest available assays and the best options for everyone's specific projects. As with all large-scale operations, there are certainly bottlenecks that limit our growth.
Especially since COVID, we've seen an average technician turnover of about two years. Given that it takes about six months to train somebody to be completely independent in the lab, and our teams average four to five people, one team can be down two staff at a time while training two new people at once, and this just takes a lot of time. We also offer a very large variety of projects. Speaking on behalf of the Long Reads team alone, we have at least 12 library preps that require tailoring depending on the samples, and this doesn't include any of the extraction techniques we use or any of the specialized sample cleanup techniques, et cetera.
And some specialized protocols within our teams can be requested as infrequently as a year apart, so it can be difficult for the teams to retain a nuanced understanding of these more specialized protocols. It's difficult to justify training more than one staff member on them, but at the same time, with the high turnover, we have to make sure we can retain the knowledge within the team. There are only so many samples a human can really process at once while maintaining reproducibility and high quality. It takes time for humans to be precise, so we need to take this into account when we're assigning our projects.
And we also process a lot of high-volume projects, especially across the short-read team, that can occupy staff members for weeks to months, and this takes away from the time they could spend learning more specialized preps or working to integrate new workflows into the CAGT repertoire. So over the years, we've updated our offerings. This is a photo of our current sequencing room and the sequencer fleet we currently offer, not including instruments that we've used in the past and retired, or things we're looking to add soon. For short-read sequencing, we leverage a lot of Illumina platforms: two NovaSeqs, two NextSeqs, two MiSeqs, and two MiniSeqs. We have three Ion GeneStudio S5s, one NanoString nCounter, and one AVITI from Element Biosciences.
For long-read, we currently just leverage PacBio, so we have one Revio, one Sequel IIe, and two Sequel IIs. And here I have some of our single-cell and tissue characterization, so spatial, technologies. We have a lot of 10x platforms: five Chromium and Chromium X instruments, two CytAssists, Visium Spatial, Visium HD, and a Xenium In Situ spatial instrument. We also have two Tapestri instruments from Mission Bio, and we offer a lot of nuclei and cellular enrichment methods. PacBio is also able to use 10x with the Kinnex single-cell technology that we use frequently, and we're going to be upgrading our AVITI to an AVITI24 soon.
With such a diverse and constantly growing fleet of instruments, we would hope to automate at least some of our cornerstone assays so that our staff can expand their skill sets and our center can expand its overall array of offerings. And we've definitely tried to implement automation in the past, but we've had limited success. Right now, we use a Hamilton STARlet to do some of our dilutions of completed short-read libraries for pooling. That's essentially just a liquid handler; it doesn't actually do any of the preps. We also have succeeded with, and are very happy with, the Ion Chefs that we use for templating on the Ion Torrent instruments.
But both of these instruments end up with just one use case, and this is one of the issues we tend to see with a lot of automation platforms: they're not as universal as advertised, at least for our specific needs. Either we can use them only within one pipeline, or only for one specific protocol, something like that. We haven't really found instruments offering compatible library prep or extraction protocols for our center. Just as an example, with extractions, we process a lot of different types of tissue, so we need to use reducing agents in the extraction protocol, and that would require the extraction instrument to either have air filtration technology built in or fit within a hood, and we just haven't found something that can accomplish that.
The scale offered, at least for the long-reads team, is not really compatible with our project sizes. A lot of these instruments offer batches of 48- to 96-plex, whereas we tend to get a lot of users requesting genome assembly for just a small handful of species, or we get diverse sample quality and sample types. Our typical submission batch is somewhere between one and 16 samples, and since each submission needs to be handled on its own terms, we can't really process submissions together as easily. So these 48- to 96-plex instruments just don't fit our needs. Of the instruments we have tried across the CAGT, we've also seen certain ones with high failure rates, and not only high failure rates but unpredictable types of failure.
If an instrument failed a certain way last week, that doesn't mean you can predict how to fix the failure that's going to happen next week. In the end, these technologies are quite complex in their own right, and while claiming to reduce the amount of work required for the preps, they require the operator to have a whole different skill set just to understand the instrument, which makes it debatable whether the amount of work is actually reduced. I've had the opportunity to work with Volta on two different instrument models: we saw a one-reaction Alpha unit in 2021, and we got to operate a four-reaction prototype of Callisto in 2023.
Through my personal hands-on experience, as well as the experience of my team, we were very pleasantly surprised to see that Callisto addresses almost all the issues that at least the long-reads team has with automation platforms. For one, Callisto is truly scalable. The reaction nodes each support up to four samples, but you can run fewer as well, and you can put up to six nodes on one run at a time. That's 24 total samples at a time, which is very compatible with the batch sizes we tend to get. The platform is technology-agnostic, so right now we're using PacBio, but in the future we'll be able to use Illumina on it as well. It's application-agnostic, so right now we're doing library prep, but we've also done extraction on Callisto.
And these applications will be launching in the form of remote software updates from Volta Labs, roughly quarterly. All the assays right now take less than one workday, including loading and offloading the samples, and some assays will take even less time, so you can run two assays on the instrument in one day. The instrument also has a very low failure rate from what I've seen. I've only seen one instrument failure, and that was back in 2021, on an issue we knew was possible. That problem was corrected immediately and has not been a concern on the two Callisto prototypes I've seen since. And importantly, Callisto doesn't require a lot of instrument-specific knowledge. It really is a push-button, walk-away platform.
You load your consumables, reagents, and samples according to a very clear instrument manual, and then you walk away. There are no user-set parameters on the instrument, but the Volta team has been very receptive to our input and feature requests. After seeing two wonderful prototypes, our team was very impressed with their performance, and we were excited to get our hands on the commercial unit as soon as it was released this August. We've already been able to run the instrument four times, with and without support from the Volta team, and I'll share a little bit about the instrument installation experience and these four runs. I've been working with Volta for a while now, so I wasn't too surprised to see how quick and seamless the installation was. The team arrived on a Thursday morning.
I think it took them less than 20 minutes to get the instrument from the FedEx truck, uncrated, up the elevator, and plugged in. They spent a day doing installation, and they brought all their own consumables and everything, and the next day they came back to do the QC and instrument qualification. Again, it was less than a full workday, and they were out of our hair pretty quickly. Then, when we actually had the training run on Monday, Muhammad from our team worked with Volta to set up the full 24-sample validation run. This wasn't just our first time operating the new official Callisto unit; it was also Volta's first time conducting a training.
Despite all that, it took them less than an hour to load the full run. The samples started running at 10 A.M., the libraries were collected at 4 P.M., and Volta was packed up and on their way out by 4:30 P.M. All in all, less than one business day. Immediately after validating the instrument, we went into a two-phase study of Callisto's performance. Phase one was a reference study using HG002, and phase two was a real-world application study with real samples that we had used in the lab before. For phase one, we were mostly looking at two parameters: the size selection protocol for libraries, and library prep consistency between Callisto and manual preps.
We used HG002 because that's what PacBio has published Revio data on, but we purchased a different DNA source than what Greg was presenting on previously. The DNA Greg was showing was extracted with Nanobind, a PacBio-recommended high-molecular-weight DNA kit. We purchased non-high-molecular-weight gDNA; it's still fairly high molecular weight, just a bit more representative of our typical user-submitted samples. That said, once we started the experiment, we learned somewhat by chance that this particular extract may contain some sort of carryover contaminants that have historically interfered with PacBio enzymes in sequencing. The samples did sequence fine for us, but we also saw systematically lower yields off Callisto than expected. This is just a caveat to keep in mind.
The yields were still plenty to sequence, but we saw slightly lower yields at Sinai than we expected, and Volta was able to replicate this in-house as well. For the size selection methods we were testing, we were mostly looking at SRE and BluePippin, with the 5K AMPure cleanup as the benchmark. SRE, the Short Read Eliminator kit, is a PacBio protocol and their current recommendation. It's a manual treatment applied to the input material before shearing, and it's supposed to progressively deplete fragments shorter than 10 kb. BluePippin is a gel-based method from Sage Science; it's our lab's tried and true. We typically run it on the final library rather than on the input material, and it creates a very sharp cutoff wherever we manually set it on the instrument.
The 5K AMPure bead selection is the baseline included in the PacBio 3.0 HiFi prep. It comes at the very end of library prep, and for that reason it's also included on Callisto. PacBio does recommend additional size selection like SRE, but sometimes the mass submitted by a user can be limiting, and all the size selection protocols eliminate somewhere between 20% and 60% of the material, so sometimes you have to resort to just the 5K as the size selection. It is our lab's last resort: we find it doesn't always actually hit 5 kb. Sometimes it hits something like 3 kb, if not even shorter, and these short fragments have a huge impact on sequencing, which I'll show in upcoming slides.
But again, sometimes mass is limiting, so we have to use it, and it is a good benchmark. For consistency metrics between Callisto and manual, we used library yield, library length as measured by the Agilent Femto Pulse, and HiFi read length. For phase two, we tested a variety of sample types, most of which had been prepped and sequenced by our lab in the past. Some of these were human-derived, some were plant-derived, and we also had some cilia-derived and other samples as well. Across the bottom of the slide is a resource comparison between running libraries on Callisto and manually, specifically for the PacBio SMRTbell prep.
For Callisto, the hands-on time at the beginning is about 30 minutes to load, and at the end you need maybe 10 to 15 minutes to collect samples and wipe down the instrument. For the five hours in between, you can walk away, whereas when you're doing libraries manually, there is some walkaway time, but you have to be present to add the reagents. In terms of reagents, Callisto actually uses half the volume of the PacBio reagents compared to what's listed in the standard 3.0 protocol, so you can process twice as many samples per PacBio reagent kit when using Callisto. In terms of tips, these numbers are for a batch of 24 samples: Callisto uses about 190 tips, and manually you'd use about three times as many.
This is because Callisto doesn't actually pipette the sample at any point other than loading and offloading. In the three panels on the left, you can see a bead-based cleanup: sample and beads are mixed acoustically, the beads are moved around the node magnetically, and buffers or washes are removed using a sort of sponge at the side of the node. In the three panels on the right, you see two reagents being mixed on the platform without having to pipette either of them up and down. For other consumables, on Callisto you just need the Volta node, plus special wipes for the electrodes after the nodes are removed, whereas normally you'd be using tubes or plates, et cetera.
Just a brief overview of phase one and where the size selection protocols were applied. We used three replicates per condition for almost all of the conditions, and I'll be showing data for the five-microgram input conditions. We tested three and five micrograms for most of the Callisto prep conditions, and we chose these masses because PacBio typically recommends using two to five micrograms of input material per Revio SMRT Cell, depending on the quality of the gDNA. The samples our lab processes are often from tricky sources, like plants and insects, so we typically request more DNA, closer to five micrograms, from our users to make sure we have enough input either to run multiple SMRT Cells to increase coverage or to repeat the prep if something fails.
And since size selection protocols tend to deplete the amount of material, we like to request excess, but three micrograms is a nice baseline for your typical PacBio HiFi user. The conditions we used to address size selection were 5K only, SRE plus 5K, and 5K plus BluePippin on Callisto. Typically, when we do BluePippin manually, we do not also do a 5K size selection at the end of library prep, but since the 5K AMPure bead selection is the default on Callisto, it's something to keep in mind going forward. We did request an option from Volta to run the library prep without the 5K protocol, and they listened, and soon we'll have our application for it.
For all the samples, we first QC'd the gDNA on the Femto Pulse. It came back around 50 kb, so plenty high molecular weight, enough to run PacBio preps with. We performed SRE on a large quantity of the gDNA designated for the SRE experiments, and we got about 80% yield from this. We sheared the SRE and non-SRE gDNA by g-TUBE, pooled it, and then redistributed it for the preps. For each of the five-microgram samples that did not go through SRE, we started with five micrograms on both manual and Callisto. For the SRE samples, in order to replicate receiving five micrograms from a user and then losing 20% during SRE, we started those preps with four micrograms.
We prepped the libraries according to PacBio's whole genome library prep 3.0, either manually or on Callisto, and then we either did or did not run BluePippin, depending on the experimental condition. For sequencing, we did some low-pass sequencing across three SMRT Cells. We chose three SMRT Cells because, in our experience, both the 5K protocol and the SRE protocol leave short fragments in the library, regardless of how the library trace looks on the Femto Pulse, and Revio will preferentially sequence these short fragments when they're present. So we wanted to run the BluePippin samples separately from the 5K and SRE samples to make sure we weren't interfering with the molarities of the samples.
We also ran 30x coverage, or about one Revio SMRT Cell, for four of the standard samples to compare variant calling data between Callisto and manual preps. For the SRE 5K workflow, again, PacBio's current recommendation is to do SRE on the input material and then the 5K AMPure cleanup at the end of the prep. We did see lower yield on Callisto, but we saw systematically lower yield for this particular HG002 extract in general. The typical conversion rate on Callisto is something like 30%-40%, but here we saw around 20%. Regardless, for libraries of this size, the roughly 16-18 kb that the Femto Pulse is displaying, we would want to use about 350 nanograms per SMRT Cell.
So even still, this yield of about 750 nanograms is plenty for two Revio SMRT Cells or so. And just for context, from the three-microgram prep we were getting about 430 nanograms, which is still enough for one SMRT Cell. The Femto Pulse library trace on Callisto is comparable to the manual prep, if not even a bit longer. We divided the BluePippin slides to demonstrate the effect of BluePippin on the libraries. Here, the pre-BluePippin material on Callisto did not undergo SRE; it went through the standard PacBio 3.0 prep and then a 5K AMPure selection, so this is really essentially just the 5K protocol. The overall mass on Callisto here is slightly higher than in the SRE 5K protocol, which is to be expected.
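The loading arithmetic Maya walks through here, 350 ng per SMRT Cell against yields of roughly 750 ng and 430 ng, can be sketched as a quick helper (the function name is illustrative only; the masses come from the talk):

```python
def smrt_cells_supported(library_ng: float, ng_per_cell: float = 350.0) -> int:
    """How many Revio SMRT Cells a library yield can support at a given per-cell loading mass."""
    return int(library_ng // ng_per_cell)

print(smrt_cells_supported(750))  # 2  (the 5 ug SRE + 5K prep)
print(smrt_cells_supported(430))  # 1  (the 3 ug prep)
```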
But the manual mass here is still a bit higher than Callisto's, and that, again, could have to do with the sample type, but also because Callisto performed a 5K selection that we did not do manually. Again, the library lengths are almost identical between Callisto and manual. After running BluePippin, you can see that the yields from Callisto and manual are now comparable while maintaining similar sizes. It seems like the 5K eliminated a portion of the short molecules that BluePippin would have eliminated on its own. In this case, the 5K on Callisto does become redundant, but again, this is not the exact intended use case for this specific application. The library lengths are comparable and also very similar to what the Femto Pulse displayed for the pre-BluePippin lengths.
From the sequencing data on the next slide, it'll become evident that these samples are actually not equivalent to the pre-BluePippin samples. Like I said, it's evident that at least some size selection needs to happen in conjunction with the 5 kb AMPure selection. Here we have the read length distributions from the three different SMRT Cells that we ran. We can see that the 5K SMRT Cell in the left panel is overtaken by reads shorter than about 7 kb. SRE, in the middle, helps lower the shoulder a good bit, but you can still see plenty of shorter reads, whereas the BluePippin SMRT Cell in the right panel has almost entirely eliminated reads below the cutoff we set.
We requested BluePippin between 8 and 28 kb, and there's really nothing shorter than 8 kb in this panel. Another thing to note: this SRE data in the middle panel looks quite different from the data Greg was showing at the beginning of this presentation, for a few reasons. Firstly, we did not use the high-molecular-weight gDNA sample provided by Coriell, so our input material was of lower quality than PacBio's to begin with. Secondly, the PacBio data came from a 100% automated prep with very fine-tuned protocols, whereas here we only automated the library prep. And thirdly, the PacBio data comes from Megaruptor shearing, and PacBio currently recommends pipette-tip-based shearing, whereas our lab is still using g-TUBEs. g-TUBE shearing results in a much broader fragment size distribution than pipette-tip shearing.
So we're not too surprised to see many more short reads than in the data Greg was showing. As for the HiFi yields from the size selection experiment, all three SMRT Cells produced very similar amounts of data, but as the size selection protocols get more stringent from the top to the bottom of this table, we see fewer HiFi reads. BluePippin has about half as many reads as the 5K-only SMRT Cell does. What's happening here is that the short molecules that BluePippin would have eliminated load onto the sequencer more readily than the long ones, so they produce a high quantity of HiFi reads without actually contributing to the usable yield.
While the overall amount of data on all these SMRT Cells is similar, you can see in the right column that the yield of reads longer than 10 kb is about double on the BluePippin SMRT Cell compared to the 5K SMRT Cell. Typically, PacBio is used for things like spanning long repeat regions or for genome assembly, so we really want to see reads greater than about 8 kb. A lot of the data on the 5K-only SMRT Cell is essentially unusable for these purposes. The mean read lengths for each size selection method are pretty concordant between Callisto and manual, and the Callisto replicates fall within a very tight range.
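The distinction Maya draws between total reads and usable yield, that is, bases in reads above a length cutoff, is straightforward to compute from a list of read lengths; a minimal sketch with toy numbers (only the 10 kb cutoff comes from the talk):

```python
def usable_yield(read_lengths_bp, min_len_bp=10_000):
    """Return (bases in reads at or above the cutoff, fraction of total yield they represent)."""
    total = sum(read_lengths_bp)
    usable = sum(length for length in read_lengths_bp if length >= min_len_bp)
    return usable, usable / total

# Toy example: many short reads inflate the read count but not the usable yield.
lengths = [3_000] * 10 + [15_000] * 4
usable, frac = usable_yield(lengths)
print(usable, round(frac, 2))  # 60000 0.67
```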
As expected, the BluePippin HiFi read lengths are longer than those for SRE, which in turn are longer than those for 5K, with roughly a 3-4 kb delta between each. These read lengths look nothing like the library lengths we saw off the Femto Pulse, because the short reads on the 5K and SRE SMRT Cells really skew the averages. We also ran a test condition combining SRE and BluePippin, just to see what would happen, and again, it followed the trends we were expecting: slightly longer HiFi read lengths. The overall yields for the libraries decrease progressively as the size selection protocols become more stringent, which is also to be expected. In this capacity, we do see Callisto trending with the manual library preps.
We're still not sure why the SRE yield is lower off of Callisto than off of manual, but we have data from phase two indicating that SRE on Callisto, as a combined approach, is actually not a problem; we see very much the expected results in phase two of the study. So it's possible that some of the SRE reagents are reacting with those carryover contaminants from the extraction of this particular HG002 sample. The Hi-Fi Q scores on all these SMRT Cells are excellent, all above 35 or so. The reason they trend downward as the size selection method becomes more stringent is that the more short fragments are present on the SMRT Cell, the more opportunity the polymerase has to pass over those inserts many, many times and generate really high-quality reads.
So a lot of the reads bringing up the Q score on the 5K SMRT Cell are actually really short and, again, not contributing to the usable Hi-Fi data. For phase two of our experiments on Callisto, we wanted to assess how well Callisto performs on diverse sample types. Most of the samples we tested for phase two had actually been prepped previously using BluePippin manually, so we don't have a ton of apples-to-apples comparisons, because for phase two we wanted to test how both SRE and BluePippin perform with Callisto libraries. Again, since we don't typically use SRE in our workflow, we can't compare them 100%. Either way, the yields that we're showing here are from post-SRE.
So previously we were showing pre-SRE yields, and now we're showing post-SRE, not using the original sample mass as the starting point for this data. That's because SRE acts very differently on these different sample types, so we just didn't want to add another variable. Regardless, we see that the post-SRE input conversion rates on Callisto are somewhere between 25% and 40%. And again, because we hadn't run SRE on these samples before, we can't directly compare them to the manual preps that we did BluePippin on. But all in all, if you take the original input mass and compare it to the yield of the final size-selected library, then even the SRE Callisto samples look very similar to the manual BluePippin libraries.
Here we chose a couple of samples to directly compare the yields from Callisto and manual. Again, these data are not inclusive of the size selection protocol, but the Callisto yields are mostly concordant with the manual ones. The reason you see the discrepancy between high-quality plant one and two, whereas the manual preps look similar, is that we ran SRE on high-quality plant one before loading it onto Callisto, whereas we didn't use it on high-quality plant two. To briefly discuss these samples: Drosophila was a low-quality DNA sample that we received many years ago, so even on our own team, we only performed the 5K size selection, and our start-to-finish yield was about 4%. With Callisto, we did BluePippin, and the start-to-finish yield was about 5%, so essentially the same as ours.
The HeLa DNA was really great, and we got great yields both manually and off Callisto. Again, both of the plants were high quality, and after comparing the input mass of each plant to the post-size-selection yield (SRE on Callisto and manual BluePippin for high-quality plant one, and BluePippin on both Callisto and manual for high-quality plant two), all of those yields were comparable. As a bit of a sanity check, we also ran Google DeepVariant on HG002. We ran four 30x SMRT Cells on it: a Callisto BluePippin, a Callisto SRE, a manual BluePippin, and a manual SRE SMRT Cell. They look almost identical. We saw no difference in SNP or indel calling precision, sensitivity, or F1 between Callisto-prepped and manually prepped libraries.
So we can comfortably say that the Callisto libraries are equivalent to our lab's current gold standard. To sum up what we learned from these two experiments: Callisto performed as well as our team in generating high-quality Hi-Fi libraries for SMRT sequencing on Revio. The yields, library lengths, and Hi-Fi read lengths were equivalent on the reference samples, and it was nice to see similar start-to-finish yields, from beginning of prep to end of prep, in phase two. Again, we couldn't compare a lot of those samples apples to apples because of the size selection protocols our lab used, but we did see the same trends as in the phase one samples.
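The precision, sensitivity (recall), and F1 figures referenced in the DeepVariant comparison are standard variant-benchmarking quantities, typically computed against a truth set by a tool such as hap.py. A minimal sketch of the definitions, with made-up counts rather than the study's numbers:

```python
def variant_calling_metrics(tp, fp, fn):
    """Precision, sensitivity (recall), and F1 from true-positive,
    false-positive, and false-negative variant counts."""
    precision = tp / (tp + fp)        # fraction of calls that are correct
    recall = tp / (tp + fn)           # fraction of true variants recovered
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Hypothetical SNP counts for one SMRT Cell's callset:
p, r, f1 = variant_calling_metrics(tp=3_350_000, fp=3_400, fn=5_100)
print(f"precision={p:.4f} recall={r:.4f} F1={f1:.4f}")
```

"No difference between Callisto and manual" means these three numbers match between the two callsets to within benchmarking noise.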
Finally, BluePippin does remain a superior size selection protocol to the SRE kit, but SRE definitely rescues a lot of the data compared to 5K only. Callisto was also able to replicate high-quality sequencing metrics in terms of both Q score and Hi-Fi read length. And Callisto was able to produce Hi-Fi libraries with variant calling metrics equivalent to what we can generate manually. We're very excited to incorporate Callisto into our SOPs on the long-read team at the CAGT, and soon to share the instrument across different pipelines. Phase one of these experiments was conducted in a single instrument run, a full 24-plex, and phase two was done across two smaller runs.
We now have N equals three runs on the commercial unit showing great reproducibility and consistency, and after testing phase two, we can trust that our more unique samples can be processed on Callisto. The runs on the instrument are very quick, and Volta promises protocols shorter than a workday, including loading and offloading the instrument. The training was very quick and foolproof as well. And for any upcoming issues or protocols we'd like to see implemented on the instrument, we trust that the Volta team will be quick to act and work with us to provide products that meet our needs.
Zooming out, Volta prep for PacBio on Callisto is going to allow users to produce twice as many libraries per PacBio reagent kit as manually prepping the samples would, while also taking just a fraction of the time to train someone on the prep and on the instrument in general. During phase one, we did a small test on different input masses that we didn't show data for: we ran one, three, and five micrograms of input material through Callisto 5K and found that the results were almost identical in terms of percentage yield. So Callisto can accept a full range of DNA mass inputs and produce proportionate amounts of libraries that sequence at the same quality from both low and high input.
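The "proportionate yield" claim above amounts to saying that percentage conversion (final library mass over input mass) holds steady across input amounts. A quick sketch of that check; the ~30% conversion figure used below is an illustrative assumption, not the study's measured value:

```python
def conversion_rate(input_ng, final_library_ng):
    """Percent of input DNA mass recovered in the final library."""
    return 100.0 * final_library_ng / input_ng

# Hypothetical: 1, 3, and 5 ug inputs all converting at ~30%,
# i.e. library mass scales linearly with input.
for input_ug in (1, 3, 5):
    final_ug = input_ug * 0.30  # assumed conversion, for illustration
    pct = conversion_rate(input_ug * 1000, final_ug * 1000)
    print(f"{input_ug} ug in -> {final_ug:.1f} ug library ({pct:.0f}%)")
```

If the percentages diverged across inputs, that would suggest the prep chemistry is not scaling linearly with mass.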
Another thing to keep in mind is that we only used g-TUBEs during our study, but PacBio now typically recommends the Megaruptor 3, and that's what we used in some of the experiments we did with Volta on the prototype last year. We saw very similar results from Callisto and manual libraries there again, so we can comfortably say that Callisto can generate quality libraries from multiple shearing and size selection methods. Now that Callisto is a commercially available unit, Volta Labs will be focusing on expanding the array of applications offered beyond just the PacBio Hi-Fi prep app. Next on the roadmap is high molecular weight gDNA extraction from whole blood, which we'll be testing very shortly and are very excited to use.
But Volta plans to expand this prep to myriad sample types and perform regular, non-high-molecular-weight gDNA extractions as well. Soon there will also be some short-read library preps, as well as hybrid capture protocols and enzymatic fragmentation for Illumina sequencing. So Callisto, as one platform, can be used for many applications in the lab, short and long read, rather than one instrument per application, and the Volta team will be happy to provide more information about the new apps and release dates during the Q&A. Finally, just a quick thank you to the CAGT at large and our director, Bobby Sebra, and specifically the PacBio team. Muhammad operated Callisto for all these experiments, as well as in 2023, along with Aaron.
Ginters generated all the manual libraries for our studies this year and last year, as well as doing all the sequencing, and Irina provided some of the samples here and contributed a lot to the planning and execution of all these studies. Just thank you to Greg and the rest of the PacBio team, as well as the crew at Volta, who have been working very hard with us to bring this project to completion and to build out the presentation, and finally, thank you, Andrew and GenomeWeb, for hosting this webinar.
Thank you, Maya. As a reminder for webinar participants, if you have a question, please type it into the Q&A box in the control panel. We'd also like to remind attendees to take a moment after the webinar has ended to fill out our exit survey and give us your feedback. For the Q&A session, you will also be joined by Abdul Muhammad, Head of Application Development at Volta Labs, for any product-specific questions. Maya shared some of the new workflows that are in development. Mohammed, what are some of the other workflows Callisto supports, and what can you tell us about the ones she mentioned? I'm sorry, Abdul.
Yeah, can you guys hear me well? Just wanted to double-check. Awesome, I see Greg nodding. Well, thanks, Andrew, for the question about our application roadmap and the new applications that are coming. I wanted to first thank Maya and Greg for the great talk and presentation; collaborating with Maya's team has been phenomenal, and it's a very classy team. So, as you saw, PacBio is one of the first applications that we commercialized on the Callisto instrument. To take a step back, Volta sees itself as a genomics applications company.
So, as Maya pointed out, we will be rolling out new applications on a quarterly basis. In Q4 2024, we have extractions launching: high molecular weight DNA extraction from whole blood, as well as short-read extractions from whole blood. The third app is an enzymatic fragmentation-based library prep for Illumina, or short-read, customers, which we're quite excited about. Then in Q1 2025, we have our first hybrid capture workflow, which is IDT-based, and beyond that, we have single-cell library prep and RNA-seq-based workflows on our roadmap that we're quite excited about.
Do you have any plans to test DNA from saliva? I believe that's a question for Abdul, but...
Cool. So, DNA from saliva. I'm assuming this question is about testing saliva DNA in the PacBio library prep. In our testing, I believe we've not tested this. Maya, do you have any thoughts on saliva DNA going into the PacBio library prep?
We've definitely done it before. Actually, I don't know if you remember, there was one experiment that we did last year on the prototype that used saliva-extracted DNA, some of the older DNA that our lab had. We did 36 total samples when we processed them previously, and they did come back with pretty decent Hi-Fi read lengths, but a lot of the samples took quite a few SMRT Cells to get high enough quality data.
Yeah, we've done this internally at PacBio. We've sequenced a number of saliva samples, and the sequencing performance is fine. It's just that saliva can be a little bit tricky, 'cause the quality of DNA is gonna vary so much from person to person, along with how many cells are actually in their saliva and so forth. But I would say just stay tuned, and you'll hear more about that from us at PacBio soon.
This is a question for Maya. Have you attempted to make larger libraries in the range of 18-20 kb on the Callisto?
So this study was us shooting for the 15-18 kb-ish range, and the BluePippin SMRT Cell, I just checked, sequenced at about 14.5 kb on average. So we definitely will be shooting longer, but the problem we find with libraries that long on Revio, at least in our experience, is that's where the Hi-Fi conversion begins to drop off a little bit as molecule length increases, just because of how far the polymerase will read: a molecule requires a certain number of passes, at sufficient quality, to be considered a Hi-Fi read. So around 20 kb is pushing it a little bit, but PacBio did release a longer movie time this year, so it might be...
It's definitely coming up on, you know, the realm of likely good data for that length. But it's really just going to depend on the sample types that we're sequencing. We did these two experiments within the past month and a half or so, so we have not had time to create more libraries yet.
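The pass-count trade-off behind Maya's answer can be sketched with rough arithmetic: a circular SMRTbell template is traversed repeatedly, so the number of passes is roughly the total polymerase read length divided by the insert-plus-adapters length, and longer inserts leave fewer passes for consensus. The polymerase read length and adapter size below are illustrative assumptions, not PacBio specifications:

```python
def approx_passes(polymerase_read_bp, insert_bp, adapter_bp=45):
    """Rough number of full traversals of the insert in a circular
    SMRTbell template (two adapters per circuit). Illustrative only."""
    return polymerase_read_bp // (insert_bp + 2 * adapter_bp)

# Assuming a hypothetical 100 kb polymerase read: a 5 kb insert gets
# many passes (hence very high consensus Q scores), while a 20 kb
# insert gets only a handful, near the floor needed for a HiFi read.
print(approx_passes(100_000, 5_000))
print(approx_passes(100_000, 20_000))
```

This is why longer movie times help at 18-20 kb: more polymerase read length buys back passes lost to the longer insert.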
I'd like to remind our attendees that we still have time for their questions; if you have something to ask, please type it into the Q&A box in the control panel. One attendee asks whether you know what the carryover contaminant in the HG002 sample was, and if so, whether you were able to remove it.
So we don't know, at least in our lab, what it was. We actually received a tip from someone at PacBio that it might not sequence so well, and I don't think they even knew what it was; maybe it's known somewhere within PacBio. But either way, it seems we avoided the sequencing issue, so either we removed it or, I don't know, maybe we received an aliquot that just worked better.
Yeah, we don't know what the exact contaminant is, but yeah, just doing, like, the traditional workflow or the AMPure bead size selection seems to remove it, or if you do gel-based size selection, that will remove it as well. But if you're gonna order DNA from Coriell, we recommend just getting the cell pellets if you can, and then doing your own extraction, so you can control the size of the DNA.
Maybe, if I could chime in, I agree with that, Greg. One of the first steps in Callisto for the PacBio library prep is also a cleanup, so maybe that helps, along with the several bead cleanup steps throughout the workflow. But we don't know the specifics.
... What might be the reason for lower mass in Callisto prep versus manual prep?
So again, we think that this had to do with this particular sample. In our hands, Callisto was producing a little bit less mass than we were expecting, but Volta also ran an internal test on this particular HG002 sample, and again, it was just yielding lower than other things Volta had tested. If we consider from the point where library prep begins to the end of 5K, without considering SRE or BluePippin, then Callisto yielding about 25%-40% is about what we get when we do SMRTbell prep manually. So I wouldn't say that Callisto is yielding lower; there was just that one SRE sample for HG002.
Is the ABC part of the protocol also automated on the Callisto?
It is not. So Callisto, Abdul, correct me if I'm wrong, starts with a 5K cleanup on the sheared DNA, right? It starts with that, then goes through nuclease treatment, then the 5K size selection, and it's done.
That's correct. Yeah, that's correct. So Callisto automates the library prep: after you get the sheared DNA, you do a cleanup, then end repair and A-tailing, then you ligate your hairpin adapters, then another cleanup, then the nuclease treatment to chop up all the linear libraries. Then you do a bead-based size selection, after which you elute your libraries, and that's where the workflow ends. ABC currently is not part of this workflow; it's something on our roadmap that we're actively considering.
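As a way to visualize the sequence of automated steps Abdul just listed, here's an illustrative representation; the step names are paraphrased from his description and are not Volta's actual software configuration or API:

```python
# Paraphrased from the spoken description above; illustrative only.
CALLISTO_PACBIO_STEPS = [
    "bead cleanup of sheared DNA",
    "end repair and A-tailing",
    "hairpin adapter ligation",
    "post-ligation cleanup",
    "nuclease treatment (digests linear, non-circular libraries)",
    "bead-based size selection",
    "library elution",
]

def describe(steps):
    """Render a numbered protocol checklist."""
    return "\n".join(f"{i}. {s}" for i, s in enumerate(steps, 1))

print(describe(CALLISTO_PACBIO_STEPS))
```

Note that shearing happens before this list begins and ABC (sequencing prep) after it ends, matching where the automated workflow's boundaries sit in the answer above.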
What are the minimum and maximum inputs for the Callisto?
I can take that one. I'm assuming this question is about the minimum and maximum input for the PacBio library prep on Callisto. It's one to five micrograms; whatever the recommendation from PacBio's standard chemistry is, we support the entire range. Of course, this workflow also comes standard with the bead-based size selection, so there are requirements from a sizing perspective: you wanna make sure your molecules are well above, say, five to ten kb, so you don't deplete your sample in the bead-based size selection. Those are the two key technical input specifications that you need to think about.
How easy is it to make modifications to the standard workflow, such as removing the final five kb size selection? Is this something Volta does, or is the user able to do it?
Currently, this is something that Volta does. Our vision is to be the genomics applications company, so we wanna provide our customers a push-button experience. As Maya said, her team requested this update to remove the five kb size selection, and the application development team at Volta Labs is on it; we'll provide the change to the protocol as a software update.
We have a couple of questions about single cell. Is the Connect library prep on your roadmap to be part of your automation?
Active conversations, I would say; we're actively considering it. Maybe just a general comment: we're seeing a lot of workflows that customers have expressed interest in automating. Please reach out to us, maybe to abdul@voltalabs.co, or contact us through our website. We'd love to hear requests and ideas coming from our customers directly.
And then, another question for Abdul about single-cell RNA-seq. Are you planning on a plug-and-play kit, or would you be looking at a different system?
To be determined. We're actively discussing this internally.
One attendee asks, "Can I barcode the libraries using PacBio barcoded SMRTbell adapters?"
Yes. You can choose to add barcoded or non-barcoded adapters; we support both.
Maya, is a Callisto prep something that you can charge a customer for or that they can request, or is it something that you run sort of at your own discretion, or do you just run everything now?
Phase two was kind of the determining point for us as to how Callisto handles less ideal sample types, and since we saw pretty similar results from Callisto, I don't know that we would necessarily tell the person we're prepping libraries for whether we're doing it on Callisto versus manually, because sometimes it just comes down to the spur of the moment, or to whether they have an odd number of samples. If they have maybe five samples, or even two samples, that doesn't fit as well; Callisto does support two to five samples, but multiples of four work better for us. So I think it would just depend on the project, but for a larger project, I think we'd be comfortable using Callisto either way.
I'd like to put out one last call for questions; please enter them into the Q&A toolbar. Big picture, what are some of the key benefits that Callisto could have for labs currently doing manual library prep?
At least for us, it's essentially more than a workday that we're getting back. We typically prep libraries eight at a time, so a 24-sample run would be three full days that staff can spend doing other things. Or, you know, on day one you process SRE on some samples, then you load the instrument, and then you can run BluePippin on some other samples you were working on previously, which is also a longer assay. There's just a lot you could do in those six hours.
All right, well, it looks like that's all the questions we have for today. We'd like to thank Maya Fridrikh, Greg Young, and Abdul Muhammad, and our sponsor, Volta Labs. If we didn't have time to get to your questions, we will try to follow up with our experts. As a reminder, please look out for the survey after you log out to provide your feedback. If you missed any part of this webinar or would like to listen to it again, an archived version will be emailed to all attendees. Thank you for joining us for this GenomeWeb webinar.