Status Update

Apr 17, 2024

Jeremy Wilkinson

Segment Lead for Microbial Genomics, PacBio

Hi everyone. I'm Jeremy Wilkinson. I'm PacBio's segment lead for microbial genomics, and I'll be the host and moderator of today's webinar. I want to welcome you all to this webinar with today's topic focused on the recent advances by PacBio in microbial genomics, with the release of new products, the HiFi Prep Kits, for use on PacBio long-read sequencing platforms, as well as an example study comparing same-sample microbial whole genome sequencing from PacBio HiFi sequencing utilizing the new HiFi Plex Prep Kit 96 compared to short-read whole genome sequencing for assembly and antimicrobial resistance results analyzed on the Pathogenomix platform. We have a lot of great material to cover today, and the presentations portion of this webinar is prerecorded, but it will be followed by a live question-and-answer session with all the speakers.

You are welcome to submit your questions at any point during this webinar by typing them in the area provided on your attendee control panel, and we'll get to those either in the chat or at the end during that session. We've also uploaded a few complimentary pieces of literature to your control panel, so please feel free to download those and take them with you. We'll be recording this webinar and making it available for you in the next few days, so please keep an eye out for a follow-up email with a link to that recording. Immediately following this webinar, you'll receive a brief questionnaire. Please do take the time to fill that out as it does help us understand your needs and plan for future webinars. You can also stay up to date on upcoming webinars by visiting pacb.com/events. So we have some important audio information.

This webinar, like I said, has both live and prerecorded content. If you do encounter audio issues as we switch between the live conversation, which we're having right now, and the prerecorded talks that are about to start in this webinar, you may have the wrong audio mode selected, so please check the audio tab to ensure that you have selected the correct one, either computer or phone, depending on the device you're using. If you do continue to experience audio issues, we do apologize for that, but we are recording this webinar and will be making it available in the next few days. If you are attending on a phone, the audio quality is better if you're using a headset or headphones.

We also currently have active promotions for our long-read Revio system, so if you're interested in more information on these promotions, please let us know in the post-webinar survey or by going to the Contact Us form on pacb.com. So for the agenda today, this is a one-hour webinar. We have two talks and a live Q&A at the end. I'm joined today by two great experts. So the first is Gregory Young. He's a senior product manager of whole genome sequencing at PacBio. He's going to be presenting on more samples, lower cost, less time using new PacBio HiFi Prep Kits, plus microbial whole genome sequencing and antimicrobial resistance. Second, we have Jonathan Monk.

He's the co-founder and head of bioinformatics and analytics at Pathogenomix, and he's going to be presenting on comparing short-read to PacBio long-read sequencing results in the Pathogenomix platform with applications in microbial whole genome sequencing, antimicrobial resistance, and phylogenomics. So with that, let's get started.

Greg Young

Senior Product Manager, PacBio

Thanks, Jeremy, for the introduction. I'd like to thank everybody for attending today's webinar. I'm going to start off by talking about HiFi sequencing, talk a bit about the HiFi sequencing workflow, and then I'll jump into talking about our new HiFi Prep Kits. PacBio HiFi reads combine the best of two worlds: high accuracy with long read length. So you get the accuracy of a short read, but at 100x longer length. Now, this is possible because we make multiple sequencing observations on the same stretch of DNA from a single molecule. This washes out any random errors and allows for a highly accurate consensus sequence to be called from the sub-reads. That consensus sequence generation happens on the instrument, so it's no additional expense to the customer to run that on your local server or HPC system.
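To make the "multiple observations wash out random errors" idea concrete, here is a toy Python sketch. It is not the on-instrument consensus algorithm: it assumes substitution-only errors, passes that are already aligned, and a simple per-position majority vote. The function names, the 10% per-pass error rate, and the nine passes are all hypothetical values chosen for illustration.

```python
import random
from collections import Counter

def noisy_pass(template, error_rate, rng):
    """Simulate one pass over the template with random substitution errors (toy model)."""
    out = []
    for base in template:
        if rng.random() < error_rate:
            # Replace the base with one of the three other bases.
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)

def majority_consensus(passes):
    """Call each position by majority vote across equal-length, pre-aligned passes."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))

def mismatches(a, b):
    """Count positions where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

rng = random.Random(0)  # fixed seed so the run is repeatable
template = "".join(rng.choice("ACGT") for _ in range(2000))
passes = [noisy_pass(template, 0.10, rng) for _ in range(9)]  # hypothetical: 9 passes, 10% error
consensus = majority_consensus(passes)

single_errors = mismatches(template, passes[0])
consensus_errors = mismatches(template, consensus)
print("errors in one pass:", single_errors)
print("errors in consensus:", consensus_errors)
```

Because the simulated errors are random and independent, the same wrong base rarely wins the vote at a given position, so the consensus carries far fewer errors than any single pass.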

If you want to explore our HiFi data and see for yourself, we have many datasets available. I encourage you, definitely if you're working with metagenomic or microbial applications, to visit our PB Metagenomics GitHub page. We also have other datasets available for both the Revio and Sequel II systems at pacb.com/datasets. The first step to creating a HiFi read, or for any long-read DNA sequencing experiment for that matter, should be the extraction or isolation of high molecular weight DNA. The first step in preparing a HiFi library is to mechanically shear your DNA. For most whole genome sequencing applications, we recommend an average fragment size between 15-20 kb. When working with more degraded samples, smaller average fragment sizes may be necessary. The key here is to really have a narrow distribution.

You don't want too many short fragments, and you don't want too many really long fragments in order to optimize that coverage and sequencing yield. Once you have your DNA fragmented, the sample will be brought through a DNA damage and end repair step, then ligated with our SMRTbell adapters, and this can include using an indexed adapter. Then that will be treated with a nuclease cocktail to remove any unligated, semi-ligated, or damaged DNA sequences. After the nuclease treatment, the library is cleaned up using beads or using an optional size selection method to further enrich for larger fragments. Now, prior to loading your library on the HiFi system, the library needs to be bound with a sequencing polymerase. Then finally, once you have that polymerase-bound library, it's loaded onto the HiFi sequencing system.

Previously, bottlenecks in the workflow and the prep costs were barriers to scaling long-read sequencing or multiplexing many samples onto a single sequencing chip. Now, these bottlenecks were primarily around available options for DNA shearing and size selection. In terms of cost, the need to purchase single-use individual consumables for DNA shearing and the cost of the library prep reagents made large multiplexing experiments prohibitively expensive. I intentionally put the emphasis on "were" because we now have solutions that eliminate these bottlenecks and dramatically reduce the library prep costs. These solutions are in the form of the new HiFi Prep Kit 96 and the new HiFi Plex Prep Kit 96. Now, the HiFi Prep Kit 96 is designed for high-throughput workflows for doing human, plant or animal whole genome sequencing projects, and it includes a scalable DNA shearing solution.

The workflow itself is going to take 60% less time than our previous recommendations, and it's going to have a 40% lower overall workflow cost. Now, every step of this workflow can be automated for that end-to-end solution. But today, in today's talk, I'm going to focus on the HiFi Plex Prep Kit 96. With the HiFi Plex Prep Kit, we have a new scalable long-read multiplexing workflow that's going to be very cost-effective for our customers. The library prep reagents are priced at $32 per sample but require separate purchase of one of our four SMRTbell adapter index plates at $7 per index for a total library prep cost of $39 per sample. Now, if you're looking to really scale up, it is possible to do 1,500 samples in a single Revio run, and this is by doing four cells at a time, each cell with a 384 Plex.
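The per-sample arithmetic above can be sketched in a few lines of Python. The dollar figures and plex limits are the ones quoted in the talk; the constant and function names are hypothetical, and actual pricing may differ.

```python
# Figures quoted in the talk (USD); names are illustrative only.
PREP_REAGENTS_PER_SAMPLE = 32   # HiFi Plex Prep Kit 96 reagents
ADAPTER_INDEX_PER_SAMPLE = 7    # SMRTbell adapter index, purchased separately
MAX_PLEX_PER_CELL = 384         # maximum samples per Revio SMRT cell
CELLS_PER_RUN = 4               # cells run at a time on the Revio

def library_prep_cost(n_samples):
    """Total library prep cost for n_samples, reagents plus one index each."""
    return n_samples * (PREP_REAGENTS_PER_SAMPLE + ADAPTER_INDEX_PER_SAMPLE)

def max_samples_per_run(plex=MAX_PLEX_PER_CELL, cells=CELLS_PER_RUN):
    """Samples that fit in one run at the given plex level per cell."""
    return plex * cells

print(library_prep_cost(1))     # per-sample prep cost
print(library_prep_cost(96))    # one full 96-sample plate
print(max_samples_per_run())    # four cells at 384-plex
```

Four cells at 384-plex works out to 1,536 samples, which is the "1,500 samples" figure rounded down.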

This kit and workflow represents a more than 50% reduction in our per-sample cost over our previous solution. One can prepare up to 96 samples in as little time as six hours. Another really cool feature is that you can go in with less than 300 nanograms of DNA per sample. With our previous solution, that was the minimum amount of DNA we recommended. Here, it is the maximum amount of DNA that we recommend going in with. Now, this kit and workflow is designed for microbial whole genome sequencing or for metagenomic shotgun sequencing, where you may want to do functional or taxonomic profiling, or for amplicon sequencing where you need an adapter index solution. Now, this workflow can be automated, and we do have methods available on the Hamilton NGS STAR system today.

One of the ways we've enabled these workflows to be scalable is the development of new low-cost, high-throughput DNA shearing methods using liquid handling automation. We have qualified protocols for the Microlab PREP and the NGS STAR systems. The Microlab PREP is really cool because it's a low-cost, small benchtop system. It can shear 24 samples in 22 minutes, while the NGS STAR is a higher-throughput system. It can do 96 samples in 10 minutes using the multiprobe head. PacBio can provide you with the method for the Microlab PREP directly, while we recommend going to Hamilton for proper installation of the shearing method for the NGS STAR system. What's also really great about this new method is that it doesn't require any additional individual consumables.

So the cost is approximately only $0.12 per sample, and this is the cost of the pipette tip and plastic plate holding the samples. The method also produces really nice size distributions, perfect for HiFi sequencing. As I've shown here on the right-hand side of the slide, the size distribution of fragments is nearly identical to those produced by the Megaruptor 3 system, which has been the standard over the last few years. Okay, for now, let's go into a little more detail on the HiFi Plex workflow. So the workflow begins with that pipette DNA shearing method that I just discussed. After shearing, the samples are brought into library prep using the new HiFi Plex Prep Kit 96. For the first four steps, the samples are going to be processed in parallel and automated on the NGS STAR system. This includes sample pooling.

But once you have the samples pooled, the cleanup and nuclease steps are performed manually, and this takes a total of approximately six hours. To then prepare the samples for sequencing, the four pools are combined for the primer annealing, polymerase binding, and final cleanup steps, which we abbreviate as the ABC workflow. We recommend 50-300 nanograms of genomic DNA input per sample. If you're working with amplicons, that should be reduced to 20-200 nanograms per sample. And since this is native single-molecule DNA sequencing, base modifications can be detected by saving the polymerase kinetic data. Now, this kit and workflow supports a minimum Plex level of 24 samples, and up to a maximum of 384 samples can be sequenced on a single Revio SMRT cell. With this workflow, up to 96 sequence-ready libraries can be built in just one single workday.

With the combination of this new HiFi Plex Kit and the Revio system, PacBio can now deliver a HiFi microbial genome for under $50. This puts us at cost parity with short-reads, but with huge advantages in data quality and analysis that Jonathan here in a moment will discuss. At the 96-Plex level, sequencing costs $10 per sample and $3 per sample at the 384 Plex level. A 96-Plex will consistently produce over 100x coverage per sample, depending on how you balance the pool. And considering that only 15x-30x is necessary for a good microbial assembly using PacBio HiFi reads, users can easily scale up to 384 samples per Revio SMRT cell. And finally, to finish up, I wanted to present the primary sequencing results for the dataset that Jonathan will be discussing.
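As a rough check on these numbers, the sketch below adds the $39 library prep cost quoted earlier to the per-sample sequencing cost, and scales coverage with plex level. The inverse scaling is an assumption: it presumes a reasonably balanced pool and similar genome sizes, and the names are hypothetical.

```python
PREP_COST_PER_SAMPLE = 39              # USD, prep reagents plus adapter index (from the talk)
SEQ_COST_PER_SAMPLE = {96: 10, 384: 3}  # USD per sample at each plex level (from the talk)

def total_cost_per_sample(plex):
    """Library prep plus sequencing cost per sample at a given plex level."""
    return PREP_COST_PER_SAMPLE + SEQ_COST_PER_SAMPLE[plex]

def expected_coverage(plex, ref_plex=96, ref_coverage=100.0):
    """Estimate per-sample coverage, assuming it scales inversely with plex level.

    The reference point (about 100x at 96-plex) is taken from the talk; real
    coverage depends on pool balance and genome size.
    """
    return ref_coverage * ref_plex / plex

for plex in (96, 384):
    print(plex, total_cost_per_sample(plex), expected_coverage(plex))
```

Under these assumptions a 384-plex lands at roughly 25x per sample, which sits inside the 15x-30x range mentioned for a good HiFi microbial assembly.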

This set of 96 ESKAPE pathogens was brought through the HiFi Plex protocol in the Hamilton NGS STAR system, including the DNA shearing step. I also want to note that we did not normalize the DNA input because we knew the coverage would be high enough per sample to not have to worry too much about having the perfectly balanced pool. As you can see, this protocol can produce amazing results for microbial sequencing. We have 12 kb reads on average with a median read quality score of Q40. Over 94% of the bases have a predicted quality of Q30 or higher, and 99.8% of the reads could be confidently assigned a barcode. I just want to take a moment to let that kind of sink in. You can now get 12 kb reads at a Q40 quality for under $50 per sample.
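For readers less familiar with Q scores, a Phred quality Q corresponds to an error probability of 10^(-Q/10). The short Python sketch below converts between the two. The function names are hypothetical, and treating a read-level Q40 as a uniform per-base error rate is a simplifying assumption.

```python
import math

def phred_to_error_prob(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Convert an error probability to a Phred quality score."""
    return -10 * math.log10(p)

def expected_errors(read_length, q):
    """Expected number of errors in a read, assuming a uniform per-base quality."""
    return read_length * phred_to_error_prob(q)

print(phred_to_error_prob(30))       # Q30: about 1 error in 1,000 bases
print(phred_to_error_prob(40))       # Q40: about 1 error in 10,000 bases
print(expected_errors(12_000, 40))   # rough expectation for a 12 kb read at Q40
```

Under that simplification, a 12 kb read at Q40 would be expected to carry on the order of one error.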

More importantly, though, it's all about what you can do with this kind of data. So I'm going to turn it over to Jonathan so he can discuss what they've been able to glean from this dataset using their analysis pipeline.

Jonathan Monk

Co-Founder and Head of Bioinformatics and Analytics, Pathogenomix

Yeah, thanks, Greg. I'm happy to present here on comparing short-read to PacBio long-read sequencing results in Pathogenomix's P3 platform, with particular applications in microbial whole genome sequencing, antimicrobial resistance, and phylogenomics. But first, I want to give an overview of Pathogenomix. Our mission here is to empower medical and public health communities to understand, outpace, and outsmart pathogens, including those we've never seen before. And our focus is on building innovative products and services for bioinformatics, analysis, and surveillance, including our cloud-based platform P3 for bioinformatics and advanced analytics with dynamic visualizations and sharing, which I'll demo later today. Here's a brief outline of my talk.

I'll start with a sample and sequencing overview. I'll next move on to a comparison of the coverage between these samples, an evaluation of the assembly quality, the impact of different assembly results on phylogenomic trees and epidemiology. I'll go into a comparison of AMR gene detection and the impact of that gene detection on AMR prediction results. I'll have a demo of Pathogenomix's P3 platform. Here's a brief overview of our samples. We had 96 extended-spectrum beta-lactamase, ESBL, single colony isolates from the ESKAPE family, which include E. coli, Staph aureus, Klebsiella pneumoniae, A. baumannii, Enterobacter and Enterococcus. These seven species have been identified by the CDC and WHO as emerging antibiotic resistance threats. We collected these strains over 5 years at a regional public health lab. The samples were sequenced using both short-read and PacBio long-read technologies from the same DNA extraction.

They were de novo assembled, annotated, and evaluated using the P3 platform. The input DNA for the PacBio long-read sequencing looked like this. We first sequenced these with short-read technologies, and then we passed on that same DNA for the PacBio long-read sequencing. And you can see at that point, we had some samples that had a very low sample volume, down to 5 microliters, and some samples with really low concentration of 6.8 nanograms per microliter. Overall, the input DNA ranged from 120-606 nanograms. And Greg showed you this read distribution length plot before, but here it is again, just to emphasize the really significantly long reads that we got from this run, the average being around 12,000 base pairs in length and some that extended to 30,000 and even 50,000 base pairs long.

Briefly, I wanted to go through the bioinformatics methods that we applied to these reads, both for the long-reads and the short-reads. For the read QC and QA, we applied FastQC and MultiQC. For read trimming and filtering, we applied fastp. Assembly was the only tool that was applied differently from the long-reads to the short-reads due to the difference in their read lengths. We applied Flye for the long-read assemblies and Unicycler with SPAdes to the short-reads. We annotated these genomes with Prokka. We performed AMR gene identification using AMRFinder, virulence factor gene identification with ABRicate, multiple sequence alignment with SibeliaZ, and phylogenomic tree construction with IQ-TREE. And I want to emphasize that all of these steps can be performed automatically in our P3 cloud-based platform for incoming sequence reads. Here's a demonstration of the mean quality scores for those reads.

On the left, you see our long-reads. I want to emphasize the difference in x-axis length. You can see the reads range from 0-50,000 base pairs in length on the x-axis on the left, compared to the much shorter reads on the right that range from 0-150. You can see that the Phred scores were really quite high for both technologies. But for the long-read technologies, you can see that they started out near 40 and only started to have meaningful drop-off after 25,000 base pairs in length. And even in that case, rarely went below Phred scores of 30. Here's a comparison of the coverage between those genomes, between those assembled genomes. You can see in blue, the short-read coverage, the average for those was 262x coverage. In red, you see the PacBio coverage. The average there was about 182x coverage.

All of these genomes had really high coverage. We got a lot of reads back for these sequencing files, and we had really high coverage that passed any kind of coverage restrictions we might have had. I want to emphasize that all the data for this coverage comparison can be exported from our P3 platform for further analysis and comparison, like is done here. On the assembly quality side of things, after assembling these genomes, we compared the number of contigs that were produced. You can see that for the short-read assemblies, the number of contigs, the average number of contigs was 167, whereas for our long-reads, the average number of contigs was 3. It can be difficult to see those long-read contig counts there because they're so few. The average, again, was 3 ± 1.

And so that's indicative of having one really high-quality contig representative of the chromosome and then maybe a few plasmids included in those assemblies. On the bottom, you can see N50 scores. N50 scores are a proxy for assembly quality, with higher N50 scores being better. You can see that the PacBio long-read technologies produced N50 scores of around 5 million base pairs, which are basically covering the entire assembled genome, whereas the short-read technologies produce N50 scores with an average of about 200,000 base pairs. Again, those can be hard to see in the bottom there, but they are there. So what does this mean for your final assemblies? We constructed Bandage assembly graphs to visualize the impact of the different sequencing technologies. On the left, you can see the result of an assembly graph for our long-read contig assembly.
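For reference, N50 is the contig length at which contigs of that length or longer account for at least half of the total assembly. A minimal Python sketch of that standard definition follows. The contig lengths in the two examples are made up for illustration and are not taken from the dataset in this talk.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given a list of contig lengths.

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the running total reaches half of the assembly size.
    """
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("contig_lengths must contain at least one contig")

# Hypothetical assemblies, for illustration only.
closed_like = [5_000_000, 120_000, 60_000]        # one chromosome plus two plasmids
fragmented = [300_000, 200_000, 200_000, 150_000, 100_000, 50_000]

print(n50(closed_like))
print(n50(fragmented))
```

When one contig spans the whole chromosome, N50 equals the chromosome length, which is why the long-read N50 values sit near the full genome size.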

You can see a really nice closed assembly for that chromosome there, along with a plasmid next to it. On the right, you see the contig assembly graph for short-read assembled genomes. And you can see that these are much more complicated, with several different branching segments to the network and a couple other segments on the bottom there that were not possible to integrate into the final assembly. Here's another example. On the left, long-read assemblies, we have a chromosome with two plasmids. On the right, again, much more complexity. And finally, another example with nice closed contigs on the left and a complex assembly graph on the right. And what are the impacts of these differences in assembly quality? One impact is for comparing genomes between strains and the species. And so we performed multiple sequence alignment using both the long-read and the short-read technologies.

At the top here, you see the result from the long-reads. These long-reads really enabled detection of large-scale genomic rearrangements between bacteria, such as inversions, duplications, and translocations. The results for the short-reads allowed us to still compare these syntenic blocks. However, the relative position and orientation of the contigs in the genomes are lost. This has impacts on things like phylogenomics, where we're comparing strains to each other for applications in epidemiology. You can see here, we looked at bootstrapping, which is a method applied to a phylogenetic tree to determine the confidence in that calculated tree. The tree calculated from long-read assemblies generally has branches with much higher confidence levels, as you can see on the figure on the right. The long-reads are in blue in this case.

The short-read bootstrap values are in red. We can actually overlay those bootstrap values on the tree itself. As you can see, the blue values are higher bootstrap values, and the black values are lower bootstrap values. On the left, we have our long-read assembled tree. The regions of poor bootstrap values are much more localized to the ends of the tree, whereas on the right, we have large clades with poor bootstrap values. This can have an impact on interpreting a phylogenetic tree and the similarities between strains in a species.

So next, I want to start our demo of the P3 platform that allows you to perform a lot of the analyses that I just went through. P3 is a platform to deliver clarity to complex data in three easy steps. Users can upload their genomics files. They can run bioinformatics and AI-based analysis platforms in the cloud. And then they can experience dynamic visualizations and insights from that data. And the goal of P3 is to streamline the lab workflow and pipelines in real time, making identifying complex infectious diseases efficient and actionable. So I'm going to start the demo now. OK, so here, I'm logged into the Portal P3 platform. And I'm welcomed with an analysis or a display of my recent workflows. And I'm going to start assuming that I had new data to load to the platform.

I can go to the data module on the top here. I can create samples. I can browse a sample sheet here. I can upload a CSV or Excel file that has a description of all of my samples in it. So in this case, I have samples on the left. I have the file name from the sequencer that comes back for those. Then I have associated information for those samples, including the source they were collected from, the collection latitude and longitude. I can also then drag and drop some of my sequencing files for those samples and have them populated in the system. You can see that these red triangles change from red to a green check mark when the files have been uploaded to the system and staged for upload.

Next, I'm going to show you how to take these samples and run a workflow on them. I can click on the Workflows tab and click on Select a New Workflow. You can see here, we have a PacBio assembly annotation for bacterial whole genome sequences from PacBio long-read sequencing. I can click this button here to run that workflow and then visualize the results with the Visualization tab. I'm not going to run this workflow now because it's already been processed. I'll demonstrate the results in the Visualization tab. I do want to emphasize that we have various different workflows in place here that can take different sequencing tools, including Illumina and Oxford Nanopore. The demo I'll show today is for PacBio long-read sequencing results. If I click on the Visualization tab, I can see the output of a particular workflow.

Here, I'm presented with the number of strains that were processed, the number of gene families detected in those strains, as well as which of those were particular virulence factors and antibiotic resistance genes. If I'm curious about the quality of that run, I can click on the MultiQC tab on the left. I can scroll down to see some general statistics about the quality of the run. So, like I presented before, the N50 scores for these PacBio assembled genomes are quite high, around 5 million base pairs in length. I can also switch these long-read assemblies to my short-read results. You can see that reloads the page. In this case, I'm seeing short-read N50 scores of around 2,000-300,000 base pairs in length.

I can even scroll down and look at the results in terms of the number of contigs. Again, these are results I showed earlier in the presentation. I just wanted to demo them in the platform. You can see for the short-reads, the number of contigs is much higher than it was for our long-reads. I can also use the platform to look at individual strains. I can go to my table here and look for a particular strain of interest. I'll look at AEC24 right now. This page will load with the specific AMR genes detected for that strain. Here on the left, you can see there are 18 different AMR genes, virulence factors, as well as some different phenotypes and other information about the strain. In this case, I'm looking at the short-read assembled genome.

I know that because it's fragmented into several different contigs. You can see that it's broken up into all these contigs. I can zoom in and look at where some of these AMR genes were detected. You can see, AMR genes are displayed in red, while virulence factors are displayed in that orange color. I can see that some of these genomes have AMR genes. However, they're often on some of these small contigs, which makes it hard to determine whether these genes are plasmidic or chromosomal. If I reload this page to display the long-read results, this page will now update. You can see the number of AMR genes and virulence factors detected is the same. My genome assembly is much higher quality, where I have a single chromosome here in the light gray.

And then I have the indication of a plasmid here in this dark gray. And if I zoom in, I can actually look and see that several of these AMR genes and virulence factors were actually detected on that plasmid. That was really difficult to do with my short-read results. And I can zoom in. I can see some of these virulence factors. For example, here, these are the IUC operon, which is involved in iron transport. Those are found on this plasmid here. And I can also see that the beta-lactamase itself that is causing the strain to be an ESBL strain is found on that plasmid here with that CTX-M-55 gene. Let's just take a look at one more example. I can, again, look at AEC14 in this case. This was a nice example because it had an example of a duplication. I can zoom in.

Again, another—it looks like a single chromosome and a nice single plasmidic contig here. I can zoom in and see a lot of these AMR genes on this particular plasmid. And I see my CTX-M-14 gene here on the plasmid. And if I scroll over, it was also detected on the chromosome. So this is an example of a duplication that might not be detected with short-read sequencing technology. I'll go through a couple other features of the platform. Because all the data is input into a database, I can actually search it. And so I can look for strains of interest that, let's say, came from a particular source, like the bloodstream, and may have a particular AMR gene of interest, like, let's say, genes affiliated with cephalosporin resistance.

I can click on a gene of interest, submit a query, and see a list of strains that had both that AMR gene and came from the bloodstream. I can also look at various different representations of all the different resistance genes found in my strains. So if I were to click here and look at another CMY, I can click on CMY-2 here and see that that gene was found in seven different strains. I can also look at the representation of these genes in a heat map format, where I can go here, pick an AMR gene of interest or category of interest, like cephalosporin resistance, and see these genes displayed in the heat map, where presence is indicated by orange and absence is indicated by black.

And if I was interested in following up on any one of these strains, I could actually click on it and follow it to the strain page there and look at the presence of that gene on that genome. I can also filter this heat map, again, by source. So let's say I wanted to know which of these strains were from the urinary tract or, for example, the bloodstream. And these are dynamically filtered to show which strains have particular AMR genes and are from particular sources. Finally, I also want to demonstrate our phylogenetic tree construction. Again, for our long-read assembled genomes, this is the PacBio assembled tree. I can actually overlay information like the source where these strains were isolated from. And I can see the bootstrapping values.

Here, in this case, the blue values indicate higher bootstrapping values indicating higher confidence in this tree, whereas the red values are some of the lower confidence values. I can also see where these strains came from, from their various different isolation sources. I can also filter these here by, for example, name. Let's say I only want to see the strains that started with a 2. Those are dynamically filtered here. That concludes my demo of the P3 platform. I hope I've demonstrated to you how P3 delivers clarity to complex data in three easy steps. The last thing I want to show is a comparison of the AMR genes identified between the two sequencing technologies. On the left, I have a heat map of AMR genes that were detected in our long-read genome assemblies.

On the right, I have the same heat map. But these are for AMR genes detected in our short-read genome assemblies. As you can see on the left, the number of AMR genes found in the long-read assemblies was higher. There were 1,813 AMR genes found across these 96 strains compared to 1,766 AMR genes found in the short-read assemblies. And these were primarily in the case of duplications. So you see on the left, there are some dots there that are darker blue, indicating that there were multiple copies of some of these AMR genes, which I also showed you in the P3 platform. But you might be wondering, what's the impact of the difference in AMR gene calling between these two technologies? And here at Pathogenomix, we actually build machine learning-based models to predict the phenotypic resistance of strains based on their genomes.

We did find two cases where these differences in AMR gene calls led to differences in those phenotypic predictions. One of the cases was for the strain AEC1, where the long-read-assembled genome predicted the strain to be resistant to ceftazidime, a third-generation cephalosporin (a beta-lactam), while the short-read genome was predicted to be susceptible due to the lack of a CTX-M-15 gene in that genome. Similarly, in the AEC66 genome, our long-read-assembled genome was predicted to be resistant to streptomycin, an aminoglycoside. The short-read genome was actually predicted to be susceptible due to a missing aminoglycoside phosphotransferase. Those are two clear examples of where the predictive results were different based on the sequencing technologies and the genes detected using those two sequencing technologies.

So in conclusion, I hope I've demonstrated to you that the PacBio HiFi Plex Prep Kit 96 produces high-coverage genomes even from low-input samples without normalization. Those long-read PacBio genome assemblies are significantly more complete than short-read assemblies. The PacBio genomes assembled with long-read technology identified gene duplications and instances of genomic rearrangement that are messy or impossible to detect with short-read technology. Epidemiology applications are improved by PacBio long-read genome assemblies, with more robust phylogeny results. The PacBio and short-read assembled genomes mostly align on AMR gene identification with some key distinctions that have downstream implications on phenotype prediction. And finally, that the P3 tool is a powerful way to analyze and visualize genomics data from both short-read and long-read assemblies. We'll be presenting these results at a couple of upcoming conferences, including APHL in May, as well as ASM Microbe in June. And with that, thank you all for attending.

Please visit our website, pathogenomix.com, to learn more.

Jeremy Wilkinson

Segment Lead for Microbial Genomics, PacBio

Thanks. Thanks both for the great talks. Now we're going to move into the live Q&A session with both of the speakers. If you haven't already, please submit your questions in the area provided in your attendee control panel. All right. So we do have a few questions already that have been submitted. So thank you for that. First question that I will assign to Greg. What limits the sample number to 384 per cell?

Greg Young

Senior Product Manager, PacBio

Yeah. So some of that's just a function of the number of indexes that we currently provide. So just last month, we launched 3 new SMRTbell adapter index plates. Each of those plates has 96 adapter indexes. So now we have a total of 384 adapter indexes. Now, with the throughput of the Revio system, as I highlighted in my portion of the talk, with a 384 Plex, you generally get around somewhere probably between 15-30x coverage per microbial genome. And that's sort of where you want to be at to be able to get an assembly from HiFi data. If you were to increase that Plex level on a per-cell basis, you could still get a lot of useful information. But then your coverages are going to drop below that if you're trying to do assemblies. So we generally recommend keeping it at that 384.

But with the power of the Revio, being able to do those 4 cells in parallel, if you want to increase, you can just run multiple cells in parallel. Because at that point, the majority of your cost for that experiment is actually on the sample prep side and not the sequencing side. So yeah.

Jeremy Wilkinson

Segment Lead for Microbial Genomics, PacBio

Great. Thank you. Another question for you, Greg. Is it OK to combine samples from different species on one cell?

Greg Young

Senior Product Manager, PacBio

Yeah. Yeah. Yeah. Totally. And that's sort of what we've done here. I think the only thing to kind of keep in mind is we generally recommend, if you're doing microbial whole genome sequencing like we've talked about today, make sure you just pool microbes together, right? You don't want to necessarily pool a sample that may have a very large genome with a bunch of samples that have very small genomes because then it's going to just really throw off that balance you're trying to get because you want to make sure you're getting enough coverage on each of those. So you'll probably get a lot more coverage on the short stuff and maybe not as much coverage on the larger genome. So you just want to kind of keep that in mind when you're thinking about your pooling strategy.
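The imbalance described above can be illustrated with a rough sketch. It assumes every sample in the pool receives about the same share of sequenced bases, which real pools rarely achieve exactly. The genome sizes, sample names, and the total yield are placeholder values chosen for illustration, not numbers from this data set.

```python
def coverage_by_sample(genome_sizes_bp, total_yield_bp):
    """Estimate coverage per sample, assuming each sample gets an equal share of bases."""
    share = total_yield_bp / len(genome_sizes_bp)
    return {name: share / size for name, size in genome_sizes_bp.items()}


# Hypothetical pool: 23 bacterial genomes of 5 Mb plus one 100 Mb genome.
pool = {f"bacterium_{i}": 5_000_000 for i in range(23)}
pool["large_genome"] = 100_000_000

# Placeholder total yield of 90 Gb for the whole pool.
cov = coverage_by_sample(pool, total_yield_bp=90_000_000_000)
print(cov["bacterium_0"], cov["large_genome"])
```

Under these assumptions, the small genomes end up with 20 times the coverage of the large one, which is the imbalance the pooling advice is meant to avoid.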

Jeremy Wilkinson

Segment Lead for Microbial Genomics, PacBio

Awesome. Yeah. Also, the data set that was presented today, there are multiple species too. So another question for you, Greg. Is it possible to use the HiFi Plex Prep Kit with Sequel II? If so, what are the recommended DNA inputs for that and for multiplexing? What is the expected performance on Sequel II?

Greg Young

Senior Product Manager, PacBio

Yeah. I should have mentioned that in the talk. But yeah. So you could definitely use this kit for Sequel II. So just in that ABC step, you would just bind with the sequencing polymerase for the Sequel II instrument. There, we recommend doing 96 Plex as sort of the maximum, given that the Sequel II instrument doesn't have quite as high a throughput as the Revio system. And the expected results should be very similar to what you get on the Revio. The only difference would be just the lower throughput, right? You're just not going to get as high of a gigabase yield on a Sequel II system as you'll get on the Revio system. But the performance of the data should be pretty much identical.

It's just that, yeah, your capability in Sequel II is probably pretty maxed out at 96 per SMRT cell, whereas the Revio enables you to scale that up a little higher.

Jeremy Wilkinson

Segment Lead for Microbial Genomics, PacBio

Great. Thank you. Another question for you as well. So, a lot of great questions about the library preps, which is awesome. What is the expected or average recovery percentage after library prep using the Plex Prep?

Greg Young

Senior Product Manager, PacBio

Yeah. So it's going to vary a little bit depending on the quality of the samples put in. But it's generally between 10%-20%. And that's on an individual and pool basis. So generally, that provides you, since we're pooling everything going in, plenty of material to run one or multiple SMRT cells. But yeah, it's somewhere around 20%, I would say, on average.

Jeremy Wilkinson

Segment Lead for Microbial Genomics, PacBio

Thank you. How many different samples can you process at once using these kits?

Greg Young

Senior Product Manager, PacBio

So we generally recommend doing 24 samples at a time because that's sort of how we configured the kit to provide enough reagents to do sets of 24, or you could do 48 or 96 or something in between. It doesn't have to be a multiple of 24. We just don't recommend going below 24, or you won't have enough reagents to possibly fill out the whole 96 that you purchased the kit for. And some of this has to do with expectations around automation, right? This workflow should be easy to automate. I mean, we have a qualified program on the Hamilton NGS STAR system. But you could take those workflow steps. There's nothing preventing you from doing those workflow steps on another instrument if you had the resources to work on developing that on your particular platform, right?

And so the dead volume we provide is enough to do 4 runs on an automated liquid handler. So that's why we say you should do things at a minimum of 24 samples at a time.

Jeremy Wilkinson

Segment Lead for Microbial Genomics, PacBio

Yep. Just looking through here. We're getting a lot that are coming in now. What's the lowest possible nanograms loaded onto the Revio for 30 KB library? So it's probably 50 nanograms, right? Is the answer?

Greg Young

Senior Product Manager, PacBio

Well, we recommend 50 nanograms to guarantee it. But maybe I'll kick it over to Jonathan because we did have a sample on this in the data set he provided us that we thought, oh, he was like, "Oh, there's a very low concentration. There's hardly anything in there. And it may drop out." And we're like, "OK, we'll just give it a shot anyway." And we prepped it. And I believe we got a few thousand reads for it. And you were able to get a lot of data from it, right, Jonathan?

Jonathan Monk

Co-Founder and Head of Bioinformatics and Analytics, Pomona Pathogenomics

Absolutely. Yeah. We have pretty nice coverage, actually, even for that low-input concentration, low-input genomic content sample. Yeah.

Greg Young

Senior Product Manager, PacBio

Yeah. I think when we measured the concentration, I mean, there was only. It was definitely below 50 nanograms. It was only 50 nanograms of actual DNA there. Yeah. Yeah. It is possible. Yeah. It also depends on in this situation, since we only had a 96 Plex, right, we had sort of we were sort of oversampling. We had a lot of high coverage, right? So in that situation, you're more likely to detect that lower abundance species than if, say, if we did a 384 Plex, maybe we don't get quite as high coverage on that low-input sample unless we kind of normalized everything down to that level, which we probably wouldn't really recommend doing. Yeah. Generally, I would say about 50 should be where you sort of if you have the DNA, you should cap it at 50 to guarantee that you get everything.

Jeremy Wilkinson

Segment Lead for Microbial Genomics, PacBio

The next one is the plot that you showed of the sheared DNA seemed to have two major peaks. What could be the reason for having those two peaks of sheared products?

Greg Young

Senior Product Manager, PacBio

I think they're probably referring to the HiFi read length distribution. It had a little bit of a hump or shoulder to it. So that is just a product of the fact that we're doing 96 samples, right? So not all the samples are coming in with the same level of quality, right? So some of them are going to have a little more degraded DNA than others, right? So the others that have really good DNA, we're going to be able to shear and keep them at somewhere around 15 KB size. But the ones that are already degraded already have massively degraded DNA. They may already be below that. And that's just simply what we're seeing in the sequencing data, right? We didn't get everything perfectly within where every single fragment is between 15-20 KB, right? It's more of a spread.

That's simply just what we're seeing in the data. It's just a fact of pooling 96 different samples together.

Jeremy Wilkinson

Segment Lead for Microbial Genomics, PacBio

Yeah. All right. Thank you. Is it possible to use your kits for oomycete WGS?

Greg Young

Senior Product Manager, PacBio

Yeah. So I'm sorry. I'm not too familiar with that. Is that a fungal species? But yeah. I mean, generally, anything that has sort of a small genome size here where you can multiplex 24 or higher on a Revio SMRT cell. And now, just for reference, I did have that one slide in the presentation that showed you sort of what the gigabase output of a Revio SMRT cell would be. So if we're talking about a 15 KB insert that's 90 gigs, so basically, you take that 90 gig mark, divide it by your 24. And if that gives you enough coverage on your genomes, yes, you can sequence it easily using this kit and using either the Revio platform. Or you could do the same exercise with a Sequel II platform if you have access to that particular instrument.
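The back-of-the-envelope check described above can be written out as a small sketch. The 90 Gb per-cell yield and the 24-plex level come from the answer itself. The 60 Mb genome size and the 30x coverage target are made-up example inputs, and the even split of yield across samples is a simplifying assumption.

```python
import math


def coverage_per_sample(cell_yield_gb, plex, genome_size_mb):
    """Mean coverage per genome if one SMRT Cell's yield is split evenly across `plex` samples."""
    bases_per_sample = cell_yield_gb * 1e9 / plex
    return bases_per_sample / (genome_size_mb * 1e6)


def max_plex(cell_yield_gb, genome_size_mb, target_coverage):
    """Largest plex level that still reaches `target_coverage` under the same even-split assumption."""
    return math.floor(cell_yield_gb * 1e9 / (genome_size_mb * 1e6 * target_coverage))


# Hypothetical 60 Mb genome at 24-plex on a 90 Gb cell.
print(coverage_per_sample(cell_yield_gb=90, plex=24, genome_size_mb=60))
print(max_plex(cell_yield_gb=90, genome_size_mb=60, target_coverage=30))
```

For this hypothetical 60 Mb genome, a 24-plex works out to 62.5x per sample, and up to 50 samples would still reach 30x. Uneven pooling in practice lowers the coverage of some samples, so it is worth leaving headroom.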

Jeremy Wilkinson

Segment Lead for Microbial Genomics, PacBio

Great. Thank you. So another question. We use fusion primers with custom index for Amplicon library prep and use kit-provided adapters to make a SMRTbell. But there's no adapter provided in the HiFi Prep in the plex kits, which makes it a bit difficult. Do you have any comment on that?

Greg Young

Senior Product Manager, PacBio

Yeah. So we do recommend using our index adapters. And that'll just simplify the process of doing the on-instrument demultiplexing because we've sort of validated that. We know there's not going to be any possible crosstalk with those barcodes or misidentification. So you can still do that with this kit, right? So I said we're still talking about a $39 per sample cost, which is going to be your lowest option with any of the library prep kits we provide. So the fact that you do have your own index or barcode there on the sequences, it is still possible you can just basically dual index something, right? You can demultiplex it with ours and then do another run of demultiplexing with your barcode sequence.

Or, once you get the data and you have your own bioinformatic pipeline, you could just simply then use your index if you want. But having the PacBio index is not going to harm anything. And it's not going to make you have to change anything about the sequencing experiment itself.
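The two-round demultiplexing idea can be sketched as a toy example. It assumes the first round (by PacBio index) has already binned the reads, and that the custom index sits at the start of each read and must match exactly. The index sequences, barcode names, and reads are all invented for illustration, and a real pipeline would also need to handle reverse complements and sequencing errors.

```python
from collections import defaultdict

# Hypothetical custom index sequences carried on the fusion primers.
CUSTOM_INDEXES = {"ACGTAC": "amplicon_A", "TGCATG": "amplicon_B"}


def second_stage_demux(reads_by_pacbio_barcode, index_len=6):
    """Split reads already binned by PacBio barcode using a custom index at the read start."""
    bins = defaultdict(list)
    for pb_barcode, reads in reads_by_pacbio_barcode.items():
        for seq in reads:
            # Exact prefix match only; anything else is left unassigned.
            label = CUSTOM_INDEXES.get(seq[:index_len], "unassigned")
            bins[(pb_barcode, label)].append(seq)
    return dict(bins)


# Output of a hypothetical first round of demultiplexing by PacBio index.
first_stage = {
    "bc_example_1": ["ACGTACGGGTTT", "TGCATGCCCAAA"],
    "bc_example_2": ["ACGTACTTTGGG", "NNNNNNAAAAAA"],
}
result = second_stage_demux(first_stage)
for key, reads in sorted(result.items()):
    print(key, len(reads))
```

Keying each bin on the pair (PacBio barcode, custom index) keeps the two levels of indexing separate, so the same custom index can be reused under different PacBio barcodes.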

Jeremy Wilkinson

Segment Lead for Microbial Genomics, PacBio

Thank you. Is it possible to do the pipette DNA shearing manually without using Hamilton instruments?

Greg Young

Senior Product Manager, PacBio

Yeah. Unfortunately, you can't. So the way this method works is it's a very fast up-and-down mixing step at certain accelerations and decelerations and a certain volume. And it's just not possible to kind of reach those speeds doing this manually. So it is a great technique. But it is currently only done on the Hamilton instruments because they have that sort of parameter matrix of being able to pipette mixed-dimensional microliters of volume, do it at the speeds that get this nice profile that we showed you, right? But the good news is that the Microlab PREP system is relatively inexpensive in terms of laboratory equipment, right? So it is not that much more than purchasing, say, a Megaruptor 3 system or some other kind of DNA shearing technology. So I would really encourage people to look into getting that technology. We can sell it.

PacBio can sell it to you. Or Hamilton can sell it to you. So yeah, I encourage people to check that out because, yeah, it's not an expensive liquid handler. And it can do other things like our Short Read Eliminator Kit, which we recommend for doing larger genomes. You can automate that step on the Microlab PREP system as well.

Jeremy Wilkinson

Segment Lead for Microbial Genomics, PacBio

Great. Thank you. A question that I can answer here. Will the slides be available to us after this webinar? We will be sharing the recording. If you do want the slides, you can reach out. And we can share those, though. But we won't be sharing those in the standard process. But you can reach out. And we're happy to share them. Back onto the loading question for the 30 KB library. Would it be 50 nanograms for the entire pool of 96? Or is that per sample in a 96 plex?

Greg Young

Senior Product Manager, PacBio

It's per sample. Yeah. So per sample, you do that. Yeah. So then again, yeah, it's per sample.

Jeremy Wilkinson

Segment Lead for Microbial Genomics, PacBio

Then the total loading would need to be what, at least a microgram or 2 micrograms?

Greg Young

Senior Product Manager, PacBio

Oh, to load on the instrument? No. It all depends on the molarity, right? So yeah, if you do have a 30 KB insert, then obviously, the amount of material in nanograms you need to reach the molarity that we recommend putting on the SMRT Cell would be a little higher. So we recommend maybe 200 picomolar as the molarity to put on the instrument. So I don't know what that is off the top of my head. But I would imagine it's probably only around 300 nanograms or something like that or somewhere between 300-400 for a 30 KB insert. And that's generally really large. Most of our libraries, it only takes somewhere between 100-200 nanograms of material to load at 200 picomolar depending on what that insert size is.
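The relationship between molarity, insert size, and mass can be sketched as follows. The 200 picomolar concentration is the figure quoted in the answer. The 100 µL volume is a placeholder chosen for illustration, not a PacBio specification, and the 660 g/mol per base pair figure is the usual approximation for double-stranded DNA, which ignores the SMRTbell adapters.

```python
DSDNA_G_PER_MOL_PER_BP = 660  # common approximation for double-stranded DNA


def library_mass_ng(conc_pm, volume_ul, insert_bp):
    """Mass of library (ng) needed for a given molarity (pM), volume (uL), and insert size (bp)."""
    moles = conc_pm * 1e-12 * volume_ul * 1e-6
    grams = moles * insert_bp * DSDNA_G_PER_MOL_PER_BP
    return grams * 1e9


# Placeholder volume of 100 uL at the 200 pM concentration quoted above.
for insert_bp in (15_000, 30_000):
    print(insert_bp, round(library_mass_ng(200, 100, insert_bp), 1))
```

With these placeholder inputs, the sketch gives roughly 198 ng for a 15 kb insert and 396 ng for a 30 kb insert, which lines up with the ranges quoted in the answer. The result scales linearly with whatever volume is actually used.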

Jeremy Wilkinson

Segment Lead for Microbial Genomics, PacBio

Thank you. Another question. Can HiFi be applied to bacteriophages? The answer is yes. And there's been a lot of published work on that as well. So yes, you can use it for bacteriophages. That's all the questions we have in the chat so far. If there are any other questions, please submit them now. If not, we can end a little bit early. I can go ahead and then close it out here. I'd like to thank the speakers for their time today and for sharing their expertise and experience. Just as a reminder, immediately following closing out of this webinar, you'll receive a short survey. Please do fill that out as it helps us understand your needs for future webinars. We did have a question that just came in. We'll go ahead and answer that then.

I just wanted to make sure I got that last part there. So what's the average polymerase read length for this data set or in general?

Greg Young

Senior Product Manager, PacBio

Yeah. So there's two ways of looking at that. We have our HiFi read length, which obviously is that consensus. And I showed you those primary metrics in a portion of my talk. And if you wanted the average, it was 11.99 KB, just approximately 12 KB. The N50 was closer to 14 KB. But the actual polymerase read length, the amount of times it goes around, that's generally 90 KB or 100 KB on the Revio system and a little higher on the Sequel II system. Yeah.

Jeremy Wilkinson

Segment Lead for Microbial Genomics, PacBio

Awesome. Thank you. Another question. Is the shearing a mandatory part of the high-throughput protocol?

Greg Young

Senior Product Manager, PacBio

Actually, it's not mandatory. It depends, right? So a lot of times, especially if you're working with metagenomic samples or maybe, say, a Gram-positive species, the extraction tends to be a little rougher, right? You may have to do bead-beating or something to kind of really lyse those samples. So in that case, a lot of times, the DNA already comes out pretty much sheared around 10 KB. If that's the case, you can just go straight into the protocol with that. If you don't, say, have a Hamilton technology available, there's other technologies that we have tech notes for and protocols for. There's this company, MP Biomedicals, that has this FastPrep system. It's basically a big tissue lyser. It just shakes a plate of samples very fast. But what you can do is just load your DNA on the plate, no beads or anything.

It'll just shake it. That's actually enough to shear the DNA into the size ranges that we need. There's another company, the Geno/Grinder, that will do the same thing. So I'd recommend checking out those if you're kind of looking for an alternative high-throughput shearing solution that's also low cost. But generally, too, it could also be as part of the extraction, right? If you're doing bead-beating type of extractions, most likely, your DNA is already going to come out pretty well sheared.

Jeremy Wilkinson

Segment Lead for Microbial Genomics, PacBio

Great. Thank you. Here's a question for Jonathan. So how do we go about utilizing the Pathogenomix platform? What's the best approach for that?

Jonathan Monk

Co-Founder and Head of Bioinformatics and Analytics, Pomona Pathogenomics

Yeah. So the best approach is just once you have raw reads coming back from any kind of sequencing run, so for instance, coming off of a PacBio sequencer or other companies, you can upload those reads directly to our platform. And as I showed in the demo, you can actually then pick different workflows to run on those reads. So let's say you want to do an assembly and annotation bioinformatics workflow. You can do that all via the system via our web browser and run through that. And then those results are run in the background in the cloud. It's all cloud-based. And so they can be run in parallel. And as soon as those analyses are completed, you're able to then view and visualize those results. And as I showed too, we have a number of different dynamic, nice visualization tools. So these aren't static visualizations.

You're getting back rich, dynamic visualizations that allow you to filter through your results and query those results, et cetera. And so we're continually adding new bioinformatics workflows, so depending on your study of interest. But we do have several workflows for assembly, annotation. We have that for bacterial pathogens, viral pathogens, et cetera. And we're continually adding those every day. So you can sign up to join the Pathogenomix platform. Please visit our website where there's more information on subscribing and becoming a user of the platform. And if you have workflows in mind that aren't existing in the platform, we can work with you to implement those. And otherwise, you can also just use several of the workflows we have there.

Jeremy Wilkinson

Segment Lead for Microbial Genomics, PacBio

Great. Thank you. Greg, a follow-up question to the shearing question. To clarify, if the samples are long fragments and they're not sheared at all, will it work with the high-throughput protocol?

Greg Young

Senior Product Manager, PacBio

Yeah. It depends how long we're talking about. Generally, to get a HiFi read, we want to kind of keep it under 30 KB. If it's longer than 30 KB, your yield is probably not going to be very high because the odds of getting those multiple passes to do that consensus sequence generation, it's just going to be lower. So we generally recommend keeping it below 30 KB. But by long fragment, you mean under 30 KB? Then yes, it's possible. But yeah, generally, we like to keep it on that target.
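The reasoning about passes can be made concrete with a rough estimate. It takes the polymerase read length of roughly 90 KB quoted earlier in this Q&A and divides it by the insert length, ignoring adapters and read-to-read variation. The insert sizes listed are example values only.

```python
def expected_passes(polymerase_read_kb, insert_kb):
    """Approximate number of times the polymerase traverses the insert."""
    return polymerase_read_kb / insert_kb


# Example insert sizes against a ~90 kb polymerase read length.
for insert_kb in (10, 15, 30, 45):
    print(insert_kb, expected_passes(90, insert_kb))
```

Under this simplification, a 15 KB insert gets about six passes, a 30 KB insert about three, and a 45 KB insert about two, which is why fewer long molecules accumulate enough passes for consensus generation.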

Jeremy Wilkinson

Segment Lead for Microbial Genomics, PacBio

Thanks. Another question. Can you comment on GC bias, if any, for the HiFi prep kits and how it compares to other sequencing technology?

Greg Young

Senior Product Manager, PacBio

Yeah. I'll briefly mention this. Then I'll let maybe Jonathan talk about what he saw in the data. So yeah, no, there shouldn't be any GC biases. We're not doing any amplification here. So there shouldn't be any GC bias. And Jonathan, I don't know. What did you see in the data? I don't think we saw any of that, right?

Jonathan Monk

Co-Founder and Head of Bioinformatics and Analytics, Pomona Pathogenomics

No. I didn't notice any GC bias. Again, the results there were consistent with the short-read technologies as far as GC percentages. Didn't notice any bias there. Yeah.

Jeremy Wilkinson

Segment Lead for Microbial Genomics, PacBio

Awesome. Thank you. All right. It looks like the queue is clear. So I think we'll go ahead and end with only 2 minutes left here. So thanks, everyone, for joining us today. I hope you'll join us again in a future webinar. With that, take care and have a great day.

Greg Young

Senior Product Manager, PacBio

Yeah. Bye, everybody. Thank you.

Jonathan Monk

Co-Founder and Head of Bioinformatics and Analytics, Pomona Pathogenomics

Bye. Thank you.