Status Update

Aug 26, 2021

Hello everyone. Thank you for joining us, and welcome to the Global Neurodegenerative Disease Summit. Today's presentation features a talk by Dr. Ryan Corces on using single-cell epigenomics to reveal causal noncoding variants in neurodegenerative disease. My name is Kelly Miller. I'm with 10x Genomics, and I'll be moderating today's session. We have just one housekeeping slide before getting started. We want to make our presentations as interactive as possible. This presentation will not include a live Q&A session, but we encourage you to submit your questions in the Q&A box adjacent to the slide window, and we'll follow up with you at a later date. You can also find a list of resources related to this week's topics via the resource list link on the right of your screen. Please note that all attendees are on mute and that this webinar is being recorded. We'll send you an email when the on-demand recording is available for viewing. Just a reminder that today's presentation is part of a series. We have an impressive agenda this week featuring some of the most impactful recent work and cutting-edge applications in the neurodegenerative disease field, so I hope you'll make it to all of the talks, and I thank all of our amazing speakers in advance. Now I'd like to introduce our speaker for today. Dr. Corces started his scientific career at Princeton University, where he graduated with a focus in molecular and computer science. Ryan's thesis work at Stanford University in Dr. Ravi Majeti's lab centered on the genetic evolution of acute myeloid leukemia (AML), where he showed that the earliest mutations in AML affect genes that regulate the epigenome. Following this line of study, Ryan began his postdoctoral work in the laboratories of Dr. Howard Chang and Dr. Tom Montine, studying epigenetics in human diseases including neurodegeneration and cancer.
Ryan joined the Gladstone Institute of Neurological Disease in July 2020 to study the contributions of genetic and nongenetic factors to neurodegenerative diseases. Using computational biology, large-scale screens, and single-cell technologies, Ryan's lab probes the epigenome of patient-derived cells with the aims of understanding its impact on disease risk and developing novel avenues for therapeutic intervention. Thank you so much for being with us, Ryan. I'll turn it over to you so you can get started.

Thanks so much for that introduction, and thank you all for being here to hear about some of the work my lab has done, and is doing, in single-cell epigenomics, with a particular focus on using single-cell epigenomics to annotate the function of noncoding variants in neurodegenerative disease. My name is Ryan Corces, and I'm an assistant investigator at the Gladstone Institutes; I just started my lab in 2020. It's a real pleasure to tell you about our pursuit of this puzzle of the nucleus and how the epigenome affects disease. My lab is broadly focused on the question of why some people develop neurodegenerative disease while others do not. We approach this through a few overarching themes. The one I'll focus on today is the contribution of inherited genetics to disease. But we're also very focused on the epigenomic, nongenetic aspects of disease, such as cognitive resilience, the propensity of some individuals to remain resilient to cognitive decline, and selective vulnerability, the propensity of certain neurons to degenerate in these diseases. We're also focused on developing new technologies or modifying existing ones, and on developing new software and analytical paradigms. The overall goal of all of these studies is to create better prognostication strategies and to identify new therapeutic targets.
With that as background for what my lab is interested in, let's dive into inherited genetics. How much do we really understand about the inherited genetics of late-onset Alzheimer's disease? Twin studies suggest that approximately 60% of Alzheimer's disease risk is heritable, so the majority of Alzheimer's risk could be explained by inherited genetics. Of course, twin studies are somewhat confounded because twins often share early environments, which also plays a role. But as an upper bound, we can consider the heritability of Alzheimer's to be about 60%. If we look only at common SNPs, those above some frequency threshold in the population, about 33% of the phenotypic variance in Alzheimer's disease can be explained, and a large portion of that is explained by APOE alone, a gene that harbors two coding polymorphisms and is a major risk factor for Alzheimer's disease. So there's a large portion of the phenotypic variance that we can't explain with the known genetics of Alzheimer's, and the question becomes: where is this missing heritability? It could exist in a few different places. It could be rarer variants, or it could be structural variation, which is a very interesting topic these days. But I'd argue that the vast majority of it resides in the noncoding genome, and I'll give you some background on why I make that argument. A genome-wide association study (GWAS) typically results in a plot like this. This is a Manhattan plot, where the significance of association of each variant with the disease is shown on the y axis (as -log10 p) and its position along the linear genome on the x axis. Any variants that fall above the red dotted line reach genome-wide significance.
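As a minimal sketch of how the points on a Manhattan plot are derived, the Python below converts per-SNP p values to -log10 p and flags those above a genome-wide significance line. The 5 × 10^-8 cutoff is the conventional GWAS threshold and is an assumption here (the talk only shows a red dotted line); the function name, SNP positions, and p values are hypothetical.

```python
import math

# Conventional genome-wide significance cutoff (assumed; the talk shows it
# only as a red dotted line).
GENOME_WIDE_P = 5e-8


def manhattan_points(snps):
    """Turn (chrom, pos, p) records into (chrom, pos, -log10 p, significant)."""
    threshold = -math.log10(GENOME_WIDE_P)
    points = []
    for chrom, pos, p in snps:
        score = -math.log10(p)
        points.append((chrom, pos, score, score > threshold))
    return points


# Hypothetical summary statistics: (chromosome, position, p value).
example = [("19", 1_000_000, 1e-30), ("2", 5_000_000, 3e-9), ("7", 9_000_000, 0.02)]
for chrom, pos, score, significant in manhattan_points(example):
    print(f"chr{chrom}:{pos}  -log10(p)={score:.1f}  significant={significant}")
```

Plotting `score` against genomic position, colored by chromosome, gives the familiar skyline; everything above `-log10(5e-8)`, about 7.3, would sit above the dotted line.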
Of course, we see regions such as the APOE gene or BIN1, which are now very well studied, but there are many loci down here that are annotated as associated with Alzheimer's whose function we don't really understand. In large part, that's because those associations are driven by variants in the noncoding genome. To drive this point home, this table represents what you might get out of a GWAS: a list of lead SNPs, each chosen because it has the most significant p value at its locus, although there are often nearby SNPs with very similar p values. Each SNP is annotated with a gene, often simply because it's the nearest gene. But for noncoding polymorphisms, it's well understood that gene regulatory interactions can occur over very large distances, so the nearest gene may or may not be the functionally relevant one. To summarize: GWAS are very good at identifying large regions of the genome where genetic variation is associated with a particular disease. What they're not good at is determining which cell type is affected or predicting which genes are affected, especially for noncoding variants, and they're really not good at pinpointing which SNP is functional. This is largely due to linkage disequilibrium which, if you remember back to your undergraduate genetics class, is the tendency of nearby SNPs to be co-inherited. Over the next thirty minutes or so, I'll show you data addressing each of these three points using single-cell ATAC-seq; HiChIP, a three-dimensional chromosome conformation capture technique, together with single-cell ATAC-seq; and machine learning to tie it all together.
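To make linkage disequilibrium concrete, here is a small sketch that measures LD as the squared correlation (r²) between genotype dosages (0, 1, or 2 copies of an allele) and collects the proxies of a lead SNP. Genotype r² is only an approximation of the haplotype-based r² that dedicated LD tools compute; the 0.8 cutoff, the function names, and the genotypes are assumptions for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return sxy / math.sqrt(sxx * syy)


def ld_proxies(lead, genotypes, min_r2=0.8):
    """Return SNPs whose genotype r^2 with the lead SNP is at least min_r2."""
    proxies = []
    for snp, dosages in genotypes.items():
        if snp == lead:
            continue
        r2 = pearson(genotypes[lead], dosages) ** 2
        if r2 >= min_r2:
            proxies.append((snp, round(r2, 3)))
    return sorted(proxies, key=lambda item: -item[1])


# Hypothetical allele dosages (0, 1 or 2 copies) for eight individuals.
genotypes = {
    "lead": [0, 1, 2, 1, 0, 2, 1, 0],
    "snpA": [0, 1, 2, 1, 0, 2, 1, 0],  # always co-inherited with the lead SNP
    "snpB": [0, 1, 2, 1, 0, 2, 0, 0],  # almost always co-inherited
    "snpC": [2, 0, 1, 0, 2, 1, 0, 1],  # weakly linked
}
print(ld_proxies("lead", genotypes))
```

Because snpA and snpB are statistically almost indistinguishable from the lead SNP, association testing alone cannot say which of them is causal, which is exactly the fine-mapping problem described above.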
Within Alzheimer's and Parkinson's disease, and this extends to essentially every disease that's been studied, the vast majority of GWAS polymorphisms reside within the noncoding genome, and most loci have no plausible coding alteration that would explain their association with disease. So we're charged with understanding how these noncoding polymorphisms could be functional. The underlying hypothesis is that to be associated with disease, a SNP has to exert an effect; otherwise it couldn't be associated. For a noncoding SNP, that means it has to affect gene expression or splicing, because it won't change the protein sequence. So how does a sequence change in the noncoding genome affect gene expression? Take this toy example with a T allele and a C allele. Here I'm highlighting a GATA transcription factor motif. Wouldn't it be great if we could identify the regions of the genome where a transcription factor is bound or where a gene regulatory element is present, so that we could determine whether this sequence change affects transcription factor binding and gene regulation? We'd want to find those sites and map them to nearby genes, and the way we do this is chromatin accessibility profiling. In my lab we use ATAC-seq: we find peaks of chromatin accessibility where transcription factors are bound and gene regulatory elements exist, and then we look under those peaks for sequence changes that may affect canonical transcription factor binding sites. That's the type of SNP we're looking for as a functional noncoding SNP. One way we identify these is through the concept of allelic accessibility. In this toy example, the GATA motif is present only on the T allele.
When the C allele is present, it disrupts that motif, so the GATA transcription factor binds only to the T allele, and you see that allele overrepresented in your sequencing data; this will come back into play later. We started this whole journey quite a few years ago with bulk ATAC-seq. We took samples from control individuals and from individuals with Alzheimer's disease and Parkinson's disease and profiled seven different brain regions with bulk ATAC-seq. When we do dimensionality reduction on those samples, they generally group by brain region of origin, and it turns out this is largely due to the different cell types present in those regions. For example, in the striatal regions we see signal at the dopamine D2 receptor, and in the nigral regions we see markers of that part of the brain, in particular the IRX3 transcription factor. But we found that these bulk assays showed very few significant differences between cases and controls. If we compare different regions, of course, we see significant differences. But when we compare, for example, controls with very low pathology to cognitively healthy individuals with very high pathology, we see no significant differences. I'm highlighting this mainly as a foil for why single-cell data is so important. What we're missing in bulk ATAC-seq data is cell-type-specific signal, and we know that chromatin accessibility is extremely cell-type-specific because gene regulation is highly cell-type-specific.
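The toy example above combines two checks: whether an allele creates or destroys a motif, and whether one allele is overrepresented in reads from heterozygous samples. A minimal sketch of both follows, assuming the consensus GATA site WGATAR in place of a full motif model and an exact two-sided binomial test against a 50:50 allelic expectation; the sequences, read counts, and function names are hypothetical.

```python
import math
import re

# Consensus GATA site WGATAR (W = A/T, R = A/G); the lookahead lets
# overlapping matches be counted. A stand-in for a full motif model.
GATA_SITE = re.compile(r"(?=[AT]GATA[AG])")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def count_gata_sites(seq):
    """Count consensus GATA sites on both strands of a DNA sequence."""
    return len(GATA_SITE.findall(seq)) + len(GATA_SITE.findall(reverse_complement(seq)))


def allelic_imbalance_p(ref_reads, alt_reads):
    """Exact two-sided binomial test of read counts against a 50:50 expectation."""
    n = ref_reads + alt_reads
    k = max(ref_reads, alt_reads)
    upper_tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2**n
    return min(1.0, 2 * upper_tail)  # symmetric null, so double one tail


t_allele = "CCTGATAAGC"  # hypothetical sequence carrying the T allele
c_allele = "CCCGATAAGC"  # same sequence carrying the C allele
print(count_gata_sites(t_allele), count_gata_sites(c_allele))  # 1 0
# Hypothetical reads from one heterozygous sample: 30 T-allele reads, 10 C-allele reads.
print(f"p = {allelic_imbalance_p(30, 10):.4f}")
```

A real analysis would typically also account for reference-mapping bias and overdispersion across samples; this sketch only shows the core idea.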
To illustrate that point, here are ATAC-seq tracks from various cell types around a gene I chose more or less at random, IGF1. I hope you can appreciate that although IGF1 is expressed in most of these cells, effectively every cell type regulates it differently: there are excitatory-neuron-specific peaks, inhibitory-neuron-specific peaks, and microglia-, oligodendrocyte-, and astrocyte-specific peaks. So cell type specificity is really important for gene regulation, and the question becomes how to obtain these cell-type-specific regulatory landscapes in the brain. Unlike in blood, we don't have paradigms to FACS-sort all of these intricately defined cell types, and one of the most effective approaches we've found is single-cell profiling. This essentially needs no introduction in this webinar, but with the 10x Genomics platform, we perform the transposition in bulk, use the Chromium system to encapsulate individual nuclei with barcoded gel beads, do our amplification, break the GEMs, sequence all of the resulting fragments, and map each fragment back to its cell of origin based on its barcode. In the context of the brain, rather than flow sorting up front and doing bulk ATAC-seq on different populations, we can do all of this in a one-pot reaction and then identify neurons, glia, and so on from their cell-type-specific signals. What does this look like? Here's a dimensionality reduction of about 70,000 single cells. We can call clusters and annotate them. We do that in chromatin accessibility space using what we call gene activity scores, which infer how highly expressed a gene might be based on its patterns of chromatin accessibility.
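Gene activity scores can be built in several ways. The sketch below is a deliberately simplified, hypothetical model: it sums one cell's accessibility around a gene's transcription start site (TSS), down-weighting each peak exponentially with distance, and then depth-normalizes each cell's scores. The 5-kb decay constant, the 100-kb window, the scaling factor, and all names are assumptions, not the model used in the talk or in any particular package.

```python
import math


def gene_activity(tss, peaks, decay_bp=5_000, window_bp=100_000):
    """Distance-weighted accessibility around one gene's TSS in one cell.

    peaks: (peak_center_bp, fragment_count) pairs on the same chromosome.
    Each count is down-weighted by exp(-distance / decay_bp); peaks farther
    than window_bp from the TSS are ignored.
    """
    score = 0.0
    for center, count in peaks:
        distance = abs(center - tss)
        if distance <= window_bp:
            score += count * math.exp(-distance / decay_bp)
    return score


def activity_scores(genes, cells, scale=10_000):
    """Per-cell gene activity scores, depth-normalized to sum to `scale`."""
    result = {}
    for barcode, peaks in cells.items():
        raw = {gene: gene_activity(tss, peaks) for gene, tss in genes.items()}
        total = sum(raw.values())
        result[barcode] = {g: (v / total * scale if total else 0.0) for g, v in raw.items()}
    return result


# Hypothetical TSS positions and per-cell peak counts on one chromosome.
genes = {"GENE_A": 10_000, "GENE_B": 500_000}
cells = {"cell1": [(10_000, 4), (15_000, 2)], "cell2": [(500_200, 3)]}
print(activity_scores(genes, cells))
```

Clusters can then be annotated by looking for high scores at known marker genes, in the same way one would use expression in single-cell RNA-seq.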
This works quite well: we can identify excitatory neurons, inhibitory neurons, microglia, oligodendrocytes, astrocytes, OPCs, and so on. To highlight how different this data is from bulk ATAC-seq, if we take all of the peaks identified in hundreds of bulk ATAC-seq samples from the brain and compare them to the peaks identified in just 10 single-cell ATAC-seq reactions, we find almost twice as many peaks in the single-cell data. The single-cell data captures the vast majority of the bulk peaks, but a very large portion of the single-cell peaks is not captured in the bulk data. You can see this in this heatmap: of the roughly 350,000 total peaks, approximately 220,000 are specific to one cell type or a pair of cell types, for example neurons, excitatory neurons, inhibitory neurons, microglia, and so on. So more than half of the peaks are cell-type-specific. Going back to the bulk data and asking which of these cell-type-specific peaks it captures, there's a strong underrepresentation of peaks from microglia, astrocytes, and OPCs, which happen to be the least abundant cell types. What this shows is that cell types making up less than about 20% of the sample are effectively missed by bulk profiling, so single-cell profiling really illuminates cell-type-specific biology. To show how far this can be taken, here's a dimensionality reduction of just the neurons in our data. We can identify really fine-grained subclasses of neurons, including interneuron subclasses such as somatostatin, parvalbumin, or VIP interneurons, and even multiple subtypes of medium spiny neurons in the basal ganglia.
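By the talk's numbers, about 220,000 of 350,000 peaks, roughly 63%, are specific to one or two cell types. The sketch below shows that bookkeeping on a hypothetical set of peak calls, assuming each peak is annotated with the set of cell types in which it was called; the data structure and function names are illustrative assumptions.

```python
from collections import Counter


def specificity_summary(peak_calls, max_types=2):
    """Count and fraction of peaks called in at most `max_types` cell types.

    peak_calls maps a peak ID to the set of cell types in which it was called.
    """
    specific = [types for types in peak_calls.values() if 1 <= len(types) <= max_types]
    return len(specific), len(specific) / len(peak_calls)


def unique_peaks_per_type(peak_calls):
    """Count peaks called in exactly one cell type, per cell type."""
    return Counter(next(iter(types)) for types in peak_calls.values() if len(types) == 1)


# Hypothetical peak calls.
peak_calls = {
    "peak1": {"microglia"},
    "peak2": {"excitatory", "inhibitory"},
    "peak3": {"excitatory", "inhibitory", "astrocyte", "oligodendrocyte"},
    "peak4": {"oligodendrocyte"},
    "peak5": {"microglia"},
}
print(specificity_summary(peak_calls))
print(unique_peaks_per_type(peak_calls))
```

Comparing `unique_peaks_per_type` against the set of peaks recovered in bulk data is one simple way to see which cell types bulk profiling underrepresents.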
To wrap up this section on single-cell versus bulk, we used the single-cell data to deconvolve the bulk ATAC-seq data. We do this using CIBERSORT, a program developed at Stanford by Aaron Newman and Ash Alizadeh, which takes a bulk profile and splits it into contributions from different cell types. We derive signature matrices that define the cell types of interest, and when we apply this to bulk data and compare to the known ground truth from single-cell ATAC-seq, it performs extremely well. When we run this analysis across all of our previously generated bulk data, we see a massive amount of heterogeneity in the cell type composition of the individual tissues. If you imagine trying to identify statistically significant differences across macrodissected frozen brain, you can appreciate how much variability there would be across samples and how problematic that would be for detecting significant signals. Okay, back to our plan for understanding GWAS. One of the first things we did was ask whether a specific cell type is enriched for polymorphisms associated with Alzheimer's or Parkinson's disease. It's well known at this point that microglia, shown here in light blue, are enriched for these polymorphisms in Alzheimer's disease. This doesn't tell us anything about a specific polymorphism, only about the disease in general. If we look at different neuronal subtypes, none of them is enriched for polymorphisms associated with Alzheimer's or Parkinson's disease.
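The idea behind deconvolution is that a bulk profile is approximately a weighted sum of cell-type signature profiles, with non-negative weights. CIBERSORT itself uses ν-support vector regression; the sketch below swaps in plain non-negative least squares (`scipy.optimize.nnls`) as a simpler stand-in, on a simulated signature matrix and a bulk sample built from known fractions. All data and names here are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls


def deconvolve(signature, bulk):
    """Estimate cell-type fractions of a bulk profile by non-negative least squares.

    signature: (n_peaks, n_cell_types) reference accessibility per cell type.
    bulk: (n_peaks,) accessibility of one bulk sample.
    """
    weights, _residual = nnls(signature, bulk)
    total = weights.sum()
    return weights / total if total > 0 else weights


rng = np.random.default_rng(0)
cell_types = ["excitatory", "inhibitory", "microglia", "oligodendrocyte", "astrocyte"]
# Hypothetical signature matrix and a simulated bulk sample built from known fractions.
signature = rng.gamma(shape=2.0, scale=1.0, size=(500, len(cell_types)))
true_fractions = np.array([0.35, 0.15, 0.05, 0.35, 0.10])
bulk = signature @ true_fractions + rng.normal(0.0, 0.01, size=500)

for name, fraction in zip(cell_types, deconvolve(signature, bulk)):
    print(f"{name:>16}: {fraction:.3f}")
```

On real tissue, the recovered fractions are only as good as the signature matrix, which is why deriving it from single-cell data of the same tissue matters.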
This gets at the cell type level, but we want to dig further and annotate which precise genes are affected by each individual polymorphism. For this, we've done HiChIP, a chromosome conformation capture technique. The key difference between Hi-C and HiChIP is that Hi-C captures all interactions, whereas HiChIP uses antibody enrichment to capture interactions at specific regions. In this case, we use the active chromatin mark H3K27ac, which captures interactions between enhancers and promoters and other regions of the genome. The other way we map regulatory elements to the genes they interact with is called co-accessibility. Imagine you have a promoter and many enhancers, and you want to predict which of those enhancers affect that gene's expression. You can look for cases in which accessibility at the promoter is correlated with accessibility at an enhancer. Hopefully you can appreciate that the E3 enhancer is relatively well correlated with accessibility at the promoter. You could plot this in several ways, but the upshot is that one of these enhancers has accessibility highly correlated with that of the promoter. Using these two orthogonal techniques, we can start to map SNPs to potential target genes. If we take all of the lead SNPs from GWAS and map each to its nearest gene, we get one gene per SNP, which was 51 genes at the time the study was done. But using the HiChIP and co-accessibility data, we can expand the set of genes these polymorphisms map to, and notably, about half of the nearest-gene predictions from the lead SNPs are actually incorrect.
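Co-accessibility, as described above, boils down to correlating promoter accessibility with each candidate enhancer's accessibility across many groups of cells. A minimal sketch, assuming accessibility has already been summed over aggregates of similar cells, a Pearson correlation, and a 0.5 cutoff; the data are simulated so that E3 tracks the promoter, and all names and thresholds are hypothetical.

```python
import numpy as np


def coaccessible_links(promoter, enhancers, min_r=0.5):
    """Return enhancers whose accessibility correlates with the promoter's.

    promoter: accessibility of the promoter across cell aggregates.
    enhancers: name -> accessibility across the same aggregates.
    """
    links = {}
    for name, accessibility in enhancers.items():
        r = np.corrcoef(promoter, accessibility)[0, 1]
        if r >= min_r:
            links[name] = round(float(r), 3)
    return links


rng = np.random.default_rng(1)
n_aggregates = 50
promoter = rng.poisson(5, size=n_aggregates).astype(float)
enhancers = {
    "E1": rng.poisson(5, size=n_aggregates).astype(float),
    "E2": rng.poisson(5, size=n_aggregates).astype(float),
    # E3 tracks the promoter, mimicking the correlated enhancer on the slide.
    "E3": promoter + rng.normal(0.0, 0.5, size=n_aggregates),
    "E4": rng.poisson(5, size=n_aggregates).astype(float),
}
print(coaccessible_links(promoter, enhancers))
```

Aggregating cells before correlating is a design choice that reduces the sparsity of single-cell ATAC data; correlating raw single-cell counts would mostly measure noise.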
You see a very similar picture for Parkinson's disease, so hopefully this gives you a feel for how we can now map which genes might be affected by which SNPs. The last piece of the puzzle is predicting which SNP is functional; once we know that, we can map it to genes and cell types. The way we do this is through machine learning, which we use to predict SNP functionality. Here's the paradigm. First, we start with a lead SNP and expand it in linkage disequilibrium to identify all of the SNPs that might be important. Then we whittle that list down by overlapping those SNPs with peaks from our single-cell chromatin accessibility profiling, because our hypothesis is that for a noncoding SNP to be functional, it has to affect a regulatory element, which would be marked by one of these peaks. For the subset of SNPs that fall in peaks, we predict the effect of the sequence change on transcription factor binding; in this toy example, a C-to-A change might affect binding. Finally, we use our HiChIP or co-accessibility data to map that SNP to the genes it might regulate. The crux of this is predicting SNP effects. For each of our clusters, we train a gapped k-mer support vector machine (gkm-SVM), which essentially learns the underlying sequence grammar of chromatin accessibility, so that you can feed it a wild-type and a variant sequence and look for differences in how those sequences might be bound by a transcription factor. Here you have the effect allele, with higher predicted accessibility indicated by taller logos, and the noneffect allele with lower predicted accessibility, and then a delta track, which should in theory highlight the motif that is bound.
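The filtering steps above can be sketched as a small pipeline: keep LD-expanded SNPs that fall inside accessibility peaks, score each allele with a sequence model, and rank by the alt-minus-ref difference. In the talk, the sequence model is a trained gkm-SVM; this sketch substitutes ungapped k-mers with made-up weights, so the `weights` dictionary, the scoring functions, the 0.5 cutoff, and all inputs are hypothetical. The alt-minus-ref score is similar in spirit to deltaSVM-style variant scoring, but it is not that method's implementation.

```python
from bisect import bisect_right


def overlaps_peak(pos, peaks):
    """True if pos lies in one of the sorted, non-overlapping [start, end) peaks."""
    i = bisect_right(peaks, (pos, float("inf"))) - 1
    return i >= 0 and peaks[i][0] <= pos < peaks[i][1]


def kmer_score(seq, weights, k):
    """Sum the (hypothetical) weights of every k-mer in seq."""
    return sum(weights.get(seq[i:i + k], 0.0) for i in range(len(seq) - k + 1))


def delta_score(ref_seq, alt_seq, weights, k):
    """Alt minus ref score; only k-mers spanning the variant can differ."""
    return kmer_score(alt_seq, weights, k) - kmer_score(ref_seq, weights, k)


def prioritize(candidates, peaks, weights, k, min_abs_delta=0.5):
    """Keep LD-expanded SNPs that fall in peaks and have a large predicted effect."""
    hits = []
    for snp in candidates:
        if not overlaps_peak(snp["pos"], peaks):
            continue  # no regulatory element to disrupt
        delta = delta_score(snp["ref_seq"], snp["alt_seq"], weights, k)
        if abs(delta) >= min_abs_delta:
            hits.append((snp["id"], round(delta, 3)))
    return sorted(hits, key=lambda hit: -abs(hit[1]))


# Hypothetical inputs: peaks from one cluster, k-mer weights, and LD-expanded
# SNPs with short sequence windows centered on each variant.
peaks = [(100, 600), (1_000, 1_500)]
weights = {"GATA": 1.0, "ATAA": 0.5}
candidates = [
    {"id": "snp1", "pos": 250, "ref_seq": "TTGATAAT", "alt_seq": "TTGCTAAT"},
    {"id": "snp2", "pos": 800, "ref_seq": "TTGATAAT", "alt_seq": "TTGCTAAT"},
    {"id": "snp3", "pos": 1_200, "ref_seq": "CCCCCCCC", "alt_seq": "CCCACCCC"},
]
print(prioritize(candidates, peaks, weights, k=4))  # [('snp1', -1.5)]
```

Here snp2 is dropped because it lies outside any peak, and snp3 because its change does not touch any weighted k-mer; the surviving SNPs would then be linked to genes with HiChIP or co-accessibility.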
I'll walk you through a few of these examples. First, the PICALM locus. This locus has classically been associated with microglia, but I'll try to provide some evidence that it may actually affect oligodendrocytes. Across a couple of GWAS, we have a few lead SNPs. Expanding those in LD gives about 165 SNPs, 24 of which overlap peak regions, as I'll show you in a second. These are all in the vicinity of the PICALM gene, which is why these SNPs have been annotated as affecting PICALM in the past. I'm going to convince you that it's this one SNP right here that has an effect. That SNP overlaps this prominent oligodendrocyte-specific peak, and it interacts with the PICALM gene based on both HiChIP and co-accessibility. We also see some evidence of it interacting downstream with EED, which is part of the Polycomb group, another provocative hypothesis. The machine learning gives a pretty strong prediction that this G-to-A change disrupts a FOS motif, shown here. With the A allele, there's very little predicted accessibility; with the G allele, predicted accessibility is higher; and when you subtract those two tracks, you pick up this beautiful FOS motif. Taking this a step further, going back to our bulk data and looking for allelic accessibility in heterozygous individuals (shown in color), we see that in a large number of cases the reference allele is more accessible than the variant allele. Here the noneffect allele is the reference allele, so the G allele is more accessible, and more strongly bound by FOS, than the variant allele. That's a very strong indication that this SNP is functional. Okay, how about a few other examples? I'll try to breeze through these rather quickly. At the locus annotated as KCNIP3, we expand in LD.
We cover a pretty large region of this locus with about 100 SNPs, and we can whittle those down to 22 that fall in peak regions. I'll show you evidence that the functional variant could be either this SNP shown here in red or this one over here, and they tell two very different stories. The one on the left affects an oligodendrocyte-specific peak; the one on the right affects a neuronal peak. The oligodendrocyte-specific peak seems to interact with the MAL gene, which is a key gene for oligodendrocyte function, whereas the neuronal SNP interacts with KCNIP3, which is known to be involved in neuronal function. On the oligodendrocyte side, we have support from a machine learning prediction in which the effect allele is more accessible than the noneffect allele, and this maps very strongly to a SOX6 motif; SOX6 is a known regulator of oligodendrocyte function. On the neuronal side, we had enough heterozygous individuals to find evidence of allelic accessibility. Unfortunately, for the other SNP, we did not have enough heterozygous individuals. So these two SNPs give two very different interpretations of what's going on. I'll give you one last example at a much better understood locus, BIN1. Here, two SNPs could potentially be functional. Both affect microglia-specific peaks, and both show some evidence of interacting with the BIN1 promoter. However, we believe that rs13025717 is the causal SNP, because our machine learning predicts a very strong loss of accessibility with the effect allele, and that change maps very strongly to a KLF4 motif; KLF4 is a known regulator of microglial identity. All of this has been published, and we're actively working on validating some of these findings. So how do we go about validating that a prediction from the machine learning is actually important and functional in the disease?
The first way we're attempting this is what we call scarless single-base editing. The idea is that you take the wild-type allele and use a gene editing approach (in our case, prime editing) to change it in an isogenic fashion, for example from a G to an A, and then differentiate these cells and test for allelic differences in gene expression. We've been able to do this at a variety of loci, and here are some Sanger traces showing an A-to-G or a G-to-A conversion. I will say that prime editing has been a bit finicky and quite locus-dependent, so we're still actively working on a lot of this. The other way we're using functional genomics to validate these findings is through massively parallel reporter assays (MPRAs). In these assays, you take a wild-type and a variant version of a particular regulatory element, clone each upstream of a minimal promoter and an open reading frame, and use sequencing to measure the differential activity of the two alleles in a reporter assay. You do this across thousands of allele pairs in a single assay. This has the advantages of being very high throughput, working in any cell type you can grow in culture, being applicable to essentially any disease, and giving a quantitative readout. The disadvantage is that the element is not in situ, that is, not in its correct location in the genome, so the native genomic context isn't maintained. These are things we're actively pursuing, and we hope to have more to share in the future. I'll share one more vignette about how we can use this sort of epigenomic data to understand pretty complex associations, in this case with Parkinson's disease.
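One common way to summarize an MPRA readout is to normalize each barcode's RNA count by its DNA (plasmid) count, take log2, summarize across the barcodes tagging each allele, and report the alt-minus-ref difference as an allelic skew. The sketch below assumes that scheme with a pseudocount of 1 and a median summary; both choices, the function names, and the counts are hypothetical, not the specific analysis from the talk.

```python
import math
from statistics import median


def allele_activity(barcodes, pseudocount=1.0):
    """Median log2(RNA/DNA) across the barcodes tagging one allele's construct."""
    return median(
        math.log2((rna + pseudocount) / (dna + pseudocount)) for rna, dna in barcodes
    )


def allelic_skew(ref_barcodes, alt_barcodes):
    """log2 fold change in reporter activity of the alt allele over the ref allele."""
    return allele_activity(alt_barcodes) - allele_activity(ref_barcodes)


# Hypothetical (RNA count, DNA count) pairs for barcodes of each allele.
ref = [(99, 49), (199, 99), (39, 19)]
alt = [(399, 99), (799, 199), (159, 39)]
print(allelic_skew(ref, alt))  # 1.0, i.e. the alt allele is ~2x more active
```

Normalizing by DNA counts corrects for differences in how many copies of each construct entered the cells, so the skew reflects regulatory activity rather than library representation.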
MAPT, the gene that encodes the tau protein, is actually one of the strongest GWAS loci for Parkinson's disease, even though we canonically think of tau as an Alzheimer's disease protein. The MAPT locus is very interesting from a genetic and evolutionary standpoint. Many years ago, an inversion occurred in this locus, spanning from this location to this location, which swaps the orientation of everything inside the inversion relative to everything outside it. Along with that inversion, a few thousand SNPs differ between the two haplotypes. So if you inherit a copy of the H2 haplotype, this region is flipped, and you also carry thousands of SNPs within it. This creates a very complicated problem: is it the inversion that's causing the disease association, or is it one of these SNPs? And how do we understand the epigenetics underlying this complex association? We can use publicly available transcriptome data to look at how expression of the MAPT gene, for example, changes with these haplotypes. There's a significant loss of MAPT expression in the H2 haplotype, but it's relatively mild. So we sought to explain how this gene expression change occurs from an epigenetic standpoint, using an allelic comparison of the two haplotypes. We have H1 homozygotes, H2 homozygotes, and heterozygotes. The homozygotes we can handle as is, without splitting their reads. But for the H1/H2 heterozygotes, we split the reads into H1 and H2 reads based on the SNPs that distinguish the haplotypes in this region, and in both cases we do differential testing. The heterozygote comparison is particularly interesting because it's exquisitely well controlled.
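Splitting heterozygotes' reads by haplotype, as described above, relies on reads that cover at least one SNP distinguishing H1 from H2. A minimal sketch, assuming ungapped alignments, a table of informative SNP positions with their H1 and H2 bases, and the rule that a read is assigned only when all informative bases it carries agree; the positions, reads, and function name are hypothetical.

```python
from collections import Counter


def assign_haplotype(read_start, read_seq, informative):
    """Assign a read to H1 or H2 from the informative SNPs it covers.

    informative maps a genomic position to (H1 base, H2 base). Reads are
    assumed to be ungapped alignments starting at read_start.
    """
    votes = set()
    for pos, (h1_base, h2_base) in informative.items():
        offset = pos - read_start
        if 0 <= offset < len(read_seq):
            base = read_seq[offset]
            if base == h1_base:
                votes.add("H1")
            elif base == h2_base:
                votes.add("H2")
    # Reads covering no informative SNP, or carrying both haplotypes' alleles,
    # cannot be assigned.
    return votes.pop() if len(votes) == 1 else "unassigned"


# Hypothetical informative SNPs and reads (start position, sequence).
informative = {2: ("A", "G"), 6: ("C", "T")}
reads = [(0, "TTAGGCCAA"), (0, "TTGGGCTAA"), (0, "TTAGGCTAA"), (10, "ACGT"), (4, "GCCA")]
counts = Counter(assign_haplotype(start, seq, informative) for start, seq in reads)
print(counts)
```

Because both sets of assigned reads come from the same nuclei, any H1 versus H2 difference in accessibility or interaction frequency cannot be explained by differences between individuals or cell composition.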
The H1 and H2 reads come from the same individuals and the same cells. Here is that same locus, with the promoter of the MAPT gene highlighted. Here I'm showing bulk ATAC-seq data: you can see high accessibility at the MAPT promoter and some haplotype-specific changes in accessibility. This peak is H1-specific, and these peaks are significantly higher in the H2 haplotype; we'll focus on those two regions and call them A and B. At the B region down here, there's a very strong increase in HiChIP interaction frequency with the A region upstream, and we'll follow these regions throughout. The dotted line represents the focal point for the HiChIP analysis, and everything on the y axis is its interaction with that point. Moving upstream to the MAPT promoter, you can see a slight increase in interaction with this H1-specific peak. If we shift the focal point to that H1-specific peak, we see highly increased interaction with the promoter and with this downstream MAPT enhancer. And if we shift all the way over to the A region, we see the reciprocal high interaction strength with the B region downstream. All of this goes to say that there are big changes in this locus, both in chromatin accessibility and in 3D enhancer-promoter interactions, and these are accompanied by corresponding changes in gene expression. Here I'm showing each gene as a bar. Some bars you can't see because they're at zero, meaning there's no difference between H1 and H2. These genes down here are upregulated in H2 individuals, and these are upregulated in H1 individuals.
So hopefully you can appreciate that whatever is happening with these interactions between the A and B regions is definitely changing gene expression. We don't think this gene expression change is driving the disease association, because these are largely pseudogenes and antisense transcripts, but it is a relatively provocative mechanism. All of this is happening inside the breakpoints. What happens outside of them? Again, that's interesting because the region inside the breakpoints is inverted in the H2 haplotype. Here I'm showing the focal point on the MAPT gene. Looking upstream, both in homozygous individuals where we haven't done any read splitting and in heterozygotes with allelic read splitting, there's increased interaction with this region here, which could be a long-range enhancer. In that region we see multiple neuron-specific peaks that are likely enhancers, so we believe these regions interact over a long distance with the MAPT promoter to increase its expression, specifically in H1 individuals compared to H2 individuals. Schematically: in H1 individuals, the region is in this orientation, there's a long-range interaction between this enhancer and the promoter, and more MAPT transcript is produced. In the H2 haplotype, the A and B regions are inverted and interact with much higher frequency, which seems to insulate the MAPT gene from this distal enhancer. Hopefully this gives you a flavor of how this sort of integrative epigenomic analysis can provide interesting and informative interpretations of genetic associations with disease. So why should you care if you're not in this field? I think it's important that we continue to identify new genetic targets, because this has translational implications.
The new genes that we implicate in the disease will nominate new molecular mechanisms and provide new insights into the multifactorial genetic interactions occurring between the different variants we inherit, and hopefully, in the long term, this leads to better prognostication.

In the last few minutes, I want to highlight some of the work we've been doing in large-scale multiomics, in particular on the software development side. For ATAC-seq, and single-cell ATAC-seq in particular, there aren't many analytical approaches that enable these sorts of massive-scale analyses, and some of that is due to challenges specific to ATAC-seq. If you compare ATAC-seq and RNA-seq, the number of features is very different. With RNA-seq, most people end up with a few thousand to 10,000 genes per cell, whereas for ATAC-seq the number is much, much higher: there are hundreds of thousands of regulatory elements across the genome. This becomes really complicated because the feature set in ATAC-seq is also dynamic. Different cell types might express the same genes, since there's only a limited number of genes in the genome, but regulatory elements are 10-fold or more numerous and are extremely cell type-specific. So as you add new cell types to your analysis, the feature set changes and the problem keeps getting larger with ATAC-seq, whereas for RNA-seq there's a fixed number of features that doesn't really change. This creates computational challenges: the matrices and analyses require some finely tuned solutions to keep up with the scale of data generation that's now possible with the 10x Genomics system. Our solution has been to develop a software package called ArchR, which is highly robust and scales very well to what we consider massive-scale datasets of millions of cells.
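The contrast between a fixed gene list and a peak set that keeps changing can be made concrete with a small Python sketch. It is a hypothetical illustration, not ArchR code: peaks are (start, end) intervals on a single chromosome, the cell-type names and coordinates are made up, and the fixed-width tiling (a 500 bp tile size is assumed here) stands in for one common way of getting a feature space that does not depend on which cell types are included. Merging a second cell type's peaks both adds features and changes the coordinates of an existing one, whereas the number of tiles depends only on chromosome length.

```python
def merged_peak_set(peak_sets):
    """Union of per-cell-type peak sets, merging overlapping or touching intervals."""
    intervals = sorted(iv for peaks in peak_sets for iv in peaks)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous feature: extend it, changing its coordinates.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def tile_index(pos, tile_size=500):
    """Feature index of a position under a fixed genome-wide tiling (assumed tile size)."""
    return pos // tile_size


def n_tiles(chrom_length, tile_size=500):
    """Number of fixed-width tiles covering a chromosome (ceiling division)."""
    return -(-chrom_length // tile_size)


if __name__ == "__main__":
    # Hypothetical peak calls for two cell types on a 10 kb toy chromosome.
    neuron = [(1000, 1500), (5000, 5600)]
    microglia = [(1400, 1900), (9000, 9400)]
    print(merged_peak_set([neuron]))             # [(1000, 1500), (5000, 5600)]
    print(merged_peak_set([neuron, microglia]))  # [(1000, 1900), (5000, 5600), (9000, 9400)]
    print(n_tiles(10_000))                       # 20, whichever cell types are present
```

Because a peak-based feature set is redefined every time cell types are added, any matrix built on the old peaks has to be recomputed; a fixed tiling trades a larger, sparser matrix for features that stay stable as the dataset grows.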
With ArchR, you can now analyze about a million cells through dimensionality reduction and clustering on a standard MacBook laptop in under eight hours. So this really enables the scale of data generation that's now possible. We think the underlying infrastructure of ArchR is highly intuitive, and it's thoroughly documented in a manual. At the risk of continuing this infomercial, I'll just leave it at that and say you should visit archrproject.com to check it out if you haven't already.

To summarize what I've told you today: tissues and cell types, especially in the brain, are extremely complicated and multifactorial, and single-cell assays can really help us understand the cell type-specific biology that matters. Hopefully I've shown you that the noncoding genome matters, and our work relies heavily on single-cell chromatin accessibility profiling to identify these functional noncoding mutations. I showed you one example where this vertical integration of multiomic data can provide key insights into disease predisposition, and this is not an isolated example; there are many cases where integrating these types of data is hugely important. And I think machine learning only adds to this. The field is developing quite quickly, and the insights we'll gain by combining assays and layering on machine learning are really impressive, in my opinion. And lastly, I told you about our efforts to support the new age of single-cell genomics by developing new tools and analytical paradigms. With that, I'll thank the people involved. My lab, as I mentioned, is quite new and we're actively growing, and we're very thankful for the generous funding we've received from multiple sources. Thank you for your attention, and I look forward to the rest of the seminars.

Thanks so much for a great talk, Ryan. If any of you have questions for Dr. Corces or for 10x, you'll have five more minutes to submit them in the Q&A box, and we'll get back to you at a later time with a reply. You can also contact us with any follow-up questions at support@10xgenomics.com. Thank you so much for joining us, and have a great rest of your day.