#SideByScience Webinar:
Analytical validation of clinical whole genome sequencing for germline disease diagnostics: Best practices and performance standards
Dr. Christian Marshall
Co-Director, Centre for Genetic Medicine
Associate Director, Department of Paediatric Laboratory Medicine

Dr. Christian Marshall will present the findings of the Medical Genome Initiative, a consortium of leading health care and research organizations in the US and Canada, that was formed to expand access to high quality clinical WGS by publishing best practices.

Whole-genome sequencing (WGS) has shown promise in becoming a first-tier diagnostic test for patients with rare genetic disorders; however, standards addressing the definition and deployment practice of a best-in-class test are lacking. This webinar will present consensus recommendations on clinical WGS analytical validation with a focus on test development, upfront considerations for test design, test validation practices, and metrics to monitor test performance.

This is the first in our series of #SideByScience webinars, promoting a scientific agenda in a time of social distancing. See the series agenda.

We’ve provided a transcript for this webinar below. The transcript was created with automation and may contain inaccuracies. We recommend accessing the recording for an accurate representation of the session content.


Hello, everyone. Thank you for joining us today. This is the first in our new #SideByScience webinar series. With me today is Dr. Christian Marshall from SickKids hospital in Toronto. Christian will present his talk on “analytical validation of clinical genome sequencing for germline disease diagnostics: best practices and performance standards”. We are happy to have you here connecting from your homes. My name is Shay Tzur. I have a PhD in Human Genetics and I am Emedgene’s CSO.

I am your host for this webinar.

For those of you who don’t know Emedgene, we apply machine learning models to genomic analysis and interpretation in order to increase the capacity of genetic labs. The agenda of the series is driven by topics that are of interest to the labs we work with and the challenges they face. We are going to hand it off to Christian to get started with his presentation right away. At the end, we will share the results of an informal survey on the status and challenges of whole genome sequencing adoption.

Dr. Christian Marshall is the co-director of the Centre for Genetic Medicine and also the associate director in the Department of Paediatric Laboratory Medicine at SickKids in Canada. Today, Christian is going to present the work of the Medical Genome Initiative (MGI), a consortium of leading health care and research organizations in the U.S. and Canada that was formed to expand access to high quality clinical whole genome sequencing by publishing best practices. If you have any questions, please type them into the Q&A widget below.

We will spend a few minutes answering questions at the end. Christian, welcome. The mic is yours. Thank you. Great. Thank you very much, Shay, for the kind introduction. I am just going to share my screen and put this in presentation mode. Great. I hope everyone can see that OK. Well, thanks a lot for joining. I hope everyone is doing well and staying healthy. As Shay said, I’m a clinical lab director here at SickKids Hospital.

And I’m going to talk a little bit about some of the work that we’ve been doing. I’ve actually split the presentation into two parts. The first part I will be introducing work we have done with the Medical Genome Initiative, a consortium that really is looking at trying to define best practices for the implementation of clinical genome sequencing into health care. And, then we are going to look a little bit at one of the papers that we’re trying to publish that is looking more at the analytical standards.

And then I’ll drill down a little bit and do a more specific dive into some of the work we’re doing at SickKids, specifically around a challenging aspect of validating a genome test, and that’s the copy number variation analysis. So before I get into that, I just wanted to set everything up as to why we’re doing this. So, a lot of work that we’ve done in the past, and that many others around the world have done, has been looking at using whole genome sequencing potentially as a single genetic test.

Whole genome sequencing vs current standard of care

And so this is the work we did a couple of years ago. Looking at the comparison of genome sequencing versus standard of care for many of the clinics throughout SickKids, we found that the diagnostic yield was roughly doubled compared with the use of standard testing. And I think what was really interesting, and what we really noticed, and this is perhaps not surprising, was the amount of genetic testing that needed to be done per patient.

So, on average, there were three genetic tests that were done per patient, and a microarray analysis was the one that was the most utilized. So our increase in diagnostic yield for the genome sequencing was not just due to genes that were not in the targeted tests that were done as standard, but also due to things like non-coding variants, including deep intronic variants and also miRNA. So some things that you might not actually capture when using something like a whole exome sequencing technique.

And we also found a lot of CNVs that were below the resolution of standard clinical microarray. And so, this really set the scene for why we wanted to get this into the clinic. And obviously, these are a couple of papers that we published, and Ryan Taft’s group at Illumina has also published a nice paper looking at this as a first-tier test. And then, of course, a lot of the work done by Stephen Kingsmore at Rady’s has also shown the utility of genome sequencing for the diagnosis of rare disease.

Clinical utility of WGS as a first-line genetic test

So, genome sequencing is looking towards being a first-line genetic test. It may be very useful for this purpose, but the actual clinical validation of whole genome sequencing is quite challenging. And one of the issues right now is there are not a lot of standards in place. And of course, standards are very important. They allow comparison between laboratories and ultimately are important for the safety of patients. A lot of the professional bodies out there, like the ACMG and CAP, etc., do have some guidelines for NGS testing in general.

Inception and goals of the Medical Genome Initiative

But I think a lot of the specific challenges of validating the whole genome sequencing test are not really addressed by these. So, with that in mind, a lot of colleagues and I got to talking, and we were all having the same sort of issues. And so out of that was created this Medical Genome Initiative. And so, as Shay described earlier, its mission is to expand access to high quality whole genome sequencing for the diagnosis of rare germline diseases. And really, what we wanted to do was publish laboratory and clinical best practices for the implementation. And this was towards a greater benefit to the group and to people, in general, that are looking to set this test up. And you can see below the member institutions currently of the consortium. And so, one of the things that we first did was try to define a best practice topics roadmap, I suppose.

And we did this by looking at all the different aspects of how clinical genome sequencing would integrate into patient care, including things like patient selection, laboratory testing, the diagnostic aspect of it, and then outcomes. And within these, we split them into different overall themes that could be under each of these categories and decided which ones to prioritize, which ones we felt had some large gaps. And the idea here was to survey the consortium, see what the practices were currently, and then try to derive some consensus around it.

Analytical validation of clinical whole genome sequencing

And so, the ones that are starred here are ones we have working groups for right now. But the one I’m going to talk about today is the analytical validity of genome sequencing. So, this was the first work group that we formed and was one where we felt like we could benefit the community most by publishing. And so, this is best practice for analytical validation of genome sequencing. It’s really intended as a little bit of a, you know, current state of what’s happening at these institutions right now that have validated or are in the process of validating clinical genome sequencing.

We surveyed the group for current practices and then we tried to define some consensus statements that were specific to genome sequencing around a lot of these key aspects in the validation. So, this figure here is a little bit more agnostic to actual clinical genome sequencing, but it really is taking us through the different steps that you might want to do when you’re developing a lab-based test. So, these are the key steps.

So, we looked at test development, optimization, test validation and quality management. And then within each of these, we defined a lot of the activities. And this is essentially how the paper is broken down, a bunch of the activities that are needed. And we specifically looked at them with clinical genome sequencing in mind, including things like test definition, measuring test performance and then, of course, quality control and then the kind of outcomes that you would need to have before you can offer this test.

And so, one of the things that we did within the group was try to define a couple of consensus statements across a lot of these key steps here. And I’m just going to briefly go over them. And at the bottom, you can see that on our website there is a preprint available. This paper is currently in review. And obviously we break down a lot of these consensus statements in much more depth in the paper.

So, really briefly, the first of our consensus statements is on the test definition. It was felt through the group that wherever possible you need to analyze and report on all possible detectable variant types from genome sequencing. We realized that this is not necessarily an easy thing to do. Genome sequencing is a very powerful technique, but it’s often difficult to be able to validate a variant type up to a clinical standard. At the very least, we recommended that individuals, when they validate a test, can report out the small variants and also copy number variants.

And this was sort of what we felt was viable as a minimal appropriate test. Number two was around test performance. And the consensus here, related to number one and the test definition, was that the performance of the genome sequencing test should aim to meet or exceed that of any other test that it’s replacing. The obvious tests that clinical genome sequencing would be replacing are whole exome sequencing and chromosomal microarray analysis.

So, during your test performance, you should be meeting at least those standards. Number three is more of a question around genome coverage, because that’s a question that a lot of people ask: what coverage do you need? And rather than thinking about genome coverage, there’s a statement in there about looking more at genome completeness. And this gets to callability, meaning how well can you call a specific position? And these are the measures and the performance metrics that should be used to define performance, rather than just straight coverage.
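
To illustrate the difference between coverage and completeness, here is a minimal Python sketch. The depth and mapping-quality thresholds and the function name `callable_fraction` are assumptions chosen for illustration, not values from the consensus paper. It shows two toy regions with the same mean coverage but very different fractions of callable positions.

```python
from statistics import mean

# Hypothetical thresholds, chosen only for illustration.
MIN_DEPTH = 10   # minimum reads covering a position
MIN_MAPQ = 20    # minimum mean mapping quality at a position


def callable_fraction(depths, mapqs, min_depth=MIN_DEPTH, min_mapq=MIN_MAPQ):
    """Fraction of positions where depth and mapping quality both pass."""
    if len(depths) != len(mapqs):
        raise ValueError("depths and mapqs must have the same length")
    if not depths:
        return 0.0
    passing = sum(
        1 for d, q in zip(depths, mapqs) if d >= min_depth and q >= min_mapq
    )
    return passing / len(depths)


# Two toy regions with the same mean coverage but different completeness.
even_depths = [30] * 10
uneven_depths = [60] * 5 + [0] * 5
mapqs = [60] * 10

print(mean(even_depths), callable_fraction(even_depths, mapqs))
print(mean(uneven_depths), callable_fraction(uneven_depths, mapqs))
```

Both regions average 30X, yet only half of the second region can be called, which is why a completeness metric says more about test performance than mean coverage alone.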

Moving on to the last couple of consensus statements. One of the other important questions that came out of this was, what should you be using as reference standards and positive controls? We know that for small variant types, a lot of the reference standards that are available, including the Genome in a Bottle and other standards, are very useful for measuring accuracy of your small variant calling. And so, you don’t really need a lot of controls for that to be able to validate the test.

However, there are other, more difficult regions of the genome to call, and other variant types, including copy number variants and maybe repeat expansions, which I’ve got an example of here. For these, you often will need a large number of positive controls to be able to validate the test. Within that test validation framework is number five: we felt like you should be looking at it in several different dimensions, including metrics that account for genome complexity, so how complex the region is, whether it’s repetitive or an easier region to call, and obviously special attention to the sequence context and the variant type that you’re calling.

And so, you have to validate each variant type differently in relation to the sequence context, etc. And then finally, quality management. One question that we were stuck with was how often do you need to run positive controls? And because genome sequencing gives you so much data, it was really felt that ongoing quality control should include the identification of a comprehensive set of test performance metrics and the continual monitoring of these metrics.

If they meet a certain level, then it’s usually OK. And then over time, you can use positive controls on a periodic basis depending upon overall sample volume. So, this is one of the recommendations that we were making, rather than necessarily having to run a positive control for every single batch. So that’s outlining some of the brief consensus statements that came out. I just wanted to highlight one of the things that we talked a lot about. And then maybe take a deeper dive into some of the stuff that we’re doing here at SickKids that’s in relation to this.
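
The quality management recommendation can be sketched as a per-run metric check plus a volume-based schedule for positive controls. This is a minimal Python sketch: the metric names, the acceptance limits, and the 200-sample interval are all hypothetical and would be set by each laboratory during validation.

```python
# Hypothetical metric names and acceptance limits, for illustration only.
QC_LIMITS = {
    "mean_coverage": (30.0, None),      # (minimum, maximum); None means unbounded
    "callable_fraction": (0.95, None),
    "contamination": (None, 0.02),
}


def failed_metrics(run_metrics, limits=QC_LIMITS):
    """Return the names of metrics that are missing or outside their limits."""
    failures = []
    for name, (low, high) in limits.items():
        value = run_metrics.get(name)
        if value is None:
            failures.append(name)
        elif low is not None and value < low:
            failures.append(name)
        elif high is not None and value > high:
            failures.append(name)
    return failures


def positive_control_due(samples_since_last_control, every_n_samples=200):
    """Schedule positive controls by sample volume rather than per batch."""
    return samples_since_last_control >= every_n_samples


run = {"mean_coverage": 41.2, "callable_fraction": 0.93, "contamination": 0.004}
print(failed_metrics(run))
print(positive_control_due(samples_since_last_control=57))
```

Every sample is checked against the metric limits, while the positive control is only triggered once enough samples have accumulated.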

Types of variants used for analytical validation

So, we surveyed the group and looked at what types of variants they would actually be offering and validating. And so, you can see in blue, at the bottom, the different variant classes that we have here, going all the way from the small variants to repeat expansions and more targeted pseudogene type analysis. This includes things like pharmacogenetic testing, but also some of the things like spinal muscular atrophy, where you can do targeted testing of some of these more complex regions.

And so, you can see that, not surprisingly, there’s a range in what people are planning and validating right now, offering these at different stages as their test develops. And the thing I want to talk about today in a little bit more detail is the CNVs and structural variants. And so, we’re defining structural variants here as all balanced and unbalanced variation that’s greater than 50 base pairs in size. Obviously, copy number variation is a subset of structural variation, and we’ve split it up this way just because, traditionally, at least diagnostically, with copy number variants we’re looking at larger deletions and duplications at microarray resolution, greater than 1 kb. Although you could argue that for a lot of the microarrays the resolution is not as high as 1 kb.
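
The size definitions used in the talk can be written down as a small classifier. This is a sketch only: the 50 bp and 1 kb cutoffs come from the talk, while the function name `size_class`, the `unbalanced` flag, and the label strings are assumptions for illustration.

```python
SV_MIN_BP = 50        # structural variants: greater than 50 bp (from the talk)
CNV_MIN_BP = 1_000    # diagnostic CNVs: greater than 1 kb (from the talk)


def size_class(length_bp, unbalanced):
    """Label a variant by its length and whether it changes copy number."""
    if length_bp <= SV_MIN_BP:
        return "small variant"
    if unbalanced and length_bp > CNV_MIN_BP:
        return "copy number variant"
    return "structural variant"


print(size_class(3, unbalanced=True))          # small indel
print(size_class(400, unbalanced=True))        # deletion below diagnostic CNV size
print(size_class(25_000, unbalanced=True))     # deletion or duplication
print(size_class(25_000, unbalanced=False))    # balanced event, e.g. an inversion
```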

CNV identification from WGS data

So, a lot of the early lessons that we learned in looking at the analysis of copy number variants from genomes came from work we did with the genome centre. So, this is a collaboration with the Centre for Applied Genomics at the Hospital for Sick Children. We actually broke this down and we looked a lot at the read-depth algorithms that are used to call CNVs from genomes. We did a comprehensive evaluation of about six different read-depth algorithms, because there were dozens of them out there.

So, we were trying to figure out which ones would work the best. And one of the lessons that we learned is that PCR-free genome libraries are what’s needed for CNVs, because they have a much lower false positive rate due to more uniform coverage. The other thing that we learned from here is that it was actually very difficult to determine a specific sequencing or analysis metric that predicts very poor performance of CNV calling. I think there are metrics that you can use to notice when the CNV analysis isn’t going to work, but they don’t always necessarily correlate really well with what your final result is going to be.

Within this paper, after the evaluation, we used a couple of different read-depth algorithms for the research going forward. So this was ERDS and CNVnator. We used a combination of these for optimum recall, but I think one of the overall lessons is that it’s also challenging to use these open source programs, because they’re often not necessarily under active development. And this is a challenge as we start moving more towards the clinical validation of these tests. So, at SickKids anyway, this is the design of clinical genome sequencing that we came up with.

Reference standards for CNV calling

So here we’ve got the different reference types that we wanted to use. So, these are the reference standards; you can recognize some of them as from the Genome in a Bottle, plus the Asian and Ashkenazi Jewish trios. We also have done a lot of work with the Craig Venter genome. We have a lot of these replicates. So, the variant validation that we’re doing is small variants and copy number variants, so these things that are a little bit larger than 1 kb. And then the performance metrics I have over here, and you can see that at least for a lot of the small variant calling, you don’t necessarily need a lot of samples.

We do have some positive controls that have come back from clinical whole exome sequencing, so we’ve got about 20 of those that we’re going to use. And then for the CMA, or the chromosomal microarray analysis, we’ve actually had to pick 31 samples. And I’ll talk a little bit about how we built our truth set for this and how we evaluated it using our sensitivity and precision. So, our overall goal, at least to break this down, was for CNV calling to have an accuracy that’s equivalent to the chromosomal microarray analysis, including up to that resolution.

We really wanted to start with the read-depth callers, just to make things a little bit more simple. The analysis pipeline that we are using is the Illumina Dragen program. And so, we wanted to be able to get down to 10 kb in size because this is kind of the resolution of the Affymetrix array that we were using. And then the idea would be to orthogonally confirm variants that are smaller than this: we still want to be able to call them, but they have to move on to an orthogonal technique before signing out.

We also are taking advantage of the whole genome sequence using paired-end reads and split reads as well. But this will be introduced more as a rolling validation in the end. And the reason for this is because you get so many variants that it can often be overwhelming to try to report these. So, a combination of reference standards and positive controls is used. So, we are using NA12878. We did a coverage experiment just to see how CNVs were affected by different coverages.

And this was also to decide what overall coverage we might actually want to sequence to. So, this was evaluated. And then we have, as I said, a truth set from 31 different positive controls with a range of different sizes and types of pathogenic variants. And the one thing I also did want to mention was that, you know, reference standards for the single nucleotide and the smaller variants are, I think, very good right now.

But I think for CNVs, there’s still a lot of work to be done and the standards are very much evolving. So, this graph is really just meant to show you that. So, this is the Venter reference that we’ve been using. It has about 23,000 variants in it. And then you can see the Genome in a Bottle sample here. And then this is the Ashkenazi Jewish proband, which has a more complete or a larger number of variants that you can use for benchmarking.

Recall and precision for deletion calling

And so just to show that these are quite disparate. And right now, I think that these are evolving, but they’re still not necessarily really, really good to use. However, it is something that does need to be used. And so, in this case, what we’ve done, you can see this is for the NA12878 benchmark. This is the recall and precision for deletion calling across different depths of coverage.

And as I said, the reason for this was trying to figure out whether we needed to go to a different depth. And we split it into different sizes, as you can see. The results shown here are using the Dragen CNV caller. However, our results are very similar using any other read-depth caller. What you can see is that in general, increasing depth doesn’t really get you much in terms of your sensitivity at a lot of these levels, especially once you go over 30X.

So, we were happy to see this based on other data. We are planning to go to at least 40X for our clinical genomes. The precision is the other thing I wanted to talk about here because this is a bigger issue with this. And I think that this actually goes back more to the benchmark. So, you can see the relatively low precision.

And I think that in many instances we’ve noticed that the calls are probably real and they’re just not necessarily in the benchmark. And so I think that this is one of those things where, as I said, the benchmarks are evolving. But it could indicate that there are issues. So, I think you just have to take the use of these with a grain of salt. So, moving on to what we decided to build for our truth set, here we’re using positive controls.

So, we’ve got a lot of experience here doing microarray analysis signing out. So, based on over 25,000 chromosomal microarray analysis reports signed out clinically, what we did and this is work with one of the directors here…. We looked at the most common deletions and duplications that we’re signing out that are pathogenic. And so, you can see them here in terms of the…. So, one of the things that we wanted to make sure was that we could include these in our positive control dataset.

Constructing CNV comparison truth set

So, this is constructing the CNV comparison. You can see the truth set in different size bins. So, the idea here was we took these 31 positive controls that we ran genomes on. So, we had clinical microarrays, and we ran genomes on them. We have a split of losses and gains and we have done them over different size bins as well. And this is just to kind of get a sense of how well we can detect these. The CNVs were then filtered from the truth set.

So, we are looking at things with greater than 25 probes on the Affymetrix array. We are looking at things that are greater than 10 kb in size. As I say, it’s not like we’re not going to be looking at things less than 10 kb in size. But what we wanted to do is validate this test at the resolution of microarray. We’ve also filtered based on rarity. So, what we did was take the truth set and narrow this down to the variants that are rare.

And one of the issues here is that common variants in microarray and genomes are often displayed differently. And it’s really hard to compare, especially if they’re in messy regions. But because they’re common, it’s not something that we’re interested in validating anyway. And so that’s kind of the last point here: we excluded a lot of the regions that are overlapping some of these complex regions, including some telomeres and centromeres.
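
The truth set filters described above can be sketched in a few lines of Python. The probe and size thresholds are the ones stated in the talk; the rarity cutoff, the `ArrayCnv` record, the function names, and all coordinates are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass

# Thresholds from the talk: more than 25 array probes and larger than 10 kb.
MIN_PROBES = 25
MIN_SIZE_BP = 10_000
# Assumed cutoff for "rare"; the talk does not give a number.
MAX_POPULATION_FREQ = 0.01


@dataclass(frozen=True)
class ArrayCnv:
    chrom: str
    start: int
    end: int
    kind: str              # "loss" or "gain"
    probes: int
    population_freq: float


def overlaps(cnv, region):
    """True if the CNV shares at least one base with (chrom, start, end)."""
    chrom, start, end = region
    return cnv.chrom == chrom and cnv.start < end and start < cnv.end


def keep_for_truth_set(cnv, excluded_regions):
    """Apply the probe, size, rarity, and complex-region filters."""
    return (
        cnv.probes > MIN_PROBES
        and (cnv.end - cnv.start) > MIN_SIZE_BP
        and cnv.population_freq <= MAX_POPULATION_FREQ
        and not any(overlaps(cnv, region) for region in excluded_regions)
    )


# Toy data: coordinates and frequencies are made up.
excluded = [("chr1", 121_000_000, 125_000_000)]  # stand-in for a centromere
candidates = [
    ArrayCnv("chr2", 1_000_000, 1_250_000, "loss", 180, 0.0001),   # kept
    ArrayCnv("chr2", 5_000_000, 5_008_000, "gain", 30, 0.0001),    # too small
    ArrayCnv("chr3", 2_000_000, 2_100_000, "loss", 12, 0.0001),    # too few probes
    ArrayCnv("chr4", 3_000_000, 3_200_000, "gain", 90, 0.15),      # common
    ArrayCnv("chr1", 124_000_000, 126_000_000, "loss", 400, 0.0),  # complex region
]
truth_set = [c for c in candidates if keep_for_truth_set(c, excluded)]
print(len(truth_set))
```

Only the first candidate passes all four filters; each of the others fails exactly one of them.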

So, one thing that we first started to do was to look at the sensitivity of the callers compared to the truth set. So, as I said, in development we were looking at ERDS and CNVnator. We wanted to move more towards a solution that was being maintained, so that’s why we moved to the caller within the package that we’re using. So, this is within the Dragen software analytical pipeline, and you can see the recall.

So, there are deletions: reported and unreported. So unreported means rare variants that we didn’t necessarily need to report. And then you can see the reported ones would be the ones that are pathogenic, and you can see the duplications here as well. So, this is giving a little bit more sense of how well we could pick up some of these variants. And the answer is very good. So, you can see near a hundred percent recall for the truth set using these different algorithms.

So, we’re happy with the performance in this case. The other side of the coin here is looking at precision. So, as I said, 100 percent of the reportable CNVs greater than 10 kb were detected. So, this is 109 in total. And interestingly, the precision is quite high as well. So, this is just with the Dragen caller that we’re going to be using, for deletions and duplications. So, this is on average over these 31 samples.

Dragen CNV call accuracy 

Actually, we didn’t get many. We got about 10 different false positives over the size of 10 kb. And about 15 to 18 false positive duplications. So, this is really important because obviously one of the challenges is once you have your CNV analysis, you need to make sure that you have high precision so that you’re not chasing up a whole bunch of false positives. So that’s a really quick sort of overview of what we’re doing at SickKids.
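
Recall and precision for CNV calls against a truth set can be computed along the following lines. The talk does not say how calls were matched to truth set variants, so the 50 percent reciprocal overlap rule used here is an assumption, as are the function names and the toy coordinates.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions for (chrom, start, end, kind) tuples."""
    if a[0] != b[0] or a[3] != b[3]:
        return 0.0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return 0.0
    return min(shared / (a[2] - a[1]), shared / (b[2] - b[1]))


def recall_and_precision(truth, calls, min_overlap=0.5):
    """Recall: truth variants recovered. Precision: calls supported by truth."""
    found = sum(
        1 for t in truth
        if any(reciprocal_overlap(t, c) >= min_overlap for c in calls)
    )
    supported = sum(
        1 for c in calls
        if any(reciprocal_overlap(c, t) >= min_overlap for t in truth)
    )
    recall = found / len(truth) if truth else 0.0
    precision = supported / len(calls) if calls else 0.0
    return recall, precision


# Toy example with made-up coordinates.
truth = [
    ("chr1", 100_000, 150_000, "DEL"),
    ("chr2", 200_000, 260_000, "DUP"),
]
calls = [
    ("chr1", 101_000, 149_000, "DEL"),
    ("chr2", 200_000, 255_000, "DUP"),
    ("chr3", 500_000, 520_000, "DEL"),   # not in the truth set
]
recall, precision = recall_and_precision(truth, calls)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

Note that the third call counts against precision whether it is a genuine false positive or a real variant missing from the truth set, which is the benchmark caveat raised earlier in the talk.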

Summary and challenges

I just wanted to summarize by saying that clinical genome sequencing is really becoming a first-tier test in diagnostics, but the guidance for clinical implementation is really just emerging. So that’s one of the reasons for the genesis of the Medical Genome Initiative. And we’re providing some consensus recommendations for how to go about analytically validating genomes. And this is really based on the experience of the group. We feel like clinical genome sequencing should include small variants and copy number variation at a minimum.

And really, it should be at a place where it can replace whole exome sequencing and chromosomal microarray analysis. The validation of copy number variants using whole genome sequencing is quite complex. There are a lot of technical challenges in calling the CNVs, a lot of difficulty in obtaining reference standards, and also in finding an efficient way to annotate and interpret the variants as well. So, it’s very challenging to do this, and that’s why we’re taking sort of smaller steps and we’ll be able to do smaller variants and other algorithms later on.

So, our future work is really expanding the whole genome sequencing analysis to include more complex structural variants and then, further downstream, moving into repeat expansions and pharmacogenetic variants as well. So, we’ll just end with acknowledgments: all the individuals here in genome diagnostics who have helped with this, especially Lynette Lau, who’s taken a lot of the informatics lead on this. And informatics obviously is a big challenge in doing this analysis. Great collaborations with the people at Illumina as well, and the Centre for Applied Genomics, which is the research centre here.

We work on a lot of things with them and then translate them over to the diagnostic lab, the Centre for Genetic Medicine, and then, of course, the expertise within the Medical Genome Initiative. You can see a lot of the members here that have worked on this analytical validity paper. And just to acknowledge some of the funding and collaborative partners. So, thank you. Thanks for listening. I think we’ll get to questions in a minute. But maybe first of all, head back to the host.
