#SideByScience Webinar
Children’s Rare Disease Cohorts: An integrative research and clinical genomics initiative
Shira Rockowitz, PhD
Bioinformatics and Genomics Lead
Boston Children’s Hospital

This webinar presents the work of the Children’s Rare Disease Cohorts (CRDC) initiative at Boston Children’s Hospital, which integrates genomic, research, and clinical data to facilitate hospital-based analysis and reanalysis of genomic data to accelerate rates of discovery and patient diagnoses. 

The CRDC created a unified and centralized infrastructure for genomics research at BCH, streamlining and standardizing electronic consenting, sample collection, sequencing and phenotyping. 

Data was collected and made available to researchers through a Genomics Learning System, which combines enterprise solutions for secondary and tertiary data analysis, including Emedgene’s machine learning variant prioritization and knowledge graph, GORdb cohort analysis and more. 

Utilizing the standardized processes and GLS infrastructure, the CRDC successfully recruited 2441 participants and launched 15 studies of rare pediatric-onset diseases within its first year, yielding novel findings and new diagnoses for patients.  

Join the webinar to learn more about the CRDC’s workflow and outcomes. 

This is the second in our series of #SideByScience webinars, promoting science in a time of social distancing. See the series agenda.

View Recording:

We’ve provided a transcript for this webinar below. The transcript was created with automation and may contain inaccuracies. We recommend accessing the recording for an accurate representation of the session content.


So, everyone, thank you for joining us today for another Side by Science webinar. We’re happy to have you here, connecting from your offices and homes. I’m Niv Mizrah, Emedgene’s CTO, and I’m your host for this webinar. With me today is Shira Rockowitz, Bioinformatics and Genomics Lead at Boston Children’s Hospital. Shira will share her work on a great and unique initiative at Boston Children’s Hospital called the Children’s Rare Disease Cohorts. The CRDC has been able to create unified infrastructure across the hospital for genomic research, and as a result of their efforts, they’ve launched 13 studies in the first year of operation and sequenced over 2000 patients.

We at Emedgene have been very proud to be included in the CRDC genomic learning system. For those of you who don’t know Emedgene, we are applying machine learning models to genomic analysis and interpretation. I think the CRDC initiative has solved traditional research pains in a very innovative way. And I’m excited to have you present your work here, Shira. If anyone has any questions throughout the webinar, please type them into the Q&A widget below.

We’ll spend a few minutes answering the questions at the end. And Shira, welcome to SideByScience. You’ve got the mic.

Thank you so much, Niv. And thank you for the introduction. OK. I can now share my screen. So I’m gonna try to get this in full screen. And Niv, please let me know if you can see it. Perfect. Thank you. OK. Thank you again for the introduction. So I’m Shira Rockowitz from Boston Children’s Hospital. And I’m going to be talking to you today about the Children’s Rare Disease Cohorts: an integrative research and clinical genomics initiative.

And so this really is talking about how our institution works systemically to integrate our clinical and research efforts to provide cutting edge genomic insights for patients.

The Children’s Rare Disease Cohorts Initiative 

So, the vision of the Children’s Rare Disease Cohorts Initiative really was to drive therapeutic and research discovery, to push forward on the best standard of care with researchers and clinicians being able to leverage data together and to build collaborative networks and prepare data to be able to be shared across the institution and with other academic and other collaborations. So to do this, we really needed to accomplish a number of complex tasks. For starters, aligning stakeholders across the institution and spurring some cultural changes that enabled data to be shared and used.

And what I’m going to be sharing today is working. PIs across the institution are utilizing this infrastructure, are fully on board, and are reporting positive outcomes from this work. And the institution is behind the genomic learning system that we’ve created. Now that we have this system, we have a database with some of the most precious rare disease patients and physicians who are leading these research areas. We really have a system that’s bridging our unique population and unique expertise that can easily be extended.

So here we have the cohort investigator PIs. I’m going to go into a little bit more depth about the different cohorts that are part of the Children’s Rare Disease Initiative. But the goal is really to do large scale sequencing of rare disease cohorts and be able to utilize that information to drive forward therapeutics and interventions. The project was internally funded from a strategic fund. And one of the innovations that we found here was in doing clinical grade sequencing, which was really motivated by our desire to provide patients with the best quality sequencing and end results.

So far, we’ve finished the first year where we sequenced over two thousand patients and into the beginning of year two, we are now up to about six thousand people who have been sequenced. We’ve rapidly on-boarded a COVID cohort in the last few months and other additional cohorts and are now thinking about new modalities and collaborations and ways forward. So this work that I’m talking about here today represents the work of a lot of different people. The cohort study PIs, their entire research teams, have contributed in consenting and analysis and many other aspects and really worked to streamline the different workflows.

Our hospital leadership has really led and created an environment, an ecosystem, that’s been able to bring everyone to the table: representation across the institution and various stakeholders involved in the selection of cohorts, providing feedback on how consenting should be aligned across the institution, and many, many more individuals. And so another thing I want to highlight, especially as we get into the genomic learning system and how that was developed, is that because in research computing we sit within the larger information services department and live and work very closely with the researchers, we are really well-positioned to drive forward on the technical aspects.

Timeline: Idea to 6000 sequenced patients

So we will be describing work that’s really happened within the last year and a half. The project has roots going back much further. So at Boston Children’s Hospital, researchers have been conducting genomic research for close to 30 years. But I’m going to focus on the last few years where as early as a few years ago, we had internal stakeholders who were already developing recommendations to the leadership about how genomic information should be harmonized across the institution. 

In 2017, a subcommittee was assembled which conducted a survey across the institution to understand to what extent sequencing would be valuable across the different departments and divisions. And the response was really overwhelming. Over twenty-five thousand patients were being seen at the hospital that researchers in different departments had indicated would benefit from sequencing. And this was growing rapidly. So this really corresponds to triple that number that would potentially need to be sequenced when you factor in trio sequencing, which is really driving forward diagnosis.

In principle, this execution timeline is very clean and clear. And it looks really straightforward. But, you know, it really represents a lot of alignment across the institution that was not at first glance straightforward. Two cohorts that had high Mendelian inheritance and high ongoing involvement efforts were selected from the survey to join a pilot. The project was funded in late 2018, and there were a lot of alignments in setting up this initiative. And really that was done in collaboration with the first two cohorts, which were led by Dr. Annapurna Poduri, studying epilepsy, and Dr. Scott Snapper, studying inflammatory bowel disease. And I’ll go into those in more depth. And a lot of those changes that were happening in the institution, alignments that were made, were really driven by the data and able to be implemented with the support provided by the initiative. And the initiative was really quite data driven. So just to give you a brief example, we started to collect samples back in late 2018.

We started off with collecting blood samples for whole exome sequencing. And it became rapidly apparent that, you know, usually one parent was coming into the hospital with the patient for a clinical visit, which is where most of the enrollment was actually happening, in the clinics. But the other parent would be at home, and there would be complexities in collecting another sample from that parent. So we worked with GeneDx. I’m going to be talking more about the different technology providers that we worked with.

We worked with GeneDx, which was doing the sequencing for this project, and with the research teams to set up a buccal swab workflow, which really helped us to collect a lot more patient samples. And at this point in time it actually represents about 95 percent of the sample collection through the project. So there were really a lot of optimizations made in response to these kinds of bottlenecks that were identified in a data driven process. So phase one of the project really focused on setting up the mechanisms for this initiative.

And now as we start phase two, we’re really looking forward to new collaborations, new cohorts, technologies and data sharing. Our paper describing the first phase of this project has been accepted in npj Genomic Medicine. So this is a table with all of the cohorts that are part of the project. We have the cohort disease, the primary investigator, an estimate of the BCH patient population overall, and then the number of samples that have been collected so far. And we’re at about 6000 patients.

And this is really quite accelerated, considering the fact that consenting didn’t start until late 2018, and represents an enormous amount of work on the part of each of these research teams. And it’s only, I think, because we had a distributed process that really leveraged existing expertise in each of the research groups that we were able to scale up so rapidly.

So there were many different kinds of alignment that had to occur to make this project move forward. On the institutional level, there was the cohort selection, funding, and the inclusion of various stakeholders. From a process perspective, I’m going to go into more depth about the different consenting alignments that were done. But really, the process alignments were led by practicality and the application of technical innovations. From an enrollment perspective, we’ve tried to leverage additional technology. So, for example, we’ve rolled out e-consenting, which has become increasingly available in this current era. And then sample collection: I already talked about how buccal swabs really accelerated aspects of sample collection, but also with the new COVID cohort we’re finding that our clinical discard samples are really accelerating sample collection, you know, when infection control measures need to be considered. And then I’m going to be talking also about the technical alignments to bring in data from different data silos and integrate them.

Changes to patient consent process

So I want to focus on the consenting alignment here. And keep in mind three things: data sharing, support for therapeutics development, and broad rather than disease-specific use. So here we have this table that shows a brief analysis of the consents that were incorporated into the Children’s Rare Disease Cohorts and how frequently some facets were previously part of those. So one thing I want to highlight is the CLIA sequencing, which really enabled us to send samples to GeneDx in an identified manner, so that when a variant needing clinical confirmation was identified by our research team, the patient was able to get that variant clinically confirmed at GeneDx, yielding a clinical report without having to send in another sample.

The other facets are supporting therapeutic development, enabling more data use, as well as potentially offering patients involvement in other research studies, and then really having broad rather than disease-specific use. And you can see this is something that was rarely incorporated into consent forms.

Clinical-Research Pipeline

So last but not least, there was clinical alignment. One of the main principles was to align with the clinical needs at the hospital and follow-up that would have potential benefits to patients. So after samples are collected, they’re sent for clinical-grade sequencing at GeneDx, and data is returned to our research environment and loaded into the Boston Children’s Hospital genomic learning system for analysis.

When researchers identify a variant of interest that might explain the patient’s phenotype, they work with the clinician who’s been listed on the consent form to submit an order back to GeneDx and request that that variant get clinically confirmed. And GeneDx is able to use the original sample because it was collected in a CLIA manner. We’ll focus the next number of slides on the genomic learning system. But I also just want to note that in this genomic learning system we are also incorporating clinical sequencing data that’s coming back to the hospital.

And so all of that information is fed back into the genomic learning system from the clinical and the research arms of the hospital. So with the alignment of consenting principles, data can be shared. But there is frequently a gap between theory and practice on that front. So the integration of data originating from different data sources really is a complex task. And this was the motivation to create a genomic learning system: to really uniformly process and analyze patient data, with the idea that it would accelerate discovery and foster collaboration.

By deploying this across the institution, any researcher, regardless of their bioinformatics background, can start analyzing patient data immediately. And before we dive into the details, I want to just go over three basic principles. So first, the genomic learning system is a centralized data repository where harmonized data is generated and stored, which means that we can provision access to that data in one locale rather than copying it over and over to different systems. The genomic learning system is modular and has robust analytics, which can answer many questions and which we’ve continued to expand.

It’s worth noting that this is not a single interface, but many interfaces into a single project, all working jointly and feeding into each other, that we’ve worked with various groups to develop. And data security is also a really important facet, as this is patient genomic data and patient phenotypic data …… identified is still very sensitive.

Phenotypic Data Collection

So zooming in on the phenotypic data collection, I want to talk about where we’re getting it from. So first off, we are getting information from the electronic health record, which includes structured information from the diagnosis codes or procedures, as well as information that’s being extracted from clinical notes.

So one of the other technologies that we’re working with is Clinithink, which is processing clinical notes to extract Human Phenotype Ontology terms, and those are integrated. We also have incorporated each research team’s registry into the genomic learning system. And so we’re pulling information on phenotypic metadata and family relationships, among many other variables, from the research teams. And we have done this in a way where we’re pulling it however the research team is already annotating it.

So, for example, they might have a checkbox which says “Epilepsy” and then we’re translating that into the Human Phenotype Ontology terms, which are necessary for these downstream analytic tools. So zooming in on the genomic learning system, which is really the core of my team’s work, we are taking this harmonized genotypic data and phenotypic data and merging that into a genomically ordered relational database. The genomically ordered relational database is able to handle both genomically ordered information and non-genomically ordered information.
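The checkbox-to-HPO translation mentioned above can be pictured with a short Python sketch. This is only an illustration under stated assumptions, not the hospital's actual pipeline: the registry field names and the `registry_record_to_hpo` helper are hypothetical, and the lookup table holds just a few well-known HPO identifiers chosen as examples.

```python
# Hypothetical lookup from registry checkbox names to HPO terms.
# The field names on the left are invented for illustration; the HPO IDs
# on the right are real terms, picked only as examples.
REGISTRY_TO_HPO = {
    "epilepsy": ("HP:0001250", "Seizure"),
    "developmental_delay": ("HP:0001263", "Global developmental delay"),
    "microcephaly": ("HP:0000252", "Microcephaly"),
}


def registry_record_to_hpo(record):
    """Return the sorted HPO IDs for every checked box that has a mapping."""
    terms = set()
    for field, checked in record.items():
        if checked and field in REGISTRY_TO_HPO:
            terms.add(REGISTRY_TO_HPO[field][0])
    return sorted(terms)


def unmapped_fields(record):
    """List checked boxes with no mapping, so a curator can review them."""
    return sorted(
        field
        for field, checked in record.items()
        if checked and field not in REGISTRY_TO_HPO
    )


if __name__ == "__main__":
    example = {"epilepsy": True, "microcephaly": False, "site_specific_flag": True}
    print(registry_record_to_hpo(example))
    print(unmapped_fields(example))
```

Keeping the unmapped fields visible, rather than silently dropping them, fits the idea of pulling data however each research team already annotates it: new registry fields show up as review items instead of being lost.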

And so there are a number of different analytic modules and tools and systems that are accessing these harmonized data. So, for example, through the database, researchers are accessing data and doing analysis, and are able to do cohort-level analysis, such as sequence kernel association testing. And we’ve extended that cohort-level analysis to also provide information about family inheritance and inheritance patterns. We’re also working with other analytic systems that are leveraging the same harmonized genotypic and phenotypic data.
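To make the idea of family inheritance information concrete, here is a minimal Python sketch of how a single variant in a trio could be labeled from genotypes. It is an assumption-laden illustration, not the database's actual logic: genotypes are encoded as alternate-allele counts (0, 1 or 2), only autosomal variants are considered, and the `classify_inheritance` function and its labels are hypothetical.

```python
def classify_inheritance(child, mother, father):
    """Label one autosomal variant in a trio from alternate-allele counts.

    Each argument is 0 (homozygous reference), 1 (heterozygous) or
    2 (homozygous alternate). This is a simplified rule set: it ignores
    sex chromosomes, genotype quality and full Mendelian-error checking.
    """
    for name, count in (("child", child), ("mother", mother), ("father", father)):
        if count not in (0, 1, 2):
            raise ValueError(f"{name} genotype must be 0, 1 or 2, got {count!r}")

    if child == 0:
        return "not_carried"
    if mother == 0 and father == 0:
        # Neither parent carries the allele the child has.
        return "de_novo"
    if child == 2:
        # A homozygous child needs one alternate allele from each parent.
        if mother >= 1 and father >= 1:
            return "homozygous_inherited"
        return "inconsistent"
    # Heterozygous child with at least one carrier parent.
    if mother >= 1 and father >= 1:
        return "inherited_either_parent"
    return "maternal" if mother >= 1 else "paternal"


if __name__ == "__main__":
    print(classify_inheritance(child=1, mother=0, father=0))
    print(classify_inheritance(child=2, mother=1, father=1))
```

A label like `inconsistent` is worth surfacing rather than hiding, since it can point to a sample swap or a genotyping error as easily as to real biology.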

Emedgene is providing researchers with prioritized variants and integrating their knowledge base, which has proved valuable. And researchers at Boston Children’s Hospital are able to go to, for example, TriNetX and ask questions about how many epilepsy patients that have an MRI also have genomic sequencing, in order to create studies. So all of these pieces of information are fed back into the genomically ordered relational database, and the whole genomic learning system is extensible, flexible and creates feedback loops.
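The kind of feasibility question described above (how many epilepsy patients with an MRI also have genomic sequencing) is essentially a filter over harmonized patient records. The Python sketch below shows that logic on toy data; it does not use or imitate the TriNetX interface, and the record fields (`diagnoses`, `procedures`, `has_sequencing`) and the `count_eligible` helper are hypothetical.

```python
def count_eligible(patients, diagnosis, procedure):
    """Count patients with the diagnosis, the procedure, and sequencing data."""
    return sum(
        1
        for p in patients
        if diagnosis in p["diagnoses"]
        and procedure in p["procedures"]
        and p["has_sequencing"]
    )


if __name__ == "__main__":
    # Toy records, invented purely for illustration.
    patients = [
        {"id": "P1", "diagnoses": {"epilepsy"}, "procedures": {"MRI"}, "has_sequencing": True},
        {"id": "P2", "diagnoses": {"epilepsy"}, "procedures": {"MRI"}, "has_sequencing": False},
        {"id": "P3", "diagnoses": {"epilepsy"}, "procedures": set(), "has_sequencing": True},
        {"id": "P4", "diagnoses": {"IBD"}, "procedures": {"MRI"}, "has_sequencing": True},
    ]
    print(count_eligible(patients, "epilepsy", "MRI"))
```

The query only works because diagnoses, procedures and sequencing status sit behind one harmonized view; with the data in separate silos, the same count would require joining three systems by hand.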

I want to really highlight, because I think this is an important part, that our work with technology vendors really enabled us to accelerate this process. So my team is pretty lean, and we started working with Emedgene in the last year or year and a half. Same with Clinithink and GeneDx. So we don’t have sequencing capacity at Boston Children’s Hospital for whole exome sequencing. So really working with the expertise of different technology providers enabled this project to be kicked off quickly and efficiently.

So coming back to what is exiting the genomic learning system: what are the different stakeholders, patients, clinicians and researchers, accessing from the system? So patients are ending up with a clinical confirmation. Clinicians are able to leverage a genomic learning system that contains feedback from research and clinical testing. And researchers are able to help their patients get access to cutting edge diagnoses. Altogether, we believe the system is improving our ability to provide genomic insights to our patients.

Outcome and future plans

So as you can tell, I really focused this talk on setting up the genomic learning system and setting up the Children’s Rare Disease Cohorts Initiative, as this was the focus of the first phase of the project. We’re continuing to extend analysis within the genomic learning system, but it’s really early days and many analyses are ongoing. As such, we expect many more results as time progresses. Nevertheless, within those first about a thousand patients, in total two thousand or so folks who were sequenced in the first year, we found on the order of hundreds of variants of interest, many of which have known gene-disease-drug relationships, suggesting that there might be clinical interventions. But it’s really, as I said, early days.

Of the research analyses that have come to completion, we have a number where we’ve identified variants and had GeneDx clinically confirm them in patients. Some of these patients have had atypical or mild presentations and would not have typically received clinical sequencing as a diagnostic for their disease. And other patients were connected with services. For example, one family was connected to a clinical trial at a nearby hospital for which they were eligible based on their diagnosis.

And another patient, with an atypical presentation, was able to be seen in a specialized multidisciplinary clinic.

So a lot of the vision of this project is really going into the future. I think one of the biggest things is collaboration with other pediatric institutions to drive forward therapeutics and discovery. And a project of this breadth is only possible with the enthusiastic collaboration of everyone here, as well as all of the researchers who are part of the different cohort teams. So this is really a cross-institutional collaboration. Thank you!
