Summary Conclusion: Genome Wide Copy Number Variations and ET

Grant Title: Genome Wide Copy Number Variations and ET: a high-density genotyping study
Funding Period: 07/2017 to 07/2018

Principal Investigator: Guy A. Rouleau, McGill University
Montreal Neurological Hospital and Institute

Introduction:
Essential tremor (ET) is a common neurological disorder characterized by postural, kinetic and intention tremors of a body part. Like many other common and complex human diseases and traits, ET clusters in families and is believed to be influenced by several genetic and environmental factors. Over the past few years, independent studies have suggested the existence of several genetic risk factors and loci associated with ET. While these studies have provided valuable insights into the genetic architecture of ET, the variants identified so far still explain only a small proportion of cases. Ultimately, a better understanding of the genetics of ET will contribute to improved prevention and diagnosis, and will open avenues of investigation toward much-needed treatments for the disease.

Genetic variants associated with, or causing, ET have been sought for over a decade, but the involvement of an emerging class of genetic variation collectively referred to as “Copy Number Variants” (CNVs) has not yet been properly examined in familial ET. Inherited CNVs have been proposed as predisposition factors in a number of complex conditions (e.g. schizophrenia [1,2], autism [3,4] and Alzheimer’s disease [5,6]), and various studies have established that CNVs are heritable [7,8]. Given the highly heritable nature of ET, we hypothesize that individually rare structural variants such as CNVs might segregate with the phenotype within ET families and, as such, represent novel risk factors to be considered.

Method:
A total of 130 samples from 13 multigenerational families were selected. Genomic DNA was genotyped using the Human Omni2.5 Exome BeadChip array. The raw data were first processed with the Illumina GenomeStudio software to export the signal intensity data from the genotyping project to a text file. A total of 2,608,742 markers were captured per sample, with a success rate >98% for every sample.
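The per-sample success rate can be read as a call rate: the fraction of the genotyped markers that received a genotype. The sketch below is a minimal illustration under two assumptions not stated in this report: that genotypes have already been parsed from the GenomeStudio export into one list of strings per sample, and that no-calls appear as `"--"`. The function names are hypothetical.

```python
from typing import Dict, Iterable, List

# Assumption: no-calls are exported as "--"; change this if your export differs.
NO_CALL = "--"


def call_rate(genotypes: Iterable[str], no_call: str = NO_CALL) -> float:
    """Fraction of markers with a genotype call for a single sample."""
    genotypes = list(genotypes)
    if not genotypes:
        raise ValueError("no genotypes supplied")
    called = sum(1 for g in genotypes if g != no_call)
    return called / len(genotypes)


def samples_failing(rates: Dict[str, float], min_rate: float = 0.98) -> List[str]:
    """Sample IDs whose call rate is not strictly above min_rate (>98% passes)."""
    return sorted(s for s, r in rates.items() if not r > min_rate)
```

For example, `call_rate(["AA", "AB", "--", "BB"])` returns `0.75`, since three of the four markers were called.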

Copy number variants (CNVs) were called using two different programs, PennCNV [9] and QuantiSNP [10], in order to increase sensitivity. A first set of calls was produced using the default parameters of both programs, but given the highly repetitive nature of some DNA regions, we observed many false-positive CNVs (in particular across GC-rich regions).

We therefore generated a second set of CNV calls, this time adjusting the signal intensity values with a GC-content correction: although the causes of genomic waves are not well understood, these waves can prevent accurate inference of CNVs. Quality control metrics were calculated, and thresholds for each measurement were chosen by examining their distributions for obvious outliers. Samples were excluded from the study if they failed any of these stringent quality control metrics, and only reliable CNV calls were kept for further analysis.
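To illustrate the general idea behind a GC correction, the sketch below fits the log R ratio (LRR) of each marker against the local GC fraction by ordinary least squares and keeps the residuals, removing the linear GC-dependent trend that produces genomic waves. This is an assumption-laden illustration of the principle only; it is not the exact model used by PennCNV or QuantiSNP, and `gc_adjust_lrr` is a hypothetical name.

```python
from statistics import mean
from typing import List, Sequence


def gc_adjust_lrr(lrr: Sequence[float], gc: Sequence[float]) -> List[float]:
    """Return LRR values with their linear dependence on GC fraction removed.

    Simple linear regression LRR ~ intercept + slope * GC; the residuals are
    the GC-adjusted signal. Illustrative only, not a caller's built-in model.
    """
    if len(lrr) != len(gc) or len(lrr) < 2:
        raise ValueError("need two equal-length sequences with >= 2 markers")
    gc_mean, lrr_mean = mean(gc), mean(lrr)
    # Sum of squared GC deviations (denominator of the OLS slope)
    sxx = sum((g - gc_mean) ** 2 for g in gc)
    if sxx == 0:
        # No GC variation across markers: nothing to regress, just centre LRR
        return [v - lrr_mean for v in lrr]
    sxy = sum((g - gc_mean) * (v - lrr_mean) for g, v in zip(gc, lrr))
    slope = sxy / sxx
    intercept = lrr_mean - slope * gc_mean
    # Residual = observed LRR minus the GC-predicted trend
    return [v - (intercept + slope * g) for g, v in zip(gc, lrr)]
```

A purely linear fit is the simplest choice; real wave-correction models may use different covariates or window sizes.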

For PennCNV, CNVs were called and quality metrics calculated using “detect_cnv.pl”, correcting for GC content bias with the GC content information from UCSC (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/gc5Base.txt.gz). Samples were excluded if they showed (i) a log R ratio standard deviation (LRR) >0.18, (ii) a B allele frequency standard deviation (BAF) >0.001, or (iii) an absolute waviness factor (WF) >0.03.
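The exclusion rule above (fail any one metric and the sample is dropped) can be sketched as follows, using the three PennCNV thresholds stated in the text. The `SampleQC` record and the function names are hypothetical; in practice the metric values would come from the caller's QC output.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class SampleQC:
    sample_id: str
    lrr_sd: float  # log R ratio standard deviation
    baf_sd: float  # B allele frequency metric, as reported in the text
    wf: float      # waviness factor (signed; its absolute value is tested)


# Thresholds stated in the report for PennCNV
PENNCNV_LIMITS = {"lrr_sd": 0.18, "baf_sd": 0.001, "abs_wf": 0.03}


def failed_metrics(qc: SampleQC, limits=PENNCNV_LIMITS) -> List[str]:
    """Names of the metrics this sample fails (empty list means it passes)."""
    failures = []
    if qc.lrr_sd > limits["lrr_sd"]:
        failures.append("lrr_sd")
    if qc.baf_sd > limits["baf_sd"]:
        failures.append("baf_sd")
    if abs(qc.wf) > limits["abs_wf"]:
        failures.append("wf")
    return failures


def split_samples(qcs: Iterable[SampleQC], limits=PENNCNV_LIMITS) -> Tuple[List[str], List[str]]:
    """Return (kept, excluded) sample IDs; failing any one metric excludes."""
    kept, excluded = [], []
    for qc in qcs:
        (excluded if failed_metrics(qc, limits) else kept).append(qc.sample_id)
    return kept, excluded
```

Returning the list of failed metrics, rather than a single boolean, makes it easy to report which criterion drove each exclusion.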

Overall, 3% of samples failed at least one of these quality metrics, resulting in 4 of the original 130 samples being excluded from the final dataset. Adjacent calls were also merged into a single call if the gap between two CNVs was <20% of the total CNV call length.
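The merging rule can be sketched as a single pass over sorted calls from one sample. We assume here that the "total CNV call length" is the span of the merged segment (from the start of the first call to the end of the second) and that only calls of the same type on the same chromosome are merged; both are assumptions, and `merge_adjacent` is a hypothetical helper.

```python
from typing import List, Tuple

# (chromosome, start, end, type) with 1-based inclusive coordinates, one sample
Call = Tuple[str, int, int, str]


def merge_adjacent(calls: List[Call], max_gap_fraction: float = 0.2) -> List[Call]:
    """Merge same-type calls on the same chromosome when the gap between them
    is below max_gap_fraction of the combined (merged) length.

    In this simple version, a call of a different type lying between two
    calls blocks their merge.
    """
    merged: List[Call] = []
    for chrom, start, end, kind in sorted(calls):
        if merged:
            p_chrom, p_start, p_end, p_kind = merged[-1]
            if chrom == p_chrom and kind == p_kind:
                gap = start - p_end - 1  # negative when the calls overlap
                total = max(p_end, end) - p_start + 1
                if gap < max_gap_fraction * total:
                    # Extend the previous segment instead of adding a new one
                    merged[-1] = (p_chrom, p_start, max(p_end, end), p_kind)
                    continue
        merged.append((chrom, start, end, kind))
    return merged
```

With calls at 1,000-1,999 and 2,100-2,999 on the same chromosome, the 100 bp gap is below 20% of the 2,000 bp combined span, so the two become one call.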

Additionally, only CNV calls containing >10 SNPs and having a confidence score >10 were selected.
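The call-level filter is a simple predicate over each call. A minimal sketch, with a hypothetical `CNVCall` record standing in for a parsed caller output line:

```python
from typing import Iterable, List, NamedTuple


class CNVCall(NamedTuple):
    sample_id: str
    chrom: str
    start: int
    end: int
    num_snps: int      # number of markers spanned by the call
    confidence: float  # caller-reported confidence score


def filter_calls(calls: Iterable[CNVCall], min_snps: int = 10,
                 min_confidence: float = 10.0) -> List[CNVCall]:
    """Keep calls with strictly more than min_snps markers and a confidence
    strictly above min_confidence, matching the '>10' wording in the text."""
    return [c for c in calls
            if c.num_snps > min_snps and c.confidence > min_confidence]
```

Note the strict inequalities: a call spanning exactly 10 SNPs is dropped.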

For QuantiSNP, CNVs were called using “run_quantisnp2.sh”, correcting for local GC content. As with PennCNV, low-quality data were detected by examining quality metrics: (i) LRR >0.15, (ii) BAF >0.4, or (iii) WF >0.002. In total, five samples were excluded based on outlier quality metrics; of those, four had already been excluded by PennCNV. Finally, CNV calls with a log Bayes factor (LBF) confidence value <10 were filtered out of the analysis.
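Two small operations in this step can be made explicit: the log Bayes factor filter (only values <10 are removed, so a value of exactly 10 is kept) and the comparison of the exclusion lists produced by the two callers, which is a set intersection and difference. The sketch assumes calls are dictionaries with an `"lbf"` key; that key and both function names are hypothetical.

```python
from typing import Dict, Iterable, List


def keep_by_lbf(calls: Iterable[dict], min_lbf: float = 10.0) -> List[dict]:
    """Drop calls whose log Bayes factor is below min_lbf (LBF == 10 is kept)."""
    return [c for c in calls if c["lbf"] >= min_lbf]


def compare_exclusions(penncnv: Iterable[str],
                       quantisnp: Iterable[str]) -> Dict[str, List[str]]:
    """Split excluded sample IDs by which caller's QC excluded them."""
    p, q = set(penncnv), set(quantisnp)
    return {
        "both": sorted(p & q),
        "penncnv_only": sorted(p - q),
        "quantisnp_only": sorted(q - p),
    }
```

With hypothetical IDs, four PennCNV exclusions that are all among five QuantiSNP exclusions give four samples under `"both"` and one under `"quantisnp_only"`.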

Results:
PennCNV produced a total of 20,087 CNV calls across our ET samples, of which 1,489 were deletions and 18,598 were duplications. Between 13 and 773 CNVs were detected per sample, with a mean of 154.5 CNVs per sample (CNV sizes ranged between 20 bp and 939 kb).

In parallel, QuantiSNP detected a total of 11,580 CNVs, ranging from 9 to 1,794 per sample, with a mean of 90 per sample. Of those, 249 were deletions and 11,330 were duplications. CNV call lengths varied between 42 bp and 243 Mb, averaging 14 Mb.
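The summary figures reported for each caller (totals by type, per-sample counts and call lengths) can be computed from a flat list of calls. A minimal sketch, assuming each call is reduced to a hypothetical `(sample_id, cnv_type, length_bp)` tuple with `cnv_type` either `"del"` or `"dup"`:

```python
from collections import Counter
from statistics import mean
from typing import Dict, Iterable, Optional, Tuple


def summarize_calls(calls: Iterable[Tuple[str, str, int]],
                    samples: Optional[Iterable[str]] = None) -> Dict[str, float]:
    """Summarise (sample_id, cnv_type, length_bp) tuples.

    Pass `samples` to include samples with zero calls in the per-sample stats.
    """
    calls = list(calls)
    if not calls:
        raise ValueError("no calls to summarise")
    per_sample = Counter(s for s, _, _ in calls)
    ids = list(samples) if samples is not None else list(per_sample)
    counts = [per_sample[s] for s in ids]  # Counter gives 0 for absent IDs
    types = Counter(t for _, t, _ in calls)
    lengths = [n for _, _, n in calls]
    return {
        "total": len(calls),
        "deletions": types["del"],
        "duplications": types["dup"],
        "min_per_sample": min(counts),
        "max_per_sample": max(counts),
        "mean_per_sample": mean(counts),
        "min_length_bp": min(lengths),
        "max_length_bp": max(lengths),
        "mean_length_bp": mean(lengths),
    }
```

Keeping per-sample counts separate from call lengths avoids mixing a count average with a size average, which are easy to confuse when both are reported together.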

Overall, PennCNV seemed to detect more CNVs of smaller size, whereas QuantiSNP detected fewer CNVs of larger size. Using a homemade Python script, calls from both datasets were merged to determine whether overlapping CNVs could be identified. In total, 31,667 CNVs passed quality control, of which only 10% (3,241) were concordant between the two programs; this is striking but not unusual [11].
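The report does not state the overlap criterion used to decide that a PennCNV call and a QuantiSNP call are concordant. The sketch below assumes a 50% reciprocal overlap between calls from the same sample, chromosome and CNV type; all of these criteria, and the names used, are assumptions for illustration and not the homemade script used in this study.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Call(NamedTuple):
    sample: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    kind: str   # "del" or "dup"


def reciprocal_overlap(a: Call, b: Call, min_fraction: float = 0.5) -> bool:
    """True if a and b overlap by at least min_fraction of EACH call's length."""
    overlap = min(a.end, b.end) - max(a.start, b.start) + 1
    if overlap <= 0:
        return False
    return (overlap >= min_fraction * (a.end - a.start + 1)
            and overlap >= min_fraction * (b.end - b.start + 1))


def concordant_calls(calls_a: Iterable[Call], calls_b: Iterable[Call],
                     min_fraction: float = 0.5) -> List[Tuple[Call, Call]]:
    """Pair calls from two callers that share sample, chromosome and type
    and pass the reciprocal-overlap test."""
    index: Dict[Tuple[str, str, str], List[Call]] = defaultdict(list)
    for b in calls_b:
        index[(b.sample, b.chrom, b.kind)].append(b)
    pairs = []
    for a in calls_a:
        # .get avoids inserting empty keys into the defaultdict
        for b in index.get((a.sample, a.chrom, a.kind), []):
            if reciprocal_overlap(a, b, min_fraction):
                pairs.append((a, b))
    return pairs
```

Indexing by (sample, chromosome, type) avoids comparing every call against every other call; with tens of thousands of calls per caller, a sorted sweep or interval tree within each group would scale better still.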

Since the aim of our proposal was to test whether inherited CNVs could be identified, we used a custom Python script to examine whether any CNV calls were shared by all affected members of a single ET family (this was tested for each of the 13 families in the study). CNVs were merged into a single call when at least 50% of their sequence overlapped. Of the 28,426 distinct CNVs detected, only 279 were shared by all affected members of a family. While the majority of these segregating CNVs did encompass a gene, all of them were also reported as common in public databases.
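The segregation test can be sketched as: take each call of one affected family member and keep it only if every other affected member carries a call of the same type on the same chromosome overlapping it by at least 50%. Treating the 50% threshold as reciprocal, and the helper names, are assumptions; this is not the custom script used in the study.

```python
from typing import Dict, List, NamedTuple, Sequence


class Call(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    kind: str   # "del" or "dup"


def _overlaps(a: Call, b: Call, min_fraction: float) -> bool:
    """Same chromosome and type, with reciprocal overlap >= min_fraction."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return False
    ov = max(0, min(a.end, b.end) - max(a.start, b.start) + 1)
    return (ov >= min_fraction * (a.end - a.start + 1)
            and ov >= min_fraction * (b.end - b.start + 1))


def shared_by_all_affected(calls_by_member: Dict[str, List[Call]],
                           affected: Sequence[str],
                           min_fraction: float = 0.5) -> List[Call]:
    """Calls of the first affected member that every other affected member
    also carries (unaffected relatives are not considered here)."""
    if not affected:
        return []
    ref, *others = affected
    shared = []
    for call in calls_by_member.get(ref, []):
        if all(any(_overlaps(call, other, min_fraction)
                   for other in calls_by_member.get(m, []))
               for m in others):
            shared.append(call)
    return shared
```

Each surviving call would then be checked against public CNV databases to judge whether it is rare or common.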

Conclusion:
We performed a genome-wide analysis of CNVs in 130 samples from 13 multigenerational families. Our analysis detected CNVs inherited by all affected members; however, all of these had previously been reported in public databases, making them unlikely to be the genetic cause of ET in the families studied.

The analysis of genome-wide patterns of segregating CNVs across a set of individuals is a challenging task, as illustrated by the poor concordance of calls between the two programs used. The quality control criteria applied in our study may have been too stringent, which could have led to the exclusion of samples that would have produced additional informative CNV data; to our knowledge, there is no consensus on an ideal set of quality control criteria for including samples in CNV analyses such as ours.

While our study did not identify individually rare, inherited structural variants that could be considered genetic risk factors for ET, it may have been limited in part by the relatively small number of families studied. It is important to remember that pathogenic CNVs tend to be rare by nature, so estimating their genetic contribution to disease is never an easy task. Nevertheless, exploratory studies such as ours are necessary. While this was a first examination of CNVs in large multiplex ET families, the endeavour warrants future effort, as improvements in genotyping, whole genome sequencing and bioinformatic tools for CNV prediction will refine the detection of these critical structural variants.

IETF Final Report (2018), Dr Guy A. Rouleau, June 8 2018

References:

  1. Walsh, T. et al. Rare structural variants disrupt multiple genes in neurodevelopmental pathways in schizophrenia. Science 320, 539-43 (2008).
  2. International Schizophrenia Consortium. Rare chromosomal deletions and duplications increase risk of schizophrenia. Nature 455, 237-41 (2008).
  3. Pinto, D. et al. Functional impact of global rare copy number variation in autism spectrum disorders. Nature 466, 368-72 (2010).
  4. Glessner, J.T. et al. Autism genome-wide copy number variation reveals ubiquitin and neuronal genes. Nature 459, 569-73 (2009).
  5. Swaminathan, S. et al. Analysis of copy number variation in Alzheimer’s disease: the NIALOAD/ NCRAD Family Study. Curr Alzheimer Res 9, 801-14 (2012).
  6. Hooli, B.V. et al. Rare autosomal copy number variations in early-onset familial Alzheimer’s disease. Mol Psychiatry 19, 676-81 (2014).
  7. Locke, D.P. et al. Linkage disequilibrium and heritability of copy-number polymorphisms within duplicated regions of the human genome. Am J Hum Genet 79, 275-90 (2006).
  8. Redon, R. et al. Global variation in copy number in the human genome. Nature 444, 444-54 (2006).
  9. Wang, K. et al. PennCNV: An integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res 17, 1665-74 (2007).
  10. Colella, S. et al. QuantiSNP: an Objective Bayes Hidden-Markov Model to detect and accurately map copy number variation using SNP genotyping data. Nucleic Acids Res 35, 2013-25 (2007).
  11. Buizer-Voskamp, J.E. et al. Genome-wide analysis shows increased frequency of copy number variation deletions in Dutch schizophrenia patients. Biol Psychiatry 70, 655-62 (2011).