Analysis of Rare Variants in Sequencing Data
Many large-scale sequencing studies are currently under way. They address one of the major questions in human genetics: how, and to what extent, can studying low-frequency variants advance our understanding of disease etiology? The development of analytical tools, however, is barely keeping pace with the deluge of human sequencing data. For example, single-SNP disease associations are commonly tested with logistic regression. This approach is powerful for common variants and is therefore widely used in genome-wide association studies (GWAS). For rare variants, however, its power to detect associations is modest, because few individuals carry any given rare allele. One alternative is to assess the combined effect of a specific set of rare variants, for example all coding variants in a particular gene. These burden tests take into account the overall variant load within specified genomic regions of interest. They are therefore better able to detect signals when multiple rare causal alleles are present. This is a very active area of research: more than 20 burden tests have been proposed within the last three years. However, the properties of these tests are still not fully understood. The comparisons provided in the original publications are often too simplistic or cover only a small range of genetic architectures. Furthermore, the few published method-neutral comparisons have either used simulations that do not reflect the properties of real data (e.g. an excess of singletons beyond neutral expectations) or have not covered a wide range of methods. As a result, analysts of sequence data have to make best-guess decisions when choosing a rare variant analysis method to address a particular genetic hypothesis. Aim 1 of this project is to fill this gap by performing an extensive method-neutral evaluation of different burden tests based on realistic sequence data. Our results will help investigators identify the most powerful approach for detecting rare variants associated with disease.
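To make the collapsing idea concrete, the following is a minimal sketch of a simple count-based burden test. It is not one of the specific tests the project will evaluate. It assumes genotypes coded as minor-allele counts (0, 1, 2), a binary phenotype without covariates, and a hypothetical minor allele frequency cutoff `maf_threshold`. The function names `burden_scores` and `burden_score_test` are illustrative only. The rare variants of a region are summed into one score per individual. That score is then tested with the logistic-regression score test under the intercept-only null. This test reduces to n·r², which is compared against a χ² distribution with one degree of freedom.

```python
import math
from typing import List, Sequence, Tuple


def burden_scores(genotypes: Sequence[Sequence[int]],
                  maf_threshold: float = 0.01) -> List[int]:
    """Collapse the rare variants of one region into a burden score per person.

    genotypes[i][j] is the minor-allele count (0, 1 or 2) of individual i at
    variant j; all variants are assumed to belong to the same region (gene).
    """
    if not genotypes:
        raise ValueError("need at least one individual")
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency of each variant, estimated from the sample itself.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    # Keep only polymorphic variants below the (hypothetical) MAF cutoff.
    rare = [j for j in range(m) if 0 < freqs[j] < maf_threshold]
    # Burden = total number of rare alleles carried by each individual.
    return [sum(row[j] for j in rare) for row in genotypes]


def burden_score_test(burden: Sequence[float],
                      phenotype: Sequence[int]) -> Tuple[float, float]:
    """Logistic-regression score test of burden vs. a 0/1 phenotype.

    Under the intercept-only null the score statistic equals n * r**2, where
    r is the Pearson correlation of burden and phenotype; it is compared
    with a chi-square distribution with one degree of freedom.
    """
    n = len(burden)
    xbar = sum(burden) / n
    ybar = sum(phenotype) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(burden, phenotype))
    sxx = sum((x - xbar) ** 2 for x in burden)
    syy = sum((y - ybar) ** 2 for y in phenotype)
    if sxx == 0 or syy == 0:
        # No variation in burden or phenotype: nothing to test.
        return 0.0, 1.0
    t = n * sxy ** 2 / (sxx * syy)
    # Upper tail of chi-square(1): P(Z**2 > t) = erfc(sqrt(t / 2)).
    p = math.erfc(math.sqrt(t / 2))
    return t, p
```

Consider `G = [[1, 0, 1], [0, 1, 1], [0, 0, 1], [0, 0, 1]]`, where the third variant is common (frequency 0.5) and is excluded. For this input, `burden_score_test(burden_scores(G, maf_threshold=0.3), [1, 1, 0, 0])` gives t = 4.0 and p ≈ 0.0455. Summing allele counts, rather than recording only presence or absence, and using a covariate-free score test are simplifications for brevity. A real analysis would at least adjust for covariates such as ancestry.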
One interesting feature of burden tests is that they can integrate functional information at the gene or locus level. A logical next step in mining genome-wide sequence data is to analyze them at the level of gene sets or even pathways. For common variants, gene set enrichment analysis (GSEA) is widely used to test whether pathways are enriched for association signals. Aim 2 of the proposed project is to extend GSEA to take full advantage of sequence-specific properties, such as the extensive ascertainment of rare variants. We will then compare its power with that of the extended burden test approach outlined above. In Aim 3, we will further extend the method proposed in Aim 2 by taking into account the a priori known relationships between genes and variants. Completion of these three aims will result in research tools of high strategic value and impact. These tools will also enhance the value of many ongoing and future large-scale sequencing experiments.
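For orientation, the following is a minimal sketch of the standard GSEA-style weighted running-sum enrichment score. It is not the sequence-aware extension proposed in Aim 2. It assumes that each gene already has a gene-level association statistic, for instance −log10 p-values from a burden test. The function names and the `weight` exponent are illustrative. Genes are ranked by their statistic. Walking down the list, the running sum increases at genes in the set, in proportion to |statistic|^weight, and decreases at all other genes. The enrichment score is the maximum deviation of the running sum from zero.

```python
import random
from typing import Dict, Set


def enrichment_score(gene_stats: Dict[str, float], gene_set: Set[str],
                     weight: float = 1.0) -> float:
    """Weighted running-sum enrichment score in the style of GSEA.

    gene_stats maps each gene to a gene-level association statistic
    (larger = stronger signal); gene_set is the pathway being tested.
    """
    ranked = sorted(gene_stats, key=gene_stats.get, reverse=True)
    hits = [g for g in ranked if g in gene_set]
    n_miss = len(ranked) - len(hits)
    if not hits or n_miss == 0:
        raise ValueError("gene set must overlap, but not cover, the ranked genes")
    # Step size for each hit is proportional to |statistic| ** weight.
    step = {g: abs(gene_stats[g]) ** weight for g in hits}
    norm = sum(step.values())
    if norm == 0:
        # All hit statistics are zero: fall back to equal steps.
        step = {g: 1.0 for g in hits}
        norm = float(len(hits))
    running, best = 0.0, 0.0
    for g in ranked:
        if g in gene_set:
            running += step[g] / norm
        else:
            running -= 1.0 / n_miss
        # Keep the largest deviation from zero, with its sign.
        if abs(running) > abs(best):
            best = running
    return best


def gene_permutation_pvalue(gene_stats: Dict[str, float], gene_set: Set[str],
                            n_perm: int = 1000, weight: float = 1.0,
                            seed: int = 0) -> float:
    """Empirical p-value from random gene sets of the same size."""
    rng = random.Random(seed)
    observed = enrichment_score(gene_stats, gene_set, weight)
    genes = list(gene_stats)
    k = len(set(gene_set) & set(genes))
    exceed = 0
    for _ in range(n_perm):
        es = enrichment_score(gene_stats, set(rng.sample(genes, k)), weight)
        # Count permuted scores at least as extreme in the observed direction.
        if (observed >= 0 and es >= observed) or (observed < 0 and es <= observed):
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

This sketch assesses significance by sampling random gene sets of equal size. That approach is simpler than the phenotype permutation used in the original GSEA, and it ignores correlation between genes, so it should be read as illustrative only.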