Empirical Codon Models for Comparative Sequence Data
Darwinian selection is an important source of evolutionary innovation and a major force behind the divergence of species. Consequently, a wide variety of methods have been developed to detect genes that have been subject to selection, including comparative or phylogenetic methods that use patterns of substitutions between species. For example, standard likelihood ratio tests for positive selection have been developed that are based on codon substitution models. However, when applied to closely related species, such as primates, these tests lack power, and therefore very few genes show signs of positive selection. Incorporating additional information, such as patterns of intraspecific polymorphism, promises to improve the detection of positive selection. With the emergence of new sequencing technologies, these data are now available. Indeed, for several species, including human, Drosophila and Arabidopsis, 1,000 genomes will be available soon. However, it remains unclear whether the probabilistic methodologies previously used in phylogenetics and in population genetics are suitable to analyse these data sets.
The proposed project includes both a theoretical and an applied component that will provide bioinformatic tools and biological knowledge on the evolution of protein-coding genes. The theoretical component aims to develop new empirical codon models. We will develop new algorithms to estimate empirical models that take into account substantial site- and lineage-specific rate variation in comparative polymorphism data (i.e., sequences from several species and multiple individuals). This part of the project will therefore be geared towards implementing the model and testing it using computer simulations and empirical data sets, such as the mammalian phylogeny, to identify its underlying properties. The software developed will be made available to the public as open source.
In the applied component, we will use the empirical codon models and their extensions to understand the evolutionary processes on the Drosophila phylogeny. Taking advantage of the 12 Drosophila genomes data and the 1000 D. melanogaster project, we will first analyse the melanogaster subgroup. Furthermore, D. ananassae populations will be sequenced to study another clade with different substitution patterns. Tests for Darwinian selection will be performed on both clades.
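As a rough illustration of the likelihood ratio tests for positive selection mentioned in the abstract, the following minimal sketch compares a null codon model (no positively selected sites) with an alternative model that allows them. It is not the project's software. The log-likelihood values are hypothetical placeholders, and the choice of 2 degrees of freedom is an assumption that depends on the pair of nested models being compared.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution.

    Closed forms are used, so this sketch only supports df = 1 or df = 2.
    """
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only supports df = 1 or 2")


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return the LRT statistic and its chi-square p-value for nested models."""
    # Twice the log-likelihood gain of the alternative over the null model.
    stat = 2.0 * (lnl_alt - lnl_null)
    # Clamp small negative values caused by optimisation noise.
    stat = max(stat, 0.0)
    return stat, chi2_sf(stat, df)


# Hypothetical maximised log-likelihoods for one gene (illustrative only).
lnl_null = -2345.67  # model without positively selected sites
lnl_alt = -2340.12   # model allowing a class of sites under positive selection

stat, p_value = likelihood_ratio_test(lnl_null, lnl_alt, df=2)
print(f"LRT statistic = {stat:.2f}, p = {p_value:.4g}")
```

When the species compared are closely related, the two log-likelihoods tend to be close, so the statistic stays small. This is the lack of power the abstract describes.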