MOL.BIOL.EVOL.AOP
Evolution of function of a fused metazoan tRNA synthetase
The origin and evolution of multidomain proteins is driven by diverse processes including fusion/fission, domain shuffling, and alternative splicing. The twenty aminoacyl-tRNA synthetases (AARS) constitute an ancient, conserved family of multidomain proteins. The glutamyl-prolyl tRNA synthetase (EPRS) of bilaterian animals is unique among AARSs, containing two functional enzymes catalyzing ligation of glutamate and proline to their cognate tRNAs. The ERS and PRS catalytic domains in multiple bilaterian taxa are linked by a variable number of helix-turn-helix domains referred to as WHEP-TRS domains. In addition to its canonical aminoacylation activities, human EPRS exhibits a noncanonical function as an inflammation-responsive regulator of translation. Recently, we have shown that the WHEP domains direct this auxiliary function of human EPRS by interacting with an mRNA stem-loop element (GAIT element). Here we show that EPRS is present in the cnidarian Nematostella vectensis, which pushes the origin of the fused protein back to the cnidarian-bilaterian ancestor, 50-75 million years before the origin of the Bilateria. Remarkably, the Nematostella EPRS mRNA is alternatively spliced to yield three isoforms with variable number and sequence of WHEP domains, and with distinct RNA-binding activities. Whereas one isoform containing a single WHEP domain binds tRNA, a second binds both tRNA and GAIT element RNA. However, the third isoform contains two WHEP domains, and like the human ortholog binds specifically to GAIT element RNA. These results suggest that alternative splicing of WHEP domains in the EPRS gene of the cnidarian-bilaterian ancestor gave rise to a novel molecular function of EPRS conserved during metazoan evolution.
Molecular evolution of the endosperm starch synthesis pathway genes in rice (Oryza sativa L.) and its wild ancestor, O. rufipogon L.
The evolution of metabolic pathways is a fundamental but poorly understood aspect of evolutionary change. One approach for understanding the complexity of pathway evolution is to examine the molecular evolution of genes which together comprise an integrated metabolic pathway. The rice endosperm starch biosynthetic pathway is one of the most thoroughly characterized metabolic pathways in plants and starch is a trait that has evolved in response to strong selection during rice domestication. In this study we have examined six key genes (AGPL2, AGPS2b, SSIIa, SBEIIb, GBSSI, ISA1) in the rice endosperm starch biosynthesis pathway to investigate the evolution of these genes before and after rice domestication. Genome-wide sequence tagged sites data were used as a neutral reference to overcome the problems of detecting selection in species with complex demographic histories such as rice. Five variety groups of Oryza sativa (aus, indica, tropical japonica, temperate japonica, aromatic) and its wild ancestor (O. rufipogon) were sampled. Our results showed evidence of purifying selection at AGPL2 in O. rufipogon and strong evidence of positive selection at GBSSI in temperate japonica and tropical japonica varieties, and at GBSSI and SBEIIb in aromatic varieties. All the other genes showed a pattern consistent with neutral evolution in both cultivated rice and its wild ancestor. These results indicate the important role of positive selection in the evolution of starch genes during rice domestication. We discuss the role of SBEIIb and GBSSI in the evolution of starch quality during rice domestication, and the power and limitation of detecting selection using genome-wide data as a neutral reference.
Transcriptomic evidence that longevity of acquired plastids in the photosynthetic slugs Elysia timida and Plakobranchus ocellatus does not entail lateral transfer of algal nuclear genes
Sacoglossan sea slugs are unique in the animal kingdom in that they sequester and maintain active plastids that they acquire from the siphonaceous algae upon which they feed, making the animals photosynthetic. While most sacoglossan species digest their freshly ingested plastids within hours, four species from the family Plakobranchidae retain their stolen plastids (kleptoplasts) in a photosynthetically active state on time scales of weeks to months. The molecular basis of plastid maintenance within the cytosol of digestive gland cells in these photosynthetic metazoans is yet unknown, but is widely thought to involve gene transfer from the algal food source to the slugs based upon previous investigations of single genes. Indeed, normal plastid development requires hundreds of nuclear-encoded proteins, with protein turnover in photosystem II in particular known to be rapid under various conditions. Moreover, only algal plastids, not the algal nuclei, are sequestered by the animals during feeding. If algal nuclear genes are transferred to the animal either during feeding or in the germ line, and if they are expressed, then they should be readily detectable with deep-sequencing methods. We have sequenced expressed mRNAs from actively photosynthesizing, starved individuals of two photosynthetic sea slug species, Plakobranchus ocellatus Van Hasselt, 1824 and Elysia timida Risso, 1818. We find that nuclear-encoded, algal-derived genes specific to photosynthetic function are expressed neither in P. ocellatus nor in E. timida. Despite their dramatic plastid longevity, these photosynthetic sacoglossan slugs do not express genes acquired from algal nuclei in order to maintain plastid function.
Origin of clothing lice indicates early clothing use by anatomically modern humans in Africa
Clothing use is an important modern behavior that contributed to the successful expansion of humans into higher latitudes and cold climates. Previous research suggests that clothing use originated anywhere between 40,000 and 3 million years ago, though there is little direct archaeological, fossil, or genetic evidence to support more specific estimates. Since clothing lice evolved from head louse ancestors once humans adopted clothing, dating the emergence of clothing lice may provide more specific estimates of the origin of clothing use. Here, we use a Bayesian coalescent modeling approach to estimate that clothing lice diverged from head louse ancestors at least by 83,000 and possibly as early as 170,000 years ago. Our analysis suggests that the use of clothing likely originated with anatomically modern humans in Africa and reinforces a broad trend of modern human developments in Africa during the Middle to Late Pleistocene.
Remarkable Abundance and Evolution of Mobile Group II Introns in Wolbachia Bacterial Endosymbionts
The streamlined genomes of ancient obligate endosymbionts generally lack transposable elements, as a consequence of their intracellular confinement. Yet, the genomes of Wolbachia, one of the most abundant bacterial endosymbionts on Earth, are littered with transposable elements, in particular insertion sequences. This paradox raises the question of whether or not such a mobile DNA proliferation reflects a special feature of insertion sequences. In this study, we focused on another class of transposable elements, group II introns, and conducted an in-depth analysis of their content and the microevolutionary processes responsible for their dynamics within Wolbachia genomes. We report an exceptionally high intron abundance, and striking differences in copy numbers between Wolbachia strains, as well as between intron families. Our bioinformatics and experimental results provide strong evidence that intron diversity is mainly caused by recent (and perhaps ongoing) mobility and horizontal transfers. Our data also support several temporally independent intron invasions during Wolbachia evolution. Furthermore, group II intron spread in some Wolbachia strains may be regulated through gene conversion-mediated inactivation of intron copies. Finally, we found introns to be involved in numerous genomic rearrangements. This underscores the high recombinogenic potential of group II introns, contrary to general expectations. Overall, our study represents the first comprehensive analysis of group II intron evolutionary dynamics in obligate intracellular bacteria. Our results show that bacterial endosymbionts with reduced genomes can sustain high loads of mobile group II introns, as hypothesized for the endosymbiont ancestor of mitochondria during early eukaryote evolution.
Reconstructing Population Histories from Single-Nucleotide Polymorphism Data
Population genetics encompasses a strong theoretical and applied research tradition on the multiple demographic processes that shape genetic variation present within a species. When several distinct populations exist in the current generation, it is often natural to consider the pattern of their divergence from a single ancestral population in terms of a binary tree structure. Inference about such population histories based on molecular data has been an intensive research topic in recent years. The most common approach uses coalescent theory to model genealogies of individuals sampled from the current populations. Such methods are able to compare several different evolutionary scenarios and to estimate demographic parameters. However, their major limitation is the enormous computational complexity associated with the indirect modelling of the demographies, which limits the application to small data sets. Here we propose a novel Bayesian method for inferring population histories from unlinked single-nucleotide polymorphisms, which is also applicable to data sets harboring large numbers of individuals from distinct populations. We use an approximation to the neutral Wright-Fisher diffusion to model random fluctuations in allele frequencies. The population histories are modelled as binary rooted trees that represent the historical order of divergence of the different populations. A combination of analytical, numerical and Monte Carlo integration techniques is used for the inferences. A particularly important feature of our approach is that it provides intuitive measures of the statistical uncertainty associated with the computed estimates, which may be entirely lacking for the alternative methods in this context. The potential of our approach is illustrated by analyses of both simulated and real data sets.
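As a rough illustration of the random allele-frequency fluctuations that the abstract models, the sketch below simulates neutral Wright-Fisher drift by binomial resampling in a diploid population. It is not the diffusion approximation or the Bayesian method described above; the function name, parameters, and example values are assumptions made for illustration only.

```python
import random


def wright_fisher_trajectory(p0, pop_size, generations, rng):
    """Simulate neutral drift of one allele in a diploid population.

    p0          -- starting allele frequency, between 0 and 1
    pop_size    -- number of diploid individuals (2 * pop_size gene copies)
    generations -- number of generations to simulate
    rng         -- a random.Random instance, so runs can be reproduced

    Returns a list of allele frequencies of length generations + 1.
    """
    n_copies = 2 * pop_size
    freqs = [p0]
    p = p0
    for _ in range(generations):
        # Each gene copy in the next generation is drawn independently
        # from the current allele frequency (binomial sampling).
        copies = sum(rng.random() < p for _ in range(n_copies))
        p = copies / n_copies
        freqs.append(p)
    return freqs


if __name__ == "__main__":
    # Hypothetical example: two populations diverging from one ancestor
    # that had an allele frequency of 0.5.
    rng = random.Random(1)
    pop_a = wright_fisher_trajectory(0.5, pop_size=100, generations=50, rng=rng)
    pop_b = wright_fisher_trajectory(0.5, pop_size=100, generations=50, rng=rng)
    print(pop_a[-1], pop_b[-1])
```

Running the example for several populations that start from the same ancestral frequency shows how drift alone produces the between-population differences that a tree-based model of divergence would need to explain.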
New Insights into the Evolution of Metazoan Cadherins
Mining newly sequenced genomes of basal metazoan organisms reveals the evolutionary origin of modern protein families. Specific cell-cell adhesion and intracellular communication are key processes in multicellular animals, and members of the cadherin superfamily are essential players in these processes. Mammalian genomes contain over 100 genes belonging to this superfamily. By a combination of tBLASTn and profile HMM analyses we made an exhaustive search for cadherins and compiled the cadherin repertoires in key organisms, including Branchiostoma floridae (amphioxus), the sea anemone Nematostella vectensis and the placozoan Trichoplax adhaerens. Comparative analyses of multiple protein domains within known and novel cadherins enabled us to reconstruct the complex evolution in metazoa of this large superfamily. Five main cadherin branches are represented in the primitive metazoan Trichoplax: classical (CDH), flamingo (CELSR), dachsous (DCHS), FAT and FAT-like. Classical cadherins, such as E-cadherin, arose from an Urmetazoan cadherin, which progressively lost N-terminal extracellular cadherin repeats while its cytoplasmic domain, which binds the armadillo proteins p120ctn and β-catenin, remained quite conserved from placozoa to man. The origin of protocadherins predates the Bilateria and is likely rooted in an ancestral FAT cadherin. Several but not all protostomians lost protocadherins. The emergence of chordates coincided with a great expansion of the protocadherin repertoire. The evolution of ancient metazoan cadherins points to their unique and crucial roles in multicellular animal life.
GC3 of genes can be used as a proxy for isochore base composition: A reply to Elhaik et al.
In an article published in these pages, Elhaik et al. (2009, Mol Biol Evol. 26:1829) asked if GC3, the GC level of third codon positions in protein-coding genes, can be used as a ‘proxy’ to estimate the GC level of the surrounding isochore. We use available data to directly answer this simple question in the affirmative, and show how the use of indirect methods can lead to apparently conflicting conclusions. The answer reasserts that in human and other vertebrates, genes have a strong tendency to reside in compositionally corresponding isochores, which has far-reaching implications for genome structure and evolution.
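Since the abstract defines GC3 as the GC level of third codon positions in protein-coding genes, a minimal sketch of that calculation may help. The function name and the example sequence are hypothetical, and the sketch assumes an in-frame coding sequence containing only A, C, G and T.

```python
def gc3(cds):
    """Return the fraction of G or C among third codon positions.

    cds -- an in-frame coding sequence (assumed to start at codon
           position 1); trailing bases that do not complete a codon
           are ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # drop any incomplete final codon
    thirds = seq[2:usable:3]          # every third base, starting at index 2
    if not thirds:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


if __name__ == "__main__":
    # Codons ATG GCC AAA TAG have third positions G, C, A, G.
    print(gc3("ATGGCCAAATAG"))  # 0.75
```

Comparing this per-gene value with the GC level of the surrounding genomic region is the kind of direct comparison the reply refers to; how the isochore GC level itself is measured is outside the scope of this sketch.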
Comparative proteomics uncovers the signature of natural selection acting on the ejaculate proteomes of two cricket species isolated by postmating, prezygotic phenotypes
Two of the most well-supported patterns to have emerged over the past two decades of research in evolutionary biology are the occurrence of divergent natural selection acting on many male and female reproductive tract proteins and the importance of postmating, prezygotic phenotypes in reproductively isolating closely related species. Although these patterns appear to be common across a wide variety of taxa, the link between them remains poorly documented. Here, we utilize comparative proteomic techniques to determine whether or not there is evidence for natural selection acting on the ejaculate proteomes of two cricket species (Allonemobius fasciatus and A. socius) which are reproductively isolated primarily by postmating, prezygotic phenotypes. In addressing this question, we compare the degree of within-species polymorphism and between-species divergence between the ejaculate and thorax proteomes of these two species. We found that the ejaculate proteomes are both less polymorphic and more divergent than the thorax proteomes. Additionally, we assessed patterns of nucleotide variation for two species-specific ejaculate proteins and found evidence for both reduced levels of variation within species and positive selection driving divergence between species. In contrast, non-species-specific proteins exhibited higher levels of within species nucleotide variation and no signatures of positive selection. Nucleotide and putative functional data for the two species-specific proteins, along with data for a third protein (Ejaculate Serine Protease), suggest that all three of these genes are candidate speciation genes in need of further study. Overall, these patterns of proteome and nucleotide divergence provide support for the hypothesis that there is a causative link between selection-driven divergence of male ejaculate proteins and the evolution of postmating, prezygotic barriers to gene flow within Allonemobius.
MicroRNA Networks Alter to Conform to Transcription Factor Networks Adding Redundancy and Reducing the Repertoire of Target Genes for Coordinated Regulation
Transcription factors (TFs) and microRNAs (miRNAs) comprise two major layers of gene regulatory networks (GRNs). TFs and miRNAs function coordinately, but they have distinct molecular mechanisms and evolutionary backgrounds. Therefore, we aimed to systematically reveal the difference in contribution between TF and miRNA networks to the evolution of their coordinated regulations by focusing on composite feedforward circuits (cFFCs), that each comprises a TF and an miRNA. We compiled 124736 human-mouse conserved TF regulatory connections and 34298 conserved miRNA regulatory connections into two distinct connection matrices. To differentially assess the contributions to cFFC formation of TFs and miRNAs, we randomized one matrix and kept the other unchanged, and subsequently examined the number of cFFCs, the number of cFFC-targeted genes, and the redundancy formed by cFFCs in comparison with those of the real GRNs. Since the matrices represent selectively constrained networks, if selection has been operating on the networks for or against cFFC formation, the values of cFFC network properties would deviate significantly from the expectation of the randomized networks. As the cFFC includes both TF and miRNA connections, the partial randomizations indicate the extent of influence of selection on cFFC formation differentially between TF and miRNA networks. Thus, we adopted the deviation of each cFFC network property value as a measure to estimate the extent of influence of selection on cFFCs and to compare the contribution between TF and miRNA networks. We found that miRNA regulatory networks changed their configuration such that they conformed to the stable TF regulatory networks with an increased circuit redundancy and a marked reduction in the repertoire of cFFC-targeted genes. We also revealed that this redundancy-adding role is preferentially attributable to miRNA network alterations. 
The results indicate that the redundancy-adding role might serve as a niche for many miRNA connections to survive, avoiding conflicts with the stable TF regulatory networks.
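The partial-randomization idea described above (shuffle one network, keep the other fixed, and compare circuit counts with the real networks) can be sketched as follows. The abstract does not spell out the exact cFFC topology or the randomization scheme, so this sketch assumes a circuit in which a TF regulates both an miRNA and a gene and the miRNA regulates the same gene, and it randomizes by permuting edge targets. All names and the toy data are hypothetical.

```python
import random


def count_cffcs(tf_edges, mirna_edges):
    """Return the set of (tf, mirna, gene) circuits.

    Assumed circuit definition: tf -> mirna, tf -> gene, mirna -> gene.
    tf_edges    -- iterable of (tf, target) pairs; a target may be a
                   gene or an miRNA
    mirna_edges -- iterable of (mirna, gene) pairs
    """
    tf_targets = {}
    for tf, target in tf_edges:
        tf_targets.setdefault(tf, set()).add(target)
    mirna_targets = {}
    for mirna, gene in mirna_edges:
        mirna_targets.setdefault(mirna, set()).add(gene)

    circuits = set()
    for tf, targets in tf_targets.items():
        # miRNAs regulated by this TF that also have targets of their own
        for mirna in targets.intersection(mirna_targets):
            # genes co-targeted by the TF and the miRNA
            for gene in targets & mirna_targets[mirna]:
                circuits.add((tf, mirna, gene))
    return circuits


def shuffle_targets(edges, rng):
    """Randomize a network by permuting targets among its edges.

    A simplification: duplicate edges created by the shuffle collapse
    into one, so the edge count can shrink slightly.
    """
    ordered = sorted(edges)
    sources = [s for s, _ in ordered]
    targets = [t for _, t in ordered]
    rng.shuffle(targets)
    return set(zip(sources, targets))


if __name__ == "__main__":
    tf_edges = {("TF1", "miR-a"), ("TF1", "G1"), ("TF1", "G2")}
    mirna_edges = {("miR-a", "G1"), ("miR-b", "G2")}
    real = count_cffcs(tf_edges, mirna_edges)
    rng = random.Random(0)
    # Randomize only the miRNA network and keep the TF network fixed.
    null_counts = [
        len(count_cffcs(tf_edges, shuffle_targets(mirna_edges, rng)))
        for _ in range(100)
    ]
    print(len(real), sum(null_counts) / len(null_counts))
```

The number of cFFC-targeted genes and the redundancy per gene can be derived from the returned circuit set, for example by grouping circuits on their third element.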
Extreme reconfiguration of plastid genomes in the angiosperm family Geraniaceae: rearrangements, repeats, and codon usage
Geraniaceae plastid genomes (plastomes) have experienced a remarkable number of genomic changes. The plastomes of Erodium texanum, Geranium palmatum, and Monsonia speciosa were sequenced and compared to other rosids and the previously published Pelargonium hortorum plastome. Geraniaceae plastomes were found to be highly variable in size, gene content and order, repetitive DNA, and codon usage. Several unique plastome rearrangements include the disruption of two highly conserved operons (S10 and rps2-atpA), and the inverted repeat region (IR) in M. speciosa does not contain all genes in the rRNA operon. The sequence of M. speciosa is unusually small (128,787 bp); among angiosperm plastomes sequenced to date, only those of nonphotosynthetic species and those that have lost one IR copy are smaller. In contrast, the plastome of P. hortorum is the largest, at 217,942 bp. These genomes have experienced numerous gene and intron losses and partial and complete gene duplications. Some of the losses are shared throughout the family (e.g., trnT-GGU and the introns of rps16 and rpl16); however, other losses are homoplasious (e.g., trnG-UCC intron in G. palmatum and M. speciosa). IR length is also highly variable. The IR in P. hortorum was previously shown to be greatly expanded to 76 kb, and the IR is lost in E. texanum and reduced in G. palmatum (11 kb) and M. speciosa (7 kb). Geraniaceae plastomes contain a high frequency of large repeats (>100 bp) relative to other rosids. Within each plastome, repeats are often located at rearrangement endpoints, and many repeats shared among the four Geraniaceae flank rearrangement endpoints. GC content is elevated in the genomes and also in coding regions relative to other rosids. Codon usage per amino acid and GC content at third position sites are significantly different for Geraniaceae protein-coding sequences relative to other rosids. 
Our findings suggest that relaxed selection and/or mutational biases lead to increased GC content and this in turn altered codon usage. We propose that increases in genomic rearrangements, repetitive DNA, nucleotide substitutions, and GC content may be caused by relaxed selection resulting from improper DNA repair.
A Universal Molecular Clock of Protein Folds and its Power in Tracing the Early History of Aerobic Metabolism and Planet Oxygenation
The standard molecular clock describes a constant rate of molecular evolution and provides a powerful framework for evolutionary timescales. Here we describe the existence and implications of a molecular clock of folds, a universal recurrence in the discovery of new structures in the world of proteins. Using a phylogenomic structural census in hundreds of proteomes we build phylogenies and timelines of domains at fold and fold superfamily levels of structural complexity. These timelines correlate approximately linearly with geological timescales and were here used to date two crucial events in life history, planet oxygenation and organism diversification. We first dissected the structures and functions of enzymes in simulated metabolic networks. The placement of anaerobic and aerobic enzymes in the timeline revealed that aerobic metabolism emerged ~2.9 billion years (Ga) ago and expanded during a period of ~400 million years, reaching what is known as the Great Oxidation Event. During this period, enzymes recruited old and new folds for oxygen-mediated enzymatic activities. Remarkably, the first fold lost by a superkingdom disappeared in Archaea 2.6 Ga ago, within the span of oxygen rise, suggesting oxygen also triggered diversification of life. The implications of a molecular clock of folds are many and important for the neutral theory of molecular evolution and for understanding the growth and diversity of the protein world. The clock also extends the standard concept that was specific to molecules and their timescales and turns it into a universal timescale-generating tool.
Population Genetic Analysis of the Uncoupling Proteins Supports a Role for UCP3 in Human Cold Resistance
Production of heat via non-shivering thermogenesis (NST) is critical for temperature homeostasis in mammals. Uncoupling protein UCP1 plays a central role in NST by uncoupling the proton gradients produced in the inner membranes of mitochondria to produce heat; however, the extent to which UCP1 homologues, UCP2 and UCP3, are involved in NST is the subject of an ongoing debate. We used an evolutionary approach to test the hypotheses that variants that are associated with increased expression of these genes (UCP1 -3826A, UCP2 -866A and UCP3 -55T) show evidence of adaptation with winter climate. To that end, we calculated correlations between allele frequencies and winter climate variables for these SNPs, which we genotyped in a panel of 52 worldwide populations. We found significant correlations with winter climate for UCP1 -3826 G/A and UCP3 -55 C/T. Further, by analyzing previously published genotype data for these SNPs we found that the peak of the correlation for the UCP1 region occurred at the disease associated -3826A/G variant and that the UCP3 region has a striking signal overall, with several individual SNPs showing interesting patterns, including the -55 C/T variant. Re-sequencing of the regions in a set of 3 diverse population samples helped to clarify the signals that we found with the genotype data. At UCP1, the re-sequencing data revealed modest evidence that the haplotype carrying the -3826A variant was driven to high frequency by selection. In the UCP3 region, combining results from the climate analysis and re-sequencing survey suggests a more complex model in which variants on multiple haplotypes may independently be correlated with temperature. This is further supported by an excess of intermediate frequency variants in the UCP3 region in the Han Chinese population. 
Taken together, our results suggest that adaptation to climate influenced the global distribution of allele frequencies in UCP1 and UCP3, and provide an independent source of evidence for a role in cold resistance for UCP3.
Evolutionary history of chimpanzees inferred from complete mitochondrial genomes
Investigations into the evolutionary history of the common chimpanzee, Pan troglodytes, have produced inconsistent results, due to differences in the types of molecular data considered, the model assumptions employed, and the quantity and geographical range of samples used. We amplified and sequenced 24 complete P. troglodytes mitochondrial genomes from fecal samples collected at multiple study sites throughout sub-Saharan Africa. Using a ‘relaxed molecular clock,’ fossil calibrations, and 12 additional complete primate mitochondrial genomes, we analyzed the pattern and timing of primate diversification in a Bayesian framework. Our results support the recognition of four chimpanzee subspecies. Within P. troglodytes, we report a mean (95% highest posterior density (HPD)) time since most recent common ancestor (tMRCA) of 1.026 (0.811-1.263) MYA for the four proposed subspecies, with two major lineages. One of these lineages (tMRCA = 0.510 [0.387-0.650] MYA) contains P. t. verus (tMRCA = 0.155 [0.101-0.213] MYA) and P. t. ellioti (formerly P. t. vellerosus; tMRCA = 0.157 [0.102-0.215] MYA), both of which are monophyletic. The other major lineage contains P. t. schweinfurthii (tMRCA = 0.111 [0.077-0.146] MYA), a monophyletic clade nested within the P. t. troglodytes lineage (tMRCA = 0.380 [0.296-0.476] MYA). We utilized two analysis techniques that may be of widespread interest. First, we implemented a Yule speciation prior across the entire primate tree with separate coalescent priors on each of the chimpanzee subspecies. The validity of this approach was confirmed by estimates based on more traditional techniques. We also suggest that accurate tMRCA estimates from large, computationally difficult sequence alignments may be obtained by implementing our novel method of bootstrapping smaller, randomly sub-sampled alignments.
Choosing among partition models in Bayesian phylogenetics
Bayesian phylogenetic analyses often depend on Bayes factors (BF) to determine the optimal way to partition the data. The marginal likelihoods used to compute Bayes factors, in turn, are most commonly estimated using the harmonic mean (HM) method, which has been shown to be inaccurate. We describe a new, more accurate method for estimating the marginal likelihood of a model and compare it to the HM method on both simulated and empirical data. The new method generalizes our previously-described stepping-stone (SS) approach by making use of a reference distribution parameterized using samples from the posterior distribution. This avoids one challenging aspect of the original SS method, namely the need to sample from distributions that are close (in the Kullback-Leibler sense) to the prior. We specifically address the choice of partition models, and find that using the HM method can lead to a strong preference for an overpartitioned model. In contrast to the HM method and the original SS method, we show using simulated data that the generalized SS method is strikingly more precise (repeatable BF values of the same data and partition model) and yields BF values that are much more reasonable than those produced by the HM method. Comparisons of HM and generalized SS methods on an empirical data set demonstrate that the generalized SS method tends to choose simpler partition schemes that are more in line with expectation based on inferred patterns of molecular evolution. The generalized SS method shares with thermodynamic integration the need to sample from a series of distributions in addition to the posterior. Such dedicated path-based Markov chain Monte Carlo (MCMC) analyses appear to be a cost of estimating marginal likelihoods accurately.
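For readers unfamiliar with the harmonic mean (HM) estimator that the abstract criticizes, the sketch below computes it from the log-likelihoods of posterior samples: the marginal likelihood is estimated as the harmonic mean of the sampled likelihoods, evaluated in log space for numerical stability. This illustrates the HM method only, not the generalized stepping-stone method; the function names and example values are hypothetical.

```python
import math


def log_marginal_hm(log_likelihoods):
    """Harmonic mean estimate of the log marginal likelihood.

    log_likelihoods -- log-likelihood values of samples drawn from the
                       posterior distribution (for example, MCMC output)

    Computes log(n) - log(sum(exp(-logL_i))) with the log-sum-exp trick.
    """
    n = len(log_likelihoods)
    if n == 0:
        raise ValueError("need at least one sample")
    negated = [-x for x in log_likelihoods]
    largest = max(negated)
    log_sum = largest + math.log(sum(math.exp(v - largest) for v in negated))
    return math.log(n) - log_sum


def log_bayes_factor(log_liks_model_1, log_liks_model_2):
    """Log Bayes factor of model 1 over model 2 from HM estimates."""
    return log_marginal_hm(log_liks_model_1) - log_marginal_hm(log_liks_model_2)


if __name__ == "__main__":
    # A single poorly fitting sample dominates the estimate, which is
    # one reason the estimator is unstable across repeated runs.
    print(log_marginal_hm([-10.0, -10.0, -10.0]))
    print(log_marginal_hm([-10.0, -10.0, -20.0]))
```

The second call shows the estimate being pulled strongly toward the lowest sampled likelihood, which is consistent with the poor repeatability that the abstract reports for HM-based Bayes factors.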
Combining comparative sequence and genomic data to ascertain phylogenetic relationships and explore the evolution of the large GDSL-lipase family in land-plants
The GDSL-lipase gene family is a very large sub-family within the super gene family of SGNH esterases, defined by the distinct GDSL amino acid motif and several highly conserved domains. Plants retain a large number of GDSL-lipases indicating that they have acquired important functions. Yet, in planta functions have been demonstrated for only a few GDSL-lipases from diverse species. Considering that orthologs often retain equivalent functions, we determined the phylogenetic relationships between GDSL-lipases from genome-sequenced species representing bryophytes, gymnosperms, monocots and eudicots. An unrooted phylogenetic tree was constructed from the amino acid sequences of 604 GDSL-lipases from seven species. The topology of the tree depicts two major and one minor sub-family. This division is also supported by the unique gene structure of each sub-family. Since GDSL-lipase genes of all species are present in each of the three sub-families, we conclude that the last common ancestor of the land plants already possessed at least one ancestral GDSL-lipase gene of each sub-family. Combined gene structure and synteny analyses revealed events of segmental duplications, gene transposition, and gene degeneration in the evolution of the GDSL-lipase gene family. Furthermore, these analyses showed that independent events of intron gain and loss also contributed to the extant repertoire of the GDSL-lipase gene family. Our findings suggest that underlying many of the intron losses was a spliceosomal-mediated mechanism followed by gene conversion. Sorting the phylogenetic relationships among the members of the GDSL-lipase gene family, as depicted by the tree and supported by synteny analyses, provides a framework for extrapolation of demonstrated functional data to GDSL-lipases whose function is yet unknown. Furthermore, function(s) associated with specific lineage(s)-enriched branches may reveal correlations between acquired and/or lost functions and speciation.
Elevated evolutionary rate in genes with homopolymeric amino acid repeats constituting non-disordered structure
Homopolymeric amino acid repeats are tandem repeats of single amino acids. About 650 genes in the human genome are known to have repeats of this kind comprising seven residues or more. We classified the repeats into three categories according to their evolutionary conservation: those whose length is conserved among mammals (CM), those whose length differs among nonprimate mammals but is conserved among primates (CP), and those whose length differs among primates (VP). The frequency of each repeat, especially Ala, Leu, Pro, and Glu repeats, varies greatly in each category. The three-dimensional structure of homopolymeric amino acid repeats is considered to be intrinsically disordered. As expected, a large proportion of the repeats had a disordered structure, and nearly half of the repeats were predicted as completely disordered. However, a number of the repeats were predicted to have non-disordered structure: 13% and 25% of the repeats for categories CM and VP, respectively. Comparison of the substitution rates showed a higher Ka/Ks ratio for the genes with non-disordered repeats than for the genes with disordered repeats. These results indicate that amino acid substitution rates have been elevated in the genes with non-disordered repeats.
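The repeat definition used in the abstract (tandem runs of a single amino acid, seven residues or more) is simple enough to sketch as a regular-expression scan. This is an illustration under that stated definition, not the authors' pipeline; the function name and the example sequence are hypothetical.

```python
import re


def homopolymer_runs(protein, min_len=7):
    """Find runs of a single repeated residue in a protein sequence.

    protein -- amino acid sequence in one-letter code
    min_len -- minimum run length to report (seven, as in the abstract)

    Returns a list of (residue, zero-based start, run length) tuples.
    """
    if min_len < 2:
        raise ValueError("min_len must be at least 2")
    # One residue followed by at least min_len - 1 copies of itself.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_len - 1))
    return [
        (match.group(1), match.start(), len(match.group(0)))
        for match in pattern.finditer(protein.upper())
    ]


if __name__ == "__main__":
    example = "MK" + "Q" * 8 + "L" + "A" * 6 + "PG"
    print(homopolymer_runs(example))     # [('Q', 2, 8)]
    print(homopolymer_runs(example, 6))  # [('Q', 2, 8), ('A', 11, 6)]
```

Classifying each reported run as CM, CP or VP would additionally require aligned orthologous sequences from other mammals, which this sketch does not attempt.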
A Comprehensive Functional Analysis of Ancestral Human Signal Peptides
With the sequencing of the Neandertal genome, it has become possible to identify amino acid substitutions that occurred on the human lineage since its separation from the Neandertal lineage. Conceptually, it will therefore be possible to functionally analyze all such amino acid substitutions in the future. Here we analyze the function of substitutions that occurred during recent human evolution in N-terminal signal peptides. We develop a high-throughput flow cytometry-based assay to analyze signal peptide efficiency as the ratio of surface to total reporter protein per live cell. Such ratios differed significantly among signal peptides derived from different human genes. However, no modern human signal peptide differed significantly from its ancestral counterpart, an observation compatible with the predictions of the neutral theory of molecular evolution.
Functional compensation of primary and secondary metabolites by duplicate genes in Arabidopsis thaliana
It is well known that knocking out a gene in an organism often causes no phenotypic effect. One possible explanation is the existence of duplicate genes; that is, the effect of knocking out a gene is compensated by a duplicate copy. Another explanation is the existence of alternative pathways. In terms of metabolic products, the relative roles of the two mechanisms have been extensively studied in yeast, but not in any multicellular organisms. Here, to address the functional compensation of metabolic products by duplicate genes, we quantified 35 metabolic products from 1,976 genes in knock-out mutants of Arabidopsis thaliana by a high-throughput LC-MS analysis. We found that knocking out either a singleton gene or a duplicate gene with distant paralogs in the genome tends to induce stronger metabolic effects than knocking out a duplicate gene with a close paralog in the genome, indicating that only duplicate genes with close paralogs play a significant role in functional compensation for metabolic products in A. thaliana. To extend the analysis, we examined metabolic products with either high or low connectivity in a metabolic network. We found that the compensatory role of duplicate genes is less important when the metabolite has a high connectivity, indicating that functional compensation by alternative pathways is common in the case of high connectivity. In conclusion, recently duplicated genes play an important role in the compensation of metabolic products only when the number of alternative pathways is small.
Phylogenetic substitution models for detecting heterotachy during plastid evolution
There is widespread evidence of lineage-specific rate variation, known as heterotachy, during protein evolution. Changes in the structural and functional constraints acting on a protein can lead to heterotachy, and it is plausible that such changes, known as covarion shifts, may affect many amino acids at once. Several previous attempts to model heterotachy have used covarion models, where the sequence undergoes covarion drift, whereby each site may switch independently among a set of discrete classes having different substitution rates. However, such independent switching may not capture biologically important events where the selective forces acting on a protein affect many sites at once. We describe a new class of models that allow the rates of substitution and switching to vary among branches of a phylogenetic tree. Such models are better able to handle covarion shifts. We apply these models to a set of genes occurring in non-photosynthetic bacteria, cyanobacteria, and the plastids of green and red algae. We find that 4/5 genes show evidence of some form of rate switching and that 3/5 genes show evidence that the relative switching rate differs among taxonomic groups. We conclude that covarion shifts may be frequent during the deep evolution of plastid genes and that our methodology may provide a powerful new tool for investigating such shifts in other systems.