Genome Biology & Evolution



Genome Biology and Evolution (GBE) publishes leading original research at the interface between evolutionary biology and genomics. Papers considered for publication report novel evolutionary findings that concern natural genome diversity, population genomics, the structure, function, organisation and expression of genomes, comparative genomics, proteomics, and environmental genomic interactions. Major evolutionary insights from the fields of computational biology, structural biology, developmental biology, and cell biology are also considered, as are theoretical advances in the field of genome evolution. GBE’s scope embraces genome-wide evolutionary investigations at all taxonomic levels and for all forms of life — within populations or across domains. Its aims are to further the understanding of genomes in their evolutionary context and further the understanding of evolution from a genome-wide perspective.

GBE is owned by the Society for Molecular Biology and Evolution (SMBE). Motivated by the continued growth of the field, SMBE conducted a grass-roots survey in 2007 to investigate the needs of the field regarding new publication outlets. The survey elicited a resounding response from members of SMBE and other scientists in the fields of genomics and molecular evolution. The key findings from that survey were that the field wanted an online-only journal that was devoted specifically to the areas of genome evolution and comparative genomics and that was published under an Open Access model. The response of SMBE was to launch GBE in order to serve those needs of the field. The SMBE meeting attracts about 800 participants each year. As a reflection of the rapid growth of genomic technologies, about half of the science presented at each SMBE meeting is about genomics. With the help of the evolutionary expertise that is gathered in SMBE, GBE is positioned and designed to set the highest standards for papers in the growing field of evolutionary genomics.

GBE is open access and does not require an SMBE membership for journal access, though members receive a 10% discount on publication charges.


MBE | Most Read

Molecular Biology and Evolution

Evolution on the Vine: A History of Tomato Domestication in Latin America

Fri, 20 Mar 2020 00:00:00 GMT

The common cultivated tomato (Solanum lycopersicum L. var. lycopersicum; or [SLL]) is among the world’s most widely grown vegetable crops, from big agricultural farms to heirloom grown varieties.

Genomic Study Reveals Rich Pre-Hispanic History and Genetic Changes among Diverse Indigenous Mexican Populations

Fri, 20 Mar 2020 00:00:00 GMT

As more and more large-scale human genome sequencing projects get completed, scientists have been able to trace with increasing confidence both the geographical movements and underlying genetic variation of human populations.

Genomic Evidence for Complex Domestication History of the Cultivated Tomato in Latin America

Tue, 07 Jan 2020 00:00:00 GMT

Abstract
The process of plant domestication is often protracted, involving underexplored intermediate stages with important implications for the evolutionary trajectories of domestication traits. Previously, tomato domestication history has been thought to involve two major transitions: one from wild Solanum pimpinellifolium L. to a semidomesticated intermediate, S. lycopersicum L. var. cerasiforme (SLC) in South America, and a second transition from SLC to fully domesticated S. lycopersicum L. var. lycopersicum in Mesoamerica. In this study, we employ population genomic methods to reconstruct tomato domestication history, focusing on the evolutionary changes occurring in the intermediate stages. Our results suggest that the origin of SLC may predate domestication, and that many traits considered typical of cultivated tomatoes arose in South American SLC, but were lost or diminished once these partially domesticated forms spread northward. These traits were then likely reselected in a convergent fashion in the common cultivated tomato, prior to its expansion around the world. Based on these findings, we reveal complexities in the intermediate stage of tomato domestication and provide insight on trajectories of genes and phenotypes involved in tomato domestication syndrome. Our results also allow us to identify underexplored germplasm that harbors useful alleles for crop improvement.

Molecular Evolutionary Genetics Analysis (MEGA) for macOS

Mon, 06 Jan 2020 00:00:00 GMT

Abstract
The Molecular Evolutionary Genetics Analysis (MEGA) software enables comparative analysis of molecular sequences in phylogenetics and evolutionary medicine. Here, we introduce the macOS version of the MEGA software. This new version eliminates the need for virtualization and emulation programs previously required to use MEGA on Apple computers. MEGA for macOS utilizes memory and computing resources efficiently for conducting evolutionary analyses on macOS. It has a native Cocoa graphical user interface that is programmed to provide a consistent user experience across macOS, Windows, and Linux. MEGA for macOS is available from www.megasoftware.net free of charge.

Genomic Mechanisms of Physiological and Morphological Adaptations of Limestone Langurs to Karst Habitats

Sat, 21 Dec 2019 00:00:00 GMT

Abstract
Knowledge of the physiological and morphological evolution and adaptation of nonhuman primates is critical to understanding hominin origins, physiological ecology, morphological evolution, and applications in biomedicine. Particularly, limestone langurs represent a direct example of adaptations to the challenges of exploiting a high calcium and harsh environment. Here, we report a de novo genome assembly (Tfra_2.0) of a male François’s langur (Trachypithecus francoisi) with contig N50 of 16.3 Mb and resequencing data of 23 individuals representing five limestone and four forest langur species. Comparative genomics reveals evidence for functional evolution in genes and gene families related to calcium signaling in the limestone langur genome, probably as an adaptation to naturally occurring high calcium levels present in water and plant resources in karst habitats. The genomic and functional analyses suggest that a single point mutation (Lys1905Arg) in the α1c subunit of the L-type voltage-gated calcium channel Cav1.2 (CACNA1C) attenuates the inward calcium current into the cells in vitro. Population genomic analyses and RNA-sequencing indicate that EDNRB is less expressed in white tail hair follicles of the white-headed langur (T. leucocephalus) compared with the black-colored François’s langur and hence might be responsible for species-specific differences in body coloration. Our findings contribute to a new understanding of gene–environment interactions and physiomorphological adaptive mechanisms in ecologically specialized primate taxa.

Enzyme Evolution: An Epistatic Ratchet versus a Smooth Reversible Transition

Thu, 19 Dec 2019 00:00:00 GMT

Abstract
Evolutionary trajectories are deemed largely irreversible. In a newly diverged protein, reversion of mutations that led to the functional switch typically results in loss of both the new and the ancestral functions. Nonetheless, evolutionary transitions where reversions are viable have also been described. The structural and mechanistic causes of reversion compatibility versus incompatibility therefore remain unclear. We examined two laboratory evolution trajectories of mammalian paraoxonase-1, a lactonase with promiscuous organophosphate hydrolase (OPH) activity. Both trajectories began with the same active-site mutant, His115Trp, which lost the native lactonase activity and acquired higher OPH activity. A neo-functionalization trajectory amplified the promiscuous OPH activity, whereas the re-functionalization trajectory restored the native activity, thus generating a new lactonase that lacks His115. The His115 revertants of these trajectories indicated opposite trends. Revertants of the neo-functionalization trajectory lost both the evolved OPH and the original lactonase activity. Revertants of the trajectory that restored the original lactonase function were, however, fully active. Crystal structures and molecular simulations show that in the newly diverged OPH, the reverted His115 and other catalytic residues are displaced, thus causing loss of both the original and the new activity. In contrast, in the re-functionalization trajectory, reversion compatibility of the original lactonase activity derives from mechanistic versatility whereby multiple residues can fulfill the same task. This versatility enables unique sequence-reversible compositions that are inaccessible when the active site was repurposed toward a new function.

Enhancers Facilitate the Birth of De Novo Genes and Gene Integration into Regulatory Networks

Tue, 17 Dec 2019 00:00:00 GMT

Abstract
Regulatory networks control the spatiotemporal gene expression patterns that give rise to and define the individual cell types of multicellular organisms. In eumetazoa, distal regulatory elements called enhancers play a key role in determining the structure of such networks, particularly the wiring diagram of “who regulates whom.” Mutations that affect enhancer activity can therefore rewire regulatory networks, potentially causing adaptive changes in gene expression. Here, we use whole-tissue and single-cell transcriptomic and chromatin accessibility data from mouse to show that enhancers play an additional role in the evolution of regulatory networks: They facilitate network growth by creating transcriptionally active regions of open chromatin that are conducive to de novo gene evolution. Specifically, our comparative transcriptomic analysis with three other mammalian species shows that young, mouse-specific intergenic open reading frames are preferentially located near enhancers, whereas older open reading frames are not. Mouse-specific intergenic open reading frames that are proximal to enhancers are more highly and stably transcribed than those that are not proximal to enhancers or promoters, and they are transcribed in a limited diversity of cellular contexts. Furthermore, we report several instances of mouse-specific intergenic open reading frames proximal to promoters showing evidence of being repurposed enhancers. We also show that open reading frames gradually acquire interactions with enhancers over macroevolutionary timescales, helping integrate genes—those that have arisen de novo or by other means—into existing regulatory networks. Taken together, our results highlight a dual role of enhancers in expanding and rewiring gene regulatory networks.

Population History and Gene Divergence in Native Mexicans Inferred from 76 Human Exomes

Tue, 17 Dec 2019 00:00:00 GMT

Abstract
Native American genetic variation remains underrepresented in most catalogs of human genome sequencing data. Previous genotyping efforts have revealed that Mexico’s Indigenous population is highly differentiated and substructured, thus potentially harboring higher proportions of private genetic variants of functional and biomedical relevance. Here we have targeted the coding fraction of the genome and characterized its full site frequency spectrum by sequencing 76 exomes from five Indigenous populations across Mexico. Using diffusion approximations, we modeled the demographic history of Indigenous populations from Mexico with northern and southern ethnic groups splitting 7.2 KYA and subsequently diverging locally 6.5 and 5.7 KYA, respectively. Selection scans for positive selection revealed BCL2L13 and KBTBD8 genes as potential candidates for adaptive evolution in Rarámuris and Triquis, respectively. BCL2L13 is highly expressed in skeletal muscle and could be related to physical endurance, a well-known phenotype of the northern Mexico Rarámuri. The KBTBD8 gene has been associated with idiopathic short stature and we found it to be highly differentiated in Triqui, a southern Indigenous group from Oaxaca whose height is extremely low compared to other Native populations.

A Depletion of Stop Codons in lincRNA is Owing to Transfer of Selective Constraint from Coding Sequences

Mon, 16 Dec 2019 00:00:00 GMT

Abstract
Although the constraints on a gene’s sequence are often assumed to reflect the functioning of that gene, here we propose transfer selection, a constraint operating on one class of genes transferred to another, mediated by shared binding factors. We show that such transfer can explain an otherwise paradoxical depletion of stop codons in long intergenic noncoding RNAs (lincRNAs). Serine/arginine-rich proteins direct the splicing machinery by binding exonic splice enhancers (ESEs) in immature mRNA. As coding exons cannot contain stop codons in one reading frame, stop codons should be rare within ESEs. We confirm that the stop codon density (SCD) in ESE motifs is low, even accounting for nucleotide biases. Given that serine/arginine-rich proteins binding ESEs also facilitate lincRNA splicing, a low SCD could transfer to lincRNAs. As predicted, multiexon lincRNA exons are depleted in stop codons, a result not explained by open reading frame (ORF) contamination. Consistent with transfer selection, stop codon depletion in lincRNAs is most acute in exonic regions with the highest ESE density, disappears when ESEs are masked, is consistent with stop codon usage skews in ESEs, and is diminished in both single-exon lincRNAs and introns. Owing to low SCD, the maximum lengths of pseudo-ORFs frequently exceed null expectations. This has implications for ORF annotation and the evolution of de novo protein-coding genes from lincRNAs. We conclude that not all constraints operating on genes need be explained by the functioning of the gene but may instead be transferred owing to shared binding factors.

Ancestral Hybridization Facilitated Species Diversification in the Lake Malawi Cichlid Fish Adaptive Radiation

Sat, 14 Dec 2019 00:00:00 GMT

Abstract
The adaptive radiation of cichlid fishes in East African Lake Malawi encompasses over 500 species that are believed to have evolved within the last 800,000 years from a common founder population. It has been proposed that hybridization between ancestral lineages can provide the genetic raw material to fuel such exceptionally high diversification rates, and evidence for this has recently been presented for the Lake Victoria region cichlid superflock. Here, we report that Lake Malawi cichlid genomes also show evidence of hybridization between two lineages that split 3–4 Ma, today represented by Lake Victoria cichlids and the riverine Astatotilapia sp. “ruaha blue.” The two ancestries in Malawi cichlid genomes are present in large blocks of several kilobases, but there is little variation in this pattern between Malawi cichlid species, suggesting that the large-scale mosaic structure of the genomes was largely established prior to the radiation. Nevertheless, tens of thousands of polymorphic variants apparently derived from the hybridization are interspersed in the genomes. These loci show a striking excess of differentiation across ecological subgroups in the Lake Malawi cichlid assemblage, and parental alleles sort differentially into benthic and pelagic Malawi cichlid lineages, consistent with strong differential selection on these loci during species divergence. Furthermore, these loci are enriched for genes involved in immune response and vision, including opsin genes previously identified as important for speciation. Our results reinforce the role of ancestral hybridization in explosive diversification by demonstrating its significance in one of the largest recent vertebrate adaptive radiations.

Paralogization and New Protein Architectures in Planctomycetes Bacteria with Complex Cell Structures

Wed, 11 Dec 2019 00:00:00 GMT

Abstract
Bacteria of the phylum Planctomycetes have a unique cell plan with an elaborate intracellular membrane system, thereby resembling eukaryotic cells. The origin and evolution of these remarkable features is debated. To study the evolutionary genomics of bacteria with complex cell architectures, we have resequenced the 9.2-Mb genome of the model organism Gemmata obscuriglobus and sequenced the 10-Mb genome of G. massiliana Soil9, the 7.9-Mb genome of CJuql4, and the 6.7-Mb genome of Tuwongella immobilis, all of which belong to the family Gemmataceae. A gene flux analysis of the Planctomycetes revealed a massive emergence of novel protein families at multiple nodes within the Gemmataceae. The expanded protein families have unique multidomain architectures composed of domains that are characteristic of prokaryotes, such as the sigma factor domain of extracytoplasmic sigma factors, and domains that have proliferated in eukaryotes, such as the WD40, leucine-rich repeat, tetratricopeptide repeat and Ser/Thr kinase domains. Proteins with identifiable domains in the Gemmataceae have longer lengths and linkers than proteins in most other bacteria, and the analyses suggest that these traits were ancestrally present in the Planctomycetales. A broad comparison of protein length distribution profiles revealed an overlap between the longest proteins in prokaryotes and the shortest proteins in eukaryotes. We conclude that the many similarities between proteins in the Planctomycetales and the eukaryotes are due to convergent evolution and that there is no strict boundary between prokaryotes and eukaryotes with regard to features such as gene paralogy, protein length, and protein domain composition patterns.

Linking Branch Lengths across Sets of Loci Provides the Highest Statistical Support for Phylogenetic Inference

Wed, 11 Dec 2019 00:00:00 GMT

Abstract
Evolution leaves heterogeneous patterns of nucleotide variation across the genome, with different loci subject to varying degrees of mutation, selection, and drift. In phylogenetics, the potential impacts of partitioning sequence data for the assignment of substitution models are well appreciated. In contrast, the treatment of branch lengths has received far less attention. In this study, we examined the effects of linking and unlinking branch-length parameters across loci or subsets of loci. By analyzing a range of empirical data sets, we find consistent support for a model in which branch lengths are proportionate between subsets of loci: gene trees share the same pattern of branch lengths, but form subsets that vary in their overall tree lengths. These models had substantially better statistical support than models that assume identical branch lengths across gene trees, or those in which genes form subsets with distinct branch-length patterns. We show using simulations and empirical data that the complexity of the branch-length model with the highest support depends on the length of the sequence alignment and on the numbers of taxa and loci in the data set. Our findings suggest that models in which branch lengths are proportionate between subsets have the highest statistical support under the conditions that are most commonly seen in practice. The results of our study have implications for model selection, computational efficiency, and experimental design in phylogenomics.

Sexual Selection Shapes Seminal Vesicle Secretion Gene Expression in House Mice

Tue, 10 Dec 2019 00:00:00 GMT

Abstract
Reproductive proteins typically have high rates of molecular evolution, and are assumed to be under positive selection from sperm competition and cryptic female choice. However, ascribing evolutionary divergence in the genome to these processes of sexual selection from patterns of association alone is problematic. Here, we use an experimental manipulation of postmating sexual selection acting on populations of house mice and explore its consequences for the expression of seminal vesicle secreted (SVS) proteins. Following 25 generations of selection, males from populations subjected to postmating sexual selection had evolved increased expression of at least two SVS genes that exhibit the signature of positive selection at the molecular level, SVS1 and SVS2. These proteins contribute to mating plug formation and sperm survival in the female reproductive tract. Our data thereby support the view that sexual selection is responsible for the evolution of these seminal fluid proteins.

Distinct Evolutionary Trajectories of Neuronal and Hair Cell Nicotinic Acetylcholine Receptors

Tue, 10 Dec 2019 00:00:00 GMT

Abstract
The expansion and pruning of ion channel families has played a crucial role in the evolution of nervous systems. Nicotinic acetylcholine receptors (nAChRs) are ligand-gated ion channels with distinct roles in synaptic transmission at the neuromuscular junction, the central and peripheral nervous system, and the inner ear. Remarkably, the complement of nAChR subunits has been highly conserved along vertebrate phylogeny. To ask whether the different subtypes of receptors underwent different evolutionary trajectories, we performed a comprehensive analysis of vertebrate nAChR coding sequences, mouse single-cell expression patterns, and comparative functional properties of receptors from three representative tetrapod species. We found significant differences between hair cell and neuronal receptors that were most likely shaped by the differences in coexpression patterns and coassembly rules of component subunits. Thus, neuronal nAChRs showed a high degree of coding sequence conservation, coupled to greater coexpression variance and conservation of functional properties across tetrapod clades. In contrast, hair cell α9α10 nAChRs exhibited greater sequence divergence, a narrow coexpression pattern, and great variability of functional properties across species. These results point to differential substrates for random change within the family of gene paralogs that relate to the segregated roles of nAChRs in synaptic transmission.

Convergent Evolution of Cysteine-Rich Keratins in Hard Skin Appendages of Terrestrial Vertebrates

Tue, 10 Dec 2019 00:00:00 GMT

Abstract
Terrestrial vertebrates have evolved hard skin appendages, such as scales, claws, feathers, and hair, that play crucial roles in defense, predation, locomotion, and thermal insulation. The mechanical properties of these skin appendages are largely determined by cornified epithelial components. So-called “hair keratins,” cysteine-rich intermediate filament proteins that undergo covalent cross-linking via disulfide bonds, are the crucial structural proteins of hair and claws in mammals, and hair keratin orthologs are also present in lizard claws, indicating an evolutionary origin in a hairless common ancestor of amniotes. Here, we show that reptiles and birds also have other cysteine-rich keratins which lack cysteine-rich orthologs in mammals. In addition to hard acidic (type I) sauropsid-specific (HAS) keratins, we identified hard basic (type II) sauropsid-specific (HBS) keratins which are conserved in lepidosaurs, turtles, crocodilians, and birds. Immunohistochemical analysis with a newly made antibody revealed expression of chicken HBS1 keratin in the cornifying epithelial cells of feathers. Molecular phylogenetics suggested that the high cysteine contents of HAS and HBS keratins evolved independently from the cysteine-rich sequences of hair keratin orthologs, thus representing products of convergent evolution. In conclusion, we propose an evolutionary model in which HAS and HBS keratins evolved as structural proteins in epithelial cornification of reptiles and at least one HBS keratin was co-opted as a component of feathers after the evolutionary divergence of birds from reptiles. Thus, cytoskeletal proteins of hair and feathers are products of convergent evolution and evolutionary co-option to similar biomechanical functions in clade-specific hard skin appendages.

Human Genomic Diversity Where the Mediterranean Joins the Atlantic

Mon, 09 Dec 2019 00:00:00 GMT

Abstract
Over the past few years, a lively debate has emerged about the timing and magnitude of the human migrations between the Iberian Peninsula and the Maghreb. Several pieces of evidence, including archaeological, anthropological, historical, and genetic data, have pointed to a complex and intermingled evolutionary history in the western Mediterranean area. To study to what extent connections across the Strait of Gibraltar and surrounding areas have shaped the present-day genomic diversity of its populations, we have performed a screening of 2.5 million single-nucleotide polymorphisms in 142 samples from southern Spain, southern Portugal, and Morocco. We built comprehensive data sets of the studied area and we implemented multistep bioinformatic approaches to assess population structure, demographic histories, and admixture dynamics. Both local and global ancestry inference showed an internal substructure in the Iberian Peninsula, mainly linked to a differential African ancestry. Western Iberia, from southern Portugal to Galicia, constituted an independent cluster within Iberia characterized by an enriched African genomic input. Migration time modeling showed recent historic dates for the admixture events occurring both in Iberia and in the North of Africa. However, an integrative vision of both paleogenomic and modern DNA data allowed us to detect chronological transitions and population turnovers that could be the result of transcontinental migrations dating back to Neolithic times. The present contribution aimed to fill the gaps in the modern human genomic record of a key geographic area, where the Mediterranean and the Atlantic come together.

The Laboratory Domestication of Zebrafish: From Diverse Populations to Inbred Substrains

Fri, 06 Dec 2019 00:00:00 GMT

Abstract
We know from human genetic studies that practically all aspects of biology are strongly influenced by the genetic background, as reflected in the advent of “personalized medicine.” Yet, with few exceptions, this is not taken into account when using laboratory populations as animal model systems for research in these fields. Laboratory strains of zebrafish (Danio rerio) are widely used for research in vertebrate developmental biology, behavior, and physiology, for modeling diseases, and for testing pharmaceutic compounds in vivo. However, all of these strains are derived from artificial bottleneck events and therefore are likely to represent only a fraction of the genetic diversity present within the species. Here, we use restriction site-associated DNA sequencing to genetically characterize wild populations of zebrafish from India, Nepal, and Bangladesh, and to compare them to previously published data on four common laboratory strains. We measured nucleotide diversity, heterozygosity, and allele frequency spectra, and find that wild zebrafish are much more diverse than laboratory strains. Further, in wild zebrafish, there is a clear signal of GC-biased gene conversion that is missing in laboratory strains. We also find that zebrafish populations in Nepal and Bangladesh are most distinct from all other strains studied, making them an attractive subject for future studies of zebrafish population genetics and molecular ecology. Finally, isolates of the same strains kept in different laboratories show a pattern of ongoing differentiation into genetically distinct substrains. Together, our findings broaden the basis for future genetic, physiological, pharmaceutic, and evolutionary studies in Danio rerio.

Evolution of a Novel and Adaptive Floral Scent in Wild Tobacco

Fri, 06 Dec 2019 00:00:00 GMT

Abstract
Many plants emit diverse floral scents that mediate plant–environment interactions and contribute to reproductive success. However, how plants evolve novel and adaptive biosynthetic pathways for floral volatiles remains unclear. Here, we show that in the wild tobacco, Nicotiana attenuata, a dominant species-specific floral volatile (benzyl acetone, BA) that attracts pollinators and deters florivores is synthesized by phenylalanine ammonia-lyase 4 (NaPAL4), isoflavone reductase 3 (NaIFR3), and chalcone synthase 3 (NaCHAL3). Transient expression of NaIFR3 alone in N. attenuata leaves is sufficient and necessary for ectopic foliar BA emissions, and coexpressing NaIFR3 with NaPAL4 and NaCHAL3 increased the BA emission levels. Independent changes in transcription of NaPAL4 and NaCHAL3 contributed to intraspecific variations of floral BA emission. However, among species, the gain of expression of NaIFR3 resulted in the biosynthesis of BA, which was only found in N. attenuata. This study suggests that novel metabolic pathways associated with adaptation can arise via reconfigurations of gene expression.

A Bayesian Implementation of the Multispecies Coalescent Model with Introgression for Phylogenomic Analysis

Fri, 06 Dec 2019 00:00:00 GMT

Abstract
Recent analyses suggest that cross-species gene flow or introgression is common in nature, especially during species divergences. Genomic sequence data can be used to infer introgression events and to estimate the timing and intensity of introgression, providing an important means to advance our understanding of the role of gene flow in speciation. Here, we implement the multispecies-coalescent-with-introgression model, an extension of the multispecies-coalescent model to incorporate introgression, in our Bayesian Markov chain Monte Carlo program Bpp. The multispecies-coalescent-with-introgression model accommodates deep coalescence (or incomplete lineage sorting) and introgression and provides a natural framework for inference using genomic sequence data. Computer simulation confirms the good statistical properties of the method, although hundreds or thousands of loci are typically needed to estimate introgression probabilities reliably. Reanalysis of data sets from the purple cone spruce confirms the hypothesis of homoploid hybrid speciation. Using the genomic sequence data from six mosquito species in the Anopheles gambiae species complex, we estimated the introgression probability, which varies considerably across the genome, likely driven by differential selection against introgressed alleles.

Mutations Beget More Mutations—Rapid Evolution of Mutation Rate in Response to the Risk of Runaway Accumulation

Tue, 03 Dec 2019 00:00:00 GMT

Abstract
The rapidity with which the mutation rate evolves could greatly impact evolutionary patterns. Nevertheless, most studies simply assume a constant rate in the time scale of interest (Kimura 1983; Drake 1991; Kumar 2005; Li 2007; Lynch 2010). In contrast, recent studies of somatic mutations suggest that the mutation rate may vary by several orders of magnitude within a lifetime (Kandoth et al. 2013; Lawrence et al. 2013). To resolve the discrepancy, we now propose a runaway model, applicable to both the germline and soma, whereby mutator mutations form a positive-feedback loop. In this loop, any mutator mutation would increase the rate of acquiring the next mutator, thus triggering a runaway escalation in mutation rate. The process can be initiated more readily if there are many weak mutators than if there are a few strong ones. Interestingly, even a small increase in the mutation rate at birth could trigger the runaway process, resulting in unfit progeny. In slowly reproducing species, the need to minimize the risk of this uncontrolled accumulation would thus favor setting the mutation rate low. In comparison, species that start and end reproduction sooner do not face the risk and may set the baseline mutation rate higher. The mutation rate would evolve in response to the risk of runaway mutation, in particular, when the generation time changes. A rapidly evolving mutation rate may shed new light on many evolutionary phenomena (Elango et al. 2006; Thomas et al. 2010, 2018; Langergraber et al. 2012; Besenbacher et al. 2019).

Genetic Landscapes Reveal How Human Genetic Diversity Aligns with Geography

Thu, 28 Nov 2019 00:00:00 GMT

Abstract
Geographic patterns in human genetic diversity carry footprints of population history and provide insights for genetic medicine and its application across human populations. Summarizing and visually representing these patterns of diversity has been a persistent goal for human geneticists, and has revealed that genetic differentiation is frequently correlated with geographic distance. However, most analytical methods to represent population structure do not incorporate geography directly, and it must be considered post hoc alongside a visual summary of the genetic structure. Here, we estimate “effective migration” surfaces to visualize how human genetic diversity is geographically structured. The results reveal local patterns of differentiation in detail and emphasize that while genetic similarity generally decays with geographic distance, the relationship is often subtly distorted. Overall, the visualizations provide a new perspective on genetics and geography in humans and insight into the geographic distribution of human genetic variation.

Machine Learning Methods for Predicting Human-Adaptive Influenza A Viruses Based on Viral Nucleotide Compositions

Thu, 21 Nov 2019 00:00:00 GMT

Abstract
Each influenza pandemic was caused at least partly by avian- and/or swine-origin influenza A viruses (IAVs). The timing of and the potential IAVs involved in the next pandemic are currently unpredictable. We aim to build machine learning (ML) models to predict human-adaptive IAV nucleotide composition. A total of 217,549 IAV full-length coding sequences of the PB2 (polymerase basic protein-2), PB1, PA (polymerase acidic protein), HA (hemagglutinin), NP (nucleoprotein), and NA (neuraminidase) segments were decomposed for their codon position-based mononucleotides (12 nts) and dinucleotides (48 dnts). A total of 68,742 human sequences and 68,739 avian sequences (1:1) were resampled to characterize the human adaptation-associated (d)nts with principal component analysis (PCA) and other ML models. Then, the human adaptation of IAV sequences was predicted based on the characterized (d)nts. For the six segments, 9, 12, 11, 13, 10, and 9 human-adaptive (d)nts were optimized, respectively. PCA and hierarchical clustering analysis revealed the linear separability of the optimized (d)nts between the human-adaptive and avian-adaptive sets. The results of the confusion matrix and the area under the receiver operating characteristic curve indicated a high performance of the ML models to predict human adaptation of IAVs. Our model performed well in predicting the human adaptation of the swine/avian IAVs before and after the 2009 H1N1 pandemic. In conclusion, we identified the human adaptation-associated genomic composition of IAV segments. ML models for IAV human adaptation prediction using large IAV genomic data sets can facilitate the identification of key viral factors that affect virus transmission/pathogenicity. Most importantly, this approach allows the prediction of pandemic influenza.
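The decomposition into 12 codon position-based mononucleotides and 48 dinucleotides can be illustrated with a minimal Python sketch. It is not the authors' code. The function name, the feature labels (such as "A1" or "GC12"), and the choice to normalize counts within each codon position are assumptions made here for illustration; the abstract does not specify them.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
DINUCS = ["".join(p) for p in product(BASES, repeat=2)]  # 16 dinucleotides


def codon_position_composition(cds):
    """Return 12 mononucleotide and 48 dinucleotide frequencies for one CDS.

    Hypothetical feature labels: "A1" is A at codon position 1; "GC12" is
    the dinucleotide GC spanning codon positions 1-2 ("23" and "31" likewise).
    Frequencies are normalized within each codon position (an assumption).
    """
    seq = cds.upper().replace("U", "T")
    seq = seq[: len(seq) - len(seq) % 3]  # drop any trailing partial codon
    mono, di = Counter(), Counter()
    for i, base in enumerate(seq):
        pos = i % 3 + 1  # codon position of this base: 1, 2, or 3
        if base in BASES:
            mono[(pos, base)] += 1
        pair = seq[i : i + 2]
        # Skip the last base (no partner) and pairs with ambiguous symbols.
        if len(pair) == 2 and set(pair) <= set(BASES):
            di[(pos, pair)] += 1
    features = {}
    for pos in (1, 2, 3):
        nxt = pos % 3 + 1  # position of the second base in the dinucleotide
        mono_total = sum(mono[(pos, b)] for b in BASES)
        di_total = sum(di[(pos, d)] for d in DINUCS)
        for b in BASES:
            features[f"{b}{pos}"] = mono[(pos, b)] / mono_total if mono_total else 0.0
        for d in DINUCS:
            features[f"{d}{pos}{nxt}"] = di[(pos, d)] / di_total if di_total else 0.0
    return features


example = codon_position_composition("ATGGCC")
```

Each coding sequence then becomes a 60-element feature vector that could be passed to PCA or a classifier.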

Gene Duplication Accelerates the Pace of Protein Gain and Loss from Plant Organelles

Thu, 21 Nov 2019 00:00:00 GMT

Abstract
Organelle biogenesis and function are dependent on the concerted action of both organellar-encoded (if present) and nuclear-encoded proteins. Differences between homologous organelles across the Plant Kingdom arise, in part, as a result of differences in the cohort of nuclear-encoded proteins that are targeted to them. However, neither the rate at which differences in protein targeting accumulate nor the evolutionary consequences of these changes are known. Using phylogenomic approaches coupled to ancestral state estimation, we show that the plant organellar proteome has diversified in proportion with molecular sequence evolution such that the proteomes of plant chloroplasts and mitochondria lose or gain on average 3.6 proteins per million years. We further demonstrate that changes in organellar protein targeting are associated with an increase in the rate of molecular sequence evolution and that such changes predominantly occur in genes with regulatory rather than metabolic functions. Finally, we show that gain and loss of protein target signals occur at a higher rate following gene duplication, revealing that gene and genome duplication are key facilitators of plant organelle evolution.

Genotyping and De Novo Discovery of Allelic Variants at the Brassicaceae Self-Incompatibility Locus from Short-Read Sequencing Data

Tue, 05 Nov 2019 00:00:00 GMT

Abstract
Plant self-incompatibility (SI) is a genetic system that prevents selfing and enforces outcrossing. Because of strong balancing selection, the genes encoding SI are predicted to maintain extraordinarily high levels of polymorphism, both in terms of the number of functionally distinct S-alleles that segregate in SI species and in terms of their nucleotide sequence divergence. However, because of these two combined features, documenting polymorphism of these genes also presents important methodological challenges that have so far largely prevented the comprehensive analysis of complete allelic series in natural populations, and have also made it difficult to obtain complete genic sequences for many S-alleles. Here, we develop a powerful methodological approach based on a computationally optimized comparison of short Illumina sequencing reads from genomic DNA to a database of known nucleotide sequences of the extracellular domain of SRK (eSRK). By examining mapping patterns along the reference sequences, we obtain highly reliable predictions of S-genotypes from individuals collected from natural populations of Arabidopsis halleri. Furthermore, using a de novo assembly approach of the filtered short reads, we obtain full-length sequences of eSRK even when the initial sequence in the database was only partial, and we discover putative new SRK alleles that were not initially present in the database. When including those new alleles in the reference database, we were able to resolve the complete diploid SI genotypes of all individuals. Beyond the specific case of Brassicaceae S-alleles, our approach can be readily applied to other polymorphic loci, provided that reference allelic sequences are available.

Protein Structural Information and Evolutionary Landscape by In Vitro Evolution

Thu, 31 Oct 2019 00:00:00 GMT

Abstract
Protein structure is tightly intertwined with function according to the laws of evolution. Understanding how structure determines function has been the aim of structural biology for decades. Here, we asked instead whether it is possible to exploit the function for which a protein was evolutionarily selected to gain information on protein structure and on the landscape explored during the early stages of molecular and natural evolution. To answer this question, we developed a new methodology, which we named CAMELS (Coupling Analysis by Molecular Evolution Library Sequencing), that is able to obtain the in vitro evolution of a protein from an artificial selection based on function. We were able to observe with CAMELS many features of the TEM-1 beta-lactamase local fold exclusively by generating and sequencing large libraries of mutational variants. We demonstrated that we can, whenever a functional phenotypic selection of a protein is available, sketch the structural and evolutionary landscape of a protein without utilizing purified proteins, collecting physical measurements, or relying on the pool of natural protein variants.

GBE | Most Read

Highlight—Untangling the Genetic Basis of Sociality in Spiders

Tue, 31 Mar 2020 00:00:00 GMT

The idea of a complex spider society, in which thousands of spiders live, hunt, and raise their young together in a single colony, is unsettling to many of us. We are perhaps lucky then that this scene is relatively rare among arachnids. Among the 40,000 known species of spiders, the vast majority live solitary lives and will often show aggression toward other spiders they encounter, even within their own species. There are fewer than 25 known species of social spiders, distributed broadly across six different families and nine different genera. Not only do these spiders live in social groups, but they also produce populations that grow over time as new offspring are added to the nest, enabling the capture of increasingly large prey as the colony expands, and even give rise to new daughter colonies. As social creatures ourselves, humans have long been interested in the evolutionary innovations that enable social cooperation. In a new article in Genome Biology and Evolution titled “Comparative genomics identifies putative signatures of sociality in spiders,” researchers provide one of the first glimpses into the genetic underpinnings for how a solitary spider evolves into a social one.

Eight Million Years of Satellite DNA Evolution in Grasshoppers of the Genus Schistocerca Illuminate the Ins and Outs of the Library Hypothesis

Tue, 17 Mar 2020 00:00:00 GMT

Abstract
Satellite DNA (satDNA) is an abundant class of tandemly repeated noncoding sequences, showing a high rate of change in sequence, abundance, and physical location. However, the mechanisms promoting these changes are still controversial. The library model was put forward to explain the conservation of some satDNAs for long periods, predicting that related species share a common collection of satDNAs, which mostly experience quantitative changes. Here, we tested the library model by analyzing three satDNAs in ten species of Schistocerca grasshoppers. This group is valuable material because it diversified during the last 7.9 Myr across the American continent from the African desert locust (Schistocerca gregaria), which illuminates the direction of evolutionary changes. By combining bioinformatic and cytogenetic approaches, we tested whether these three satDNA families found in S. gregaria are also present in nine American species, and whether differential gains and/or losses have occurred in the lineages. We found that the three satDNAs are present in all species but display remarkable interspecies differences in their abundance and sequences while being highly consistent with genus phylogeny. The number of chromosomal loci where satDNA is present was also consistent with phylogeny for two satDNA families but not for the other. Our results suggest that satDNA evolution is driven largely by chance events. Several evolutionary trends clearly imply either massive amplifications or contractions, thus closely fitting the library model prediction that changes are mostly quantitative. Finally, we found that satDNA amplifications or contractions may influence the evolution of monomer consensus sequences, with chance playing a major role in driftlike dynamics.

Traveler, a New DD35E Family of Tc1/Mariner Transposons, Invaded Vertebrates Very Recently

Tue, 18 Feb 2020 00:00:00 GMT

Abstract
The discovery of new members of the Tc1/mariner superfamily of transposons is expected based on the increasing availability of genome sequencing data. Here, we identified a new DD35E family termed Traveler (TR). Phylogenetic analyses of its DDE domain and full-length transposase showed that, although TR formed a monophyletic clade, it exhibited the highest sequence identity and closest phylogenetic relationship with DD34E/Tc1. This family displayed a very restricted taxonomic distribution in the animal kingdom and was only detected in ray-finned fish, Anura, and Squamata, including 91 vertebrate species. The structural organization of TRs was highly conserved across different classes of animals. Most intact TR transposons had a length of ∼1.5 kb (range 1,072–2,191 bp) and harbored a single open reading frame encoding a transposase of ∼340 aa (range 304–350 aa) flanked by two short terminal inverted repeats (13–68 bp). Several conserved motifs, including two helix-turn-helix motifs, a GRPR motif, a nuclear localization sequence, and a DDE domain, were also identified in TR transposases. This study also demonstrated horizontal transfer events of TRs in vertebrates. The average sequence identities and the evolutionary dynamics of TR elements across species and clusters, combined with the presence of intact TR copies in multiple vertebrate lineages, strongly indicated that the TR family invaded the vertebrate lineage very recently and that some of these elements may be currently active. These data will contribute to the understanding of the evolutionary history of Tc1/mariner transposons and that of their hosts.

Genome-Wide Selection Scan in an Arabian Peninsula Population Identifies a TNKS Haplotype Linked to Metabolic Traits and Hypertension

Tue, 18 Feb 2020 00:00:00 GMT

Abstract
Despite the extreme and varying environmental conditions prevalent in the Arabian Peninsula, it has experienced several waves of human migrations following the out-of-Africa diaspora. Eventually, the inhabitants of the peninsula region adapted to the hot and dry environment. The adaptation and natural selection that shaped the extant human populations of the Arabian Peninsula region have been scarcely studied. In an attempt to explore natural selection in the region, we analyzed 662,750 variants in 583 Kuwaiti individuals. We searched for regions in the genome that display signatures of positive selection in the Kuwaiti population using an integrative approach in a conservative manner. We highlight a haplotype overlapping TNKS that showed strong signals of positive selection based on the results of the multiple selection tests conducted (integrated Haplotype Score, Cross Population Extended Haplotype Homozygosity, Population Branch Statistics, and log-likelihood ratio scores). Notably, the TNKS haplotype under selection potentially conferred a fitness advantage to the Kuwaiti ancestors for surviving in the harsh environment while posing a major health risk to present-day Kuwaitis.
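Of the selection tests named above, the Population Branch Statistic has a compact standard definition that is easy to sketch. The Python below follows the commonly used formulation, in which each pairwise FST is converted to a branch length T = -log(1 - FST) and the branch leading to the target population is (T_target,ref + T_target,out - T_ref,out) / 2. The function and parameter names are illustrative assumptions. The abstract does not state which reference and outgroup populations were used or how FST was estimated, so this is not the authors' pipeline.

```python
import math


def fst_to_branch_length(fst):
    """Convert a pairwise FST value into a divergence time: T = -log(1 - FST)."""
    if not 0.0 <= fst < 1.0:
        raise ValueError("FST must lie in [0, 1) for the log transform")
    return -math.log(1.0 - fst)


def population_branch_statistic(fst_target_ref, fst_target_out, fst_ref_out):
    """Length of the branch leading to the target population.

    Arguments are pairwise FST values for (target, reference),
    (target, outgroup), and (reference, outgroup); names are hypothetical.
    """
    t_tr = fst_to_branch_length(fst_target_ref)
    t_to = fst_to_branch_length(fst_target_out)
    t_ro = fst_to_branch_length(fst_ref_out)
    return (t_tr + t_to - t_ro) / 2.0


# A locus differentiated only in the target population gets a long branch.
pbs_example = population_branch_statistic(0.5, 0.5, 0.0)
```

Large values at a locus indicate allele-frequency change specific to the target population, which is why the statistic is typically scanned in windows across the genome and combined with haplotype-based tests.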

Independent Transposon Exaptation Is a Widespread Mechanism of Redundant Enhancer Evolution in the Mammalian Genome

Mon, 17 Feb 2020 00:00:00 GMT

Abstract
Many regulatory networks appear to involve partially redundant enhancers. Traditionally, such enhancers have been hypothesized to originate mainly by sequence duplication. An alternative model postulates that they arise independently, through convergent evolution. This mechanism appears to be counterintuitive to natural selection: Redundant sequences are expected to either diverge and acquire new functions or accumulate mutations and become nonfunctional. Nevertheless, we show that at least 31% of the redundant enhancer pairs in the human genome (and 17% in the mouse genome) indeed originated in this manner. Specifically, for virtually all transposon-derived redundant enhancer pairs, both enhancer partners have evolved independently, from the exaptation of two different transposons. In addition to conferring robustness to the system, redundant enhancers could provide an evolutionary advantage by fine-tuning gene expression. Consistent with this hypothesis, we observed that the target genes of redundant enhancers exhibit higher expression levels and tissue specificity as compared with other genes. Finally, we found that although enhancer redundancy appears to be an intrinsic property of certain mammalian regulatory networks, the corresponding enhancers are largely species-specific. In other words, the redundancy in these networks is most likely a result of convergent evolution.

Functional Architecture of Deleterious Genetic Variants in the Genome of a Wrangel Island Mammoth

Fri, 07 Feb 2020 00:00:00 GMT

Abstract
Woolly mammoths were among the most abundant cold-adapted species during the Pleistocene. Their once-large populations went extinct in two waves, an end-Pleistocene extinction of continental populations followed by the mid-Holocene extinction of relict populations on St. Paul Island ∼5,600 years ago and Wrangel Island ∼4,000 years ago. Wrangel Island mammoths experienced an episode of rapid demographic decline coincident with their isolation, leading to a small population, reduced genetic diversity, and the fixation of putatively deleterious alleles, but the functional consequences of these processes are unclear. Here, we show that a Wrangel Island mammoth genome had many putative deleterious mutations that are predicted to cause diverse behavioral and developmental defects. Resurrection and functional characterization of several genes from the Wrangel Island mammoth carrying putatively deleterious substitutions identified both loss and gain of function mutations in genes associated with developmental defects (HYLS1), oligozoospermia and reduced male fertility (NKD1), diabetes (NEUROG3), and the ability to detect floral scents (OR5A1). These data suggest that at least one Wrangel Island mammoth may have suffered adverse consequences from reduced population size and isolation.

Bacterial Origin and Reductive Evolution of the CPR Group

Fri, 07 Feb 2020 00:00:00 GMT

Abstract
The candidate phyla radiation (CPR) is a proposed subdivision within the bacterial domain comprising several candidate phyla. CPR organisms are united by small genome and physical sizes, lack several metabolic enzymes, and populate deep branches within the bacterial subtree of life. These features raise intriguing questions regarding their origin and mode of evolution. In this study, we performed a comparative and phylogenomic analysis to investigate CPR origin and evolution. Unlike previous gene/protein sequence-based reports of CPR evolution, we used protein domain superfamilies classified by protein structure databases to resolve the evolutionary relationships of CPR with non-CPR bacteria, Archaea, Eukarya, and viruses. Across all supergroups, CPR shared maximum superfamilies with non-CPR bacteria and were placed as deep branching bacteria in most phylogenomic trees. CPR contributed 1.22% of new superfamilies to bacteria including the ribosomal protein L19e and encoded four core superfamilies that are likely involved in cell-to-cell interaction and establishing episymbiotic lifestyles. Although CPR and non-CPR bacterial proteomes gained common superfamilies over the course of evolution, CPR and Archaea had more common losses. These losses mostly involved metabolic superfamilies. In fact, phylogenies built from only metabolic protein superfamilies separated CPR and non-CPR bacteria. These findings indicate that CPR are bacterial organisms that have probably evolved in an Archaea-like manner via the early loss of metabolic functions. We also discovered that phylogenies built from metabolic and informational superfamilies gave contrasting views of the groupings among Archaea, Bacteria, and Eukarya, a finding that adds to the current debate on the evolutionary relationships among superkingdoms.

Comparative Genomics Identifies Putative Signatures of Sociality in Spiders

Tue, 21 Jan 2020 00:00:00 GMT

Abstract
Comparative genomics has begun to elucidate the genomic basis of social life in insects, but insight into the genomic basis of spider sociality has lagged behind. To begin to characterize genomic signatures associated with the evolution of social life in spiders, we performed one of the first spider comparative genomics studies including five solitary species and two social species, representing two independent origins of sociality in the genus Stegodyphus. We found that the two social spider species had a large expansion of gene families associated with transport and metabolic processes and an elevated genome-wide rate of molecular evolution compared with the five solitary spider species. Genes that were rapidly evolving in the two social species relative to the five solitary species were enriched for transport, behavior, and immune functions, whereas genes that were rapidly evolving in the solitary species were enriched for energy metabolism processes. Most rapidly evolving genes in the social species Stegodyphus dumicola were broadly expressed across four tissues and enriched for transport functions, but 12 rapidly evolving genes showed brain-specific expression and were enriched for social behavioral processes. Altogether, our study identifies putative genomic signatures and potential candidate genes associated with spider sociality. These results indicate that future spider comparative genomic studies, including broader sampling and additional independent origins of sociality, can further clarify the genomic causes and consequences of social life.