TERMS OF USE

AGREEMENT BETWEEN USER AND Society for Molecular Biology and Evolution SECURE MEMBER SERVICES

The Society for Molecular Biology and Evolution Secure Member Services Web Site is comprised of various Web pages operated by Society for Molecular Biology and Evolution Secure Member Services.

The Society for Molecular Biology and Evolution Secure Member Services Web Site is offered to you conditioned on your acceptance without modification of the terms, conditions, and notices contained herein. Your use of the Society for Molecular Biology and Evolution Secure Member Services Web Site constitutes your agreement to all such terms, conditions, and notices.

MODIFICATION OF THESE TERMS OF USE

Society for Molecular Biology and Evolution Secure Member Services reserves the right to change the terms, conditions, and notices under which the Society for Molecular Biology and Evolution Secure Member Services Web Site is offered, including but not limited to the charges associated with the use of the Society for Molecular Biology and Evolution Secure Member Services Web Site.

LINKS TO THIRD PARTY SITES

The Society for Molecular Biology and Evolution Secure Member Services Web Site may contain links to other Web Sites ("Linked Sites"). The Linked Sites are not under the control of Society for Molecular Biology and Evolution Secure Member Services and Society for Molecular Biology and Evolution Secure Member Services is not responsible for the contents of any Linked Site, including without limitation any link contained in a Linked Site, or any changes or updates to a Linked Site. Society for Molecular Biology and Evolution Secure Member Services is not responsible for webcasting or any other form of transmission received from any Linked Site. Society for Molecular Biology and Evolution Secure Member Services is providing these links to you only as a convenience, and the inclusion of any link does not imply endorsement by Society for Molecular Biology and Evolution Secure Member Services of the site or any association with its operators.

NO UNLAWFUL OR PROHIBITED USE

As a condition of your use of the Society for Molecular Biology and Evolution Secure Member Services Web Site, you warrant to Society for Molecular Biology and Evolution Secure Member Services that you will not use the Society for Molecular Biology and Evolution Secure Member Services Web Site for any purpose that is unlawful or prohibited by these terms, conditions, and notices. You may not use the Society for Molecular Biology and Evolution Secure Member Services Web Site in any manner which could damage, disable, overburden, or impair the Society for Molecular Biology and Evolution Secure Member Services Web Site or interfere with any other party's use and enjoyment of the Society for Molecular Biology and Evolution Secure Member Services Web Site. You may not obtain or attempt to obtain any materials or information through any means not intentionally made available or provided for through the Society for Molecular Biology and Evolution Secure Member Services Web Sites.

USE OF COMMUNICATION SERVICES

The Society for Molecular Biology and Evolution Secure Member Services Web Site may contain bulletin board services, chat areas, news groups, forums, communities, personal web pages, calendars, and/or other message or communication facilities designed to enable you to communicate with the public at large or with a group (collectively, "Communication Services"). You agree to use the Communication Services only to post, send and receive messages and material that are proper and related to the particular Communication Service. By way of example, and not as a limitation, you agree that when using a Communication Service, you will not:

  • Defame, abuse, harass, stalk, threaten or otherwise violate the legal rights (such as rights of privacy and publicity) of others.
  • Publish, post, upload, distribute or disseminate any inappropriate, profane, defamatory, infringing, obscene, indecent or unlawful topic, name, material or information.
  • Upload files that contain software or other material protected by intellectual property laws (or by rights of privacy or publicity) unless you own or control the rights thereto or have received all necessary consents.
  • Upload files that contain viruses, corrupted files, or any other similar software or programs that may damage the operation of another's computer.
  • Advertise or offer to sell or buy any goods or services for any business purpose, unless such Communication Service specifically allows such messages.
  • Conduct or forward surveys, contests, pyramid schemes or chain letters.
  • Download any file posted by another user of a Communication Service that you know, or reasonably should know, cannot be legally distributed in such manner.
  • Falsify or delete any author attributions, legal or other proper notices or proprietary designations or labels of the origin or source of software or other material contained in a file that is uploaded.
  • Restrict or inhibit any other user from using and enjoying the Communication Services.
  • Violate any code of conduct or other guidelines which may be applicable for any particular Communication Service.
  • Harvest or otherwise collect information about others, including e-mail addresses, without their consent.
  • Violate any applicable laws or regulations.

Society for Molecular Biology and Evolution Secure Member Services has no obligation to monitor the Communication Services. However, Society for Molecular Biology and Evolution Secure Member Services reserves the right to review materials posted to a Communication Service and to remove any materials in its sole discretion. Society for Molecular Biology and Evolution Secure Member Services reserves the right to terminate your access to any or all of the Communication Services at any time without notice for any reason whatsoever.

Society for Molecular Biology and Evolution Secure Member Services reserves the right at all times to disclose any information as necessary to satisfy any applicable law, regulation, legal process or governmental request, or to edit, refuse to post or to remove any information or materials, in whole or in part, in Society for Molecular Biology and Evolution Secure Member Services's sole discretion.

Always use caution when giving out any personally identifying information about yourself or your children in any Communication Service. Society for Molecular Biology and Evolution Secure Member Services does not control or endorse the content, messages or information found in any Communication Service and, therefore, Society for Molecular Biology and Evolution Secure Member Services specifically disclaims any liability with regard to the Communication Services and any actions resulting from your participation in any Communication Service. Managers and hosts are not authorized Society for Molecular Biology and Evolution Secure Member Services spokespersons, and their views do not necessarily reflect those of Society for Molecular Biology and Evolution Secure Member Services.

Materials uploaded to a Communication Service may be subject to posted limitations on usage, reproduction and/or dissemination. You are responsible for adhering to such limitations if you download the materials.

MATERIALS PROVIDED TO Society for Molecular Biology and Evolution Secure Member Services OR POSTED AT ANY Society for Molecular Biology and Evolution Secure Member Services WEB SITE

Society for Molecular Biology and Evolution Secure Member Services does not claim ownership of the materials you provide to Society for Molecular Biology and Evolution Secure Member Services (including feedback and suggestions) or post, upload, input or submit to any Society for Molecular Biology and Evolution Secure Member Services Web Site or its associated services (collectively "Submissions"). However, by posting, uploading, inputting, providing or submitting your Submission you are granting Society for Molecular Biology and Evolution Secure Member Services, its affiliated companies and necessary sublicensees permission to use your Submission in connection with the operation of their Internet businesses including, without limitation, the rights to: copy, distribute, transmit, publicly display, publicly perform, reproduce, edit, translate and reformat your Submission; and to publish your name in connection with your Submission.

No compensation will be paid with respect to the use of your Submission, as provided herein. Society for Molecular Biology and Evolution Secure Member Services is under no obligation to post or use any Submission you may provide and may remove any Submission at any time in Society for Molecular Biology and Evolution Secure Member Services's sole discretion.

By posting, uploading, inputting, providing or submitting your Submission you warrant and represent that you own or otherwise control all of the rights to your Submission as described in this section including, without limitation, all the rights necessary for you to provide, post, upload, input or submit the Submissions.

LIABILITY DISCLAIMER

THE INFORMATION, SOFTWARE, PRODUCTS, AND SERVICES INCLUDED IN OR AVAILABLE THROUGH THE Society for Molecular Biology and Evolution Secure Member Services WEB SITE MAY INCLUDE INACCURACIES OR TYPOGRAPHICAL ERRORS. CHANGES ARE PERIODICALLY ADDED TO THE INFORMATION HEREIN. Society for Molecular Biology and Evolution Secure Member Services AND/OR ITS SUPPLIERS MAY MAKE IMPROVEMENTS AND/OR CHANGES IN THE Society for Molecular Biology and Evolution Secure Member Services WEB SITE AT ANY TIME. ADVICE RECEIVED VIA THE Society for Molecular Biology and Evolution Secure Member Services WEB SITE SHOULD NOT BE RELIED UPON FOR PERSONAL, MEDICAL, LEGAL OR FINANCIAL DECISIONS AND YOU SHOULD CONSULT AN APPROPRIATE PROFESSIONAL FOR SPECIFIC ADVICE TAILORED TO YOUR SITUATION.

Society for Molecular Biology and Evolution Secure Member Services AND/OR ITS SUPPLIERS MAKE NO REPRESENTATIONS ABOUT THE SUITABILITY, RELIABILITY, AVAILABILITY, TIMELINESS, AND ACCURACY OF THE INFORMATION, SOFTWARE, PRODUCTS, SERVICES AND RELATED GRAPHICS CONTAINED ON THE Society for Molecular Biology and Evolution Secure Member Services WEB SITE FOR ANY PURPOSE. TO THE MAXIMUM EXTENT PERMITTED BY APPLICABLE LAW, ALL SUCH INFORMATION, SOFTWARE, PRODUCTS, SERVICES AND RELATED GRAPHICS ARE PROVIDED "AS IS" WITHOUT WARRANTY OR CONDITION OF ANY KIND. Society for Molecular Biology and Evolution Secure Member Services AND/OR ITS SUPPLIERS HEREBY DISCLAIM ALL WARRANTIES AND CONDITIONS WITH REGARD TO THIS INFORMATION, SOFTWARE, PRODUCTS, SERVICES AND RELATED GRAPHICS, INCLUDING ALL IMPLIED WARRANTIES OR CONDITIONS OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT.

TO THE MAXIMUM EXTENT PERMITTED BY APPLICABLE LAW, IN NO EVENT SHALL Society for Molecular Biology and Evolution Secure Member Services AND/OR ITS SUPPLIERS BE LIABLE FOR ANY DIRECT, INDIRECT, PUNITIVE, INCIDENTAL, SPECIAL, CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF USE, DATA OR PROFITS, ARISING OUT OF OR IN ANY WAY CONNECTED WITH THE USE OR PERFORMANCE OF THE Society for Molecular Biology and Evolution Secure Member Services WEB SITE, WITH THE DELAY OR INABILITY TO USE THE Society for Molecular Biology and Evolution Secure Member Services WEB SITE OR RELATED SERVICES, THE PROVISION OF OR FAILURE TO PROVIDE SERVICES, OR FOR ANY INFORMATION, SOFTWARE, PRODUCTS, SERVICES AND RELATED GRAPHICS OBTAINED THROUGH THE Society for Molecular Biology and Evolution Secure Member Services WEB SITE, OR OTHERWISE ARISING OUT OF THE USE OF THE Society for Molecular Biology and Evolution Secure Member Services WEB SITE, WHETHER BASED ON CONTRACT, TORT, NEGLIGENCE, STRICT LIABILITY OR OTHERWISE, EVEN IF Society for Molecular Biology and Evolution Secure Member Services OR ANY OF ITS SUPPLIERS HAS BEEN ADVISED OF THE POSSIBILITY OF DAMAGES. BECAUSE SOME STATES/JURISDICTIONS DO NOT ALLOW THE EXCLUSION OR LIMITATION OF LIABILITY FOR CONSEQUENTIAL OR INCIDENTAL DAMAGES, THE ABOVE LIMITATION MAY NOT APPLY TO YOU. IF YOU ARE DISSATISFIED WITH ANY PORTION OF THE Society for Molecular Biology and Evolution Secure Member Services WEB SITE, OR WITH ANY OF THESE TERMS OF USE, YOUR SOLE AND EXCLUSIVE REMEDY IS TO DISCONTINUE USING THE Society for Molecular Biology and Evolution Secure Member Services WEB SITE.

SERVICE CONTACT : Society for Molecular Biology and Evolution@allenpress.com

TERMINATION/ACCESS RESTRICTION

Society for Molecular Biology and Evolution Secure Member Services reserves the right, in its sole discretion, to terminate your access to the Society for Molecular Biology and Evolution Secure Member Services Web Site and the related services or any portion thereof at any time, without notice.

GENERAL

To the maximum extent permitted by law, this agreement is governed by the laws of the State of Washington, U.S.A. and you hereby consent to the exclusive jurisdiction and venue of courts in King County, Washington, U.S.A. in all disputes arising out of or relating to the use of the Society for Molecular Biology and Evolution Secure Member Services Web Site. Use of the Society for Molecular Biology and Evolution Secure Member Services Web Site is unauthorized in any jurisdiction that does not give effect to all provisions of these terms and conditions, including without limitation this paragraph. You agree that no joint venture, partnership, employment, or agency relationship exists between you and Society for Molecular Biology and Evolution Secure Member Services as a result of this agreement or use of the Society for Molecular Biology and Evolution Secure Member Services Web Site. Society for Molecular Biology and Evolution Secure Member Services's performance of this agreement is subject to existing laws and legal process, and nothing contained in this agreement is in derogation of Society for Molecular Biology and Evolution Secure Member Services's right to comply with governmental, court and law enforcement requests or requirements relating to your use of the Society for Molecular Biology and Evolution Secure Member Services Web Site or information provided to or gathered by Society for Molecular Biology and Evolution Secure Member Services with respect to such use.
If any part of this agreement is determined to be invalid or unenforceable pursuant to applicable law including, but not limited to, the warranty disclaimers and liability limitations set forth above, then the invalid or unenforceable provision will be deemed superseded by a valid, enforceable provision that most closely matches the intent of the original provision and the remainder of the agreement shall continue in effect. Unless otherwise specified herein, this agreement constitutes the entire agreement between the user and Society for Molecular Biology and Evolution Secure Member Services with respect to the Society for Molecular Biology and Evolution Secure Member Services Web Site and it supersedes all prior or contemporaneous communications and proposals, whether electronic, oral or written, between the user and Society for Molecular Biology and Evolution Secure Member Services with respect to the Society for Molecular Biology and Evolution Secure Member Services Web Site. A printed version of this agreement and of any notice given in electronic form shall be admissible in judicial or administrative proceedings based upon or relating to this agreement to the same extent and subject to the same conditions as other business documents and records originally generated and maintained in printed form. It is the express wish of the parties that this agreement and all related documents be drawn up in English.

All contents of the Society for Molecular Biology and Evolution Secure Member Services Web Site are: eBusiness assistance provided by Allen Press, Inc. and/or its suppliers. All rights reserved.

TRADEMARKS

The names of actual companies and products mentioned herein may be the trademarks of their respective owners.

The example companies, organizations, products, people and events depicted herein are fictitious. No association with any real company, organization, product, person, or event is intended or should be inferred.

Any rights not expressly granted herein are reserved.

NOTICES AND PROCEDURE FOR MAKING CLAIMS OF COPYRIGHT INFRINGEMENT

Pursuant to Title 17, United States Code, Section 512(c)(2), notifications of claimed copyright infringement under United States copyright law should be sent to Service Provider's Designated Agent. ALL INQUIRIES NOT RELEVANT TO THE FOLLOWING PROCEDURE WILL RECEIVE NO RESPONSE. See Notice and Procedure for Making Claims of Copyright Infringement.

MBE | Most Read

Molecular Biology and Evolution

Flexible Mixture Model Approaches That Accommodate Footprint Size Variability for Robust Detection of Balancing Selection

Sun, 04 Oct 2020 00:00:00 GMT

Abstract
Long-term balancing selection typically leaves narrow footprints of increased genetic diversity, and therefore most detection approaches only achieve optimal performances when sufficiently small genomic regions (i.e., windows) are examined. Such methods are sensitive to window sizes and suffer substantial losses in power when windows are large. Here, we employ mixture models to construct a set of five composite likelihood ratio test statistics, which we collectively term B statistics. These statistics are agnostic to window sizes and can operate on diverse forms of input data. Through simulations, we show that they exhibit comparable power to the best-performing current methods, and retain substantially high power regardless of window sizes. They also display considerable robustness to high mutation rates and uneven recombination landscapes, as well as an array of other common confounding scenarios. Moreover, we applied a specific version of the B statistics, termed B2, to a human population-genomic data set and recovered many top candidates from prior studies, including the then-uncharacterized STPG2 and CCDC169-SOHLH2, both of which are related to gamete functions. We further applied B2 on a bonobo population-genomic data set. In addition to the MHC-DQ genes, we uncovered several novel candidate genes, such as KLRD1, involved in viral defense, and SCN9A, associated with pain perception. Finally, we show that our methods can be extended to account for multiallelic balancing selection and integrated the set of statistics into open-source software named BalLeRMix for future applications by the scientific community.

How a Scorpion Toxin Selectively Captures a Prey Sodium Channel: The Molecular and Evolutionary Basis Uncovered

Tue, 08 Sep 2020 00:00:00 GMT

Abstract
The growing resistance of insects to chemical pesticides is reducing the effectiveness of conventional methods for pest control and thus, the development of novel insecticidal agents is imperative. Scorpion toxins specific for insect voltage-gated sodium channels (Navs) have been considered as one of the most promising insecticide alternatives due to their host specificity, rapidly evoked toxicity, biodegradability, and the lack of resistance. However, they have not been developed for uses in agriculture and public health, mainly because of a limited understanding of their molecular and evolutionary basis controlling their phylogenetic selectivity. Here, we show that the traditionally defined insect-selective scorpion toxin LqhIT2 specifically captures a prey Nav through a conserved trapping apparatus comprising a three-residue-formed cavity and a structurally adjacent leucine. The former serves as a detector to recognize and bind a highly exposed channel residue conserved in insects and spiders, two major prey items for scorpions; and the latter subsequently seizes the “moving” voltage sensor via hydrophobic interactions to reduce activation energy for channel opening, demonstrating its action in an enzyme-like manner. Based on the established toxin-channel interaction model in combination with toxicity assay, we enlarged the toxic spectrum of LqhIT2 to spiders and certain other arthropods. Furthermore, we found that genetic background-dependent cavity shapes determine the species selectivity of LqhIT2-related toxins. We expect that the discovery of the trapping apparatus will improve our understanding of the evolution and design principle of Nav-targeted toxins from a diversity of arthropod predators and accelerate their uses in pest control.

Consequences of Stability-Induced Epistasis for Substitution Rates

Tue, 08 Sep 2020 00:00:00 GMT

Abstract
Do interactions between residues in a protein (i.e., epistasis) significantly alter evolutionary dynamics? If so, what consequences might they have on inference from traditional codon substitution models which assume site-independence for the sake of computational tractability? To investigate the effects of epistasis on substitution rates, we employed a mechanistic mutation-selection model in conjunction with a fitness framework derived from protein stability. We refer to this as the stability-informed site-dependent (S-SD) model and developed a new stability-informed site-independent (S-SI) model that captures the average effect of stability constraints on individual sites of a protein. Comparison of S-SI and S-SD offers a novel and direct method for investigating the consequences of stability-induced epistasis on protein evolution. We developed S-SI and S-SD models for three natural proteins and showed that they generate sequences consistent with real alignments. Our analyses revealed that epistasis tends to increase substitution rates compared with the rates under site-independent evolution. We then assessed the epistatic sensitivity of individual sites and discovered a counterintuitive effect: Highly connected sites were less influenced by epistasis relative to exposed sites. Lastly, we show that, despite the unrealistic assumptions, traditional models perform comparably well in the presence and absence of epistasis and provide reasonable summaries of average selection intensities. We conclude that epistatic models are critical to understanding protein evolutionary dynamics, but epistasis might not be required for reasonable inference of selection pressure when averaging over time and sites.

Intragenic Conflict in Phylogenomic Data Sets

Tue, 08 Sep 2020 00:00:00 GMT

Abstract
Most phylogenetic analyses assume that a single evolutionary history underlies one gene. However, both biological processes and errors can cause intragenic conflict. The extent to which this conflict is present in empirical data sets is not well documented, but if common, could have far-reaching implications for phylogenetic analyses. We examined several large phylogenomic data sets from diverse taxa using a fast and simple method to identify well-supported intragenic conflict. We found conflict to be highly variable between data sets, from 1% to >92% of genes investigated. We analyzed four exemplar genes in detail and analyzed simulated data under several scenarios. Our results suggest that alignment error may be one major source of conflict, but other conflicts remain unexplained and may represent biological signal or other errors. Whether as part of data analysis pipelines or to explore biological processes, analyses of within-gene phylogenetic signal should become common.

ASTRAL-Pro: Quartet-Based Species-Tree Inference despite Paralogy

Fri, 04 Sep 2020 00:00:00 GMT

Abstract
Phylogenetic inference from genome-wide data (phylogenomics) has revolutionized the study of evolution because it enables accounting for discordance among evolutionary histories across the genome. To this end, summary methods have been developed to allow accurate and scalable inference of species trees from gene trees. However, most of these methods, including the widely used ASTRAL, can only handle single-copy gene trees and do not attempt to model gene duplication and gene loss. As a result, most phylogenomic studies have focused on single-copy genes and have discarded large parts of the data. Here, we first propose a measure of quartet similarity between single-copy and multicopy trees that accounts for orthology and paralogy. We then introduce a method called ASTRAL-Pro (ASTRAL for PaRalogs and Orthologs) to find the species tree that optimizes our quartet similarity measure using dynamic programing. By studying its performance on an extensive collection of simulated data sets and on real data sets, we show that ASTRAL-Pro is more accurate than alternative methods.

Comprehensive and Functional Analysis of Horizontal Gene Transfer Events in Diatoms

Thu, 16 Jul 2020 00:00:00 GMT

Abstract
Diatoms are a diverse group of mainly photosynthetic algae, responsible for 20% of worldwide oxygen production, which can rapidly respond to favorable conditions and often outcompete other phytoplankton. We investigated the contribution of horizontal gene transfer (HGT) to their ecological success. A large-scale phylogeny-based prokaryotic HGT detection procedure across nine sequenced diatoms showed that 3–5% of their proteome has a horizontal origin and that a large influx occurred at the ancestor of diatoms. More than 90% of HGT genes are expressed, and species-specific HGT genes in Phaeodactylum tricornutum undergo strong purifying selection. Genes derived from HGT are implicated in several processes, including environmental sensing, and expand the metabolic toolbox. Cobalamin (vitamin B12) is an essential cofactor for roughly half of the diatoms and is only produced by bacteria. Five consecutive genes involved in the final synthesis of the cobalamin biosynthetic pathway, which could function as scavenging and repair genes, were detected as HGT. The full suite of these genes was detected in the cold-adapted diatom Fragilariopsis cylindrus. This might give diatoms originating from the Southern Ocean, a region typically depleted in cobalamin, a competitive advantage. Overall, we show that HGT is a prevalent mechanism that is actively used in diatoms to expand their adaptive capabilities.

Frequent Retroviral Gene Co-option during the Evolution of Vertebrates

Wed, 15 Jul 2020 00:00:00 GMT

Abstract
Endogenous retroviruses are ubiquitous in vertebrate genomes. On occasion, hosts recruited retroviral genes to mediate their own biological functions, a process formally known as co-option or exaptation. Much remains unknown about the extent of retroviral gene co-option in vertebrates, although more than ten retroviral gene co-option events have been documented. Here, we use a phylogenomic approach to analyze more than 700 vertebrate genomes to uncover retroviral gene co-option taking place during the evolution of vertebrates. We identify a total of 177 independent retroviral gene co-option events in vertebrates, a majority of which have not been reported previously. Among these retroviral gene co-option events, 93 and 84 involve gag and env genes, respectively. More than 78.0% (138 out of 177) of retroviral gene co-option occurred within mammals. The gag and env co-option events share a generally similar temporal pattern with less frequent retroviral gene co-option identified in the deep branches, suggesting that retroviral gene co-option might not have been maintained for very long time periods. Moreover, we find co-opted retroviral genes are subject to different selection pressures, implying potentially diverse cellular functionality. Our study provides a comprehensive picture of co-opted retroviral genes during the evolution of vertebrates and has implications for understanding the ancient evolution of vertebrate–retrovirus interaction.

Asterid Phylogenomics/Phylotranscriptomics Uncover Morphological Evolutionary Histories and Support Phylogenetic Placement for Numerous Whole-Genome Duplications

Sat, 11 Jul 2020 00:00:00 GMT

Abstract
Asterids are one of the most successful angiosperm lineages, exhibiting extensive morphological diversity and including a number of important crops. Despite their biological prominence and value to humans, the deep asterid phylogeny has not been fully resolved, and the evolutionary landscape underlying their radiation remains unknown. To resolve the asterid phylogeny, we sequenced 213 transcriptomes/genomes and combined them with other data sets, representing all accepted orders and nearly all families of asterids. We show fully supported monophyly of asterids, Berberidopsidales as sister to asterids, monophyly of all orders except Icacinales, Aquifoliales, and Bruniales, and monophyly of all families except Icacinaceae and Ehretiaceae. Novel taxon placements benefited from the expanded sampling with living collections from botanical gardens, resolving hitherto uncertain relationships. The remaining ambiguous placements here are likely due to limited sampling and could be addressed in the future with relevant additional taxa. Using our well-resolved phylogeny as reference, divergence time estimates support an Aptian (Early Cretaceous) origin of asterids and the origin of all orders before the Cretaceous–Paleogene boundary. Ancestral state reconstruction at the family level suggests that the asterid ancestor was a woody terrestrial plant with simple leaves, bisexual, and actinomorphic flowers with free petals and free anthers, a superior ovary with a style, and drupaceous fruits. Whole-genome duplication (WGD) analyses provide strong evidence for 33 WGDs in asterids and one in Berberidopsidales, including four suprafamilial and seven familial/subfamilial WGDs. Our results advance the understanding of asterid phylogeny and provide numerous novel evolutionary insights into their diversification and morphological evolution.

Higher Germline Mutagenesis of Genes with Stronger Testis Expressions Refutes the Transcriptional Scanning Hypothesis

Wed, 08 Jul 2020 00:00:00 GMT

Abstract
Why are more genes expressed in the testis than in any other organ in mammals? The recently proposed transcriptional scanning hypothesis posits that transcription alleviates mutagenesis through transcription-coupled repair so has been selected in the testis to modulate the germline mutation rate in a gene-specific manner. Here, we show that this hypothesis is theoretically untenable because the selection would be too weak to have an effect in mammals. Furthermore, the analysis purported to support the hypothesis did not control known confounding factors and inappropriately excluded genes with no observed de novo mutations. After remedying these problems, we find the human germline mutation rate of a gene to rise with its testis expression level. This trend also exists for inferred coding strand-originated mutations, suggesting that it arises from transcription-associated mutagenesis. Furthermore, the testis expression level of a gene robustly correlates with its overall expression in other organs, nullifying the need to explain the testis silencing of a minority of genes by adaptive germline mutagenesis. Taken together, our results demonstrate that human testis transcription increases the germline mutation rate, rejecting the transcriptional scanning hypothesis of extensive gene expressions in the mammalian testis.

A Simulation Study to Examine the Information Content in Phylogenomic Data Sets under the Multispecies Coalescent Model

Wed, 08 Jul 2020 00:00:00 GMT

Abstract
We use computer simulation to examine the information content in multilocus data sets for inference under the multispecies coalescent model. Inference problems considered include estimation of evolutionary parameters (such as species divergence times, population sizes, and cross-species introgression probabilities), species tree estimation, and species delimitation based on Bayesian comparison of delimitation models. We found that the number of loci is the most influential factor for almost all inference problems examined. Although the number of sequences per species does not appear to be important to species tree estimation, it is very influential to species delimitation. Increasing the number of sites and the per-site mutation rate both increase the mutation rate for the whole locus and these have the same effect on estimation of parameters, but the sequence length has a greater effect than the per-site mutation rate for species tree estimation. We discuss the computational costs when the data size increases and provide guidelines concerning the subsampling of genomic data to enable the application of full-likelihood methods of inference.

Higher Germline Mutagenesis of Genes with Stronger Testis Expressions Refutes the Transcriptional Scanning Hypothesis

Wed, 08 Jul 2020 00:00:00 GMT

Abstract
Why are more genes expressed in the testis than in any other organ in mammals? The recently proposed transcriptional scanning hypothesis posits that transcription alleviates mutagenesis through transcription-coupled repair so has been selected in the testis to modulate the germline mutation rate in a gene-specific manner. Here, we show that this hypothesis is theoretically untenable because the selection would be too weak to have an effect in mammals. Furthermore, the analysis purported to support the hypothesis did not control known confounding factors and inappropriately excluded genes with no observed de novo mutations. After remedying these problems, we find the human germline mutation rate of a gene to rise with its testis expression level. This trend also exists for inferred coding strand-originated mutations, suggesting that it arises from transcription-associated mutagenesis. Furthermore, the testis expression level of a gene robustly correlates with its overall expression in other organs, nullifying the need to explain the testis silencing of a minority of genes by adaptive germline mutagenesis. Taken together, our results demonstrate that human testis transcription increases the germline mutation rate, rejecting the transcriptional scanning hypothesis of extensive gene expressions in the mammalian testis.

A Simulation Study to Examine the Information Content in Phylogenomic Data Sets under the Multispecies Coalescent Model

Wed, 08 Jul 2020 00:00:00 GMT

Abstract
We use computer simulation to examine the information content in multilocus data sets for inference under the multispecies coalescent model. Inference problems considered include estimation of evolutionary parameters (such as species divergence times, population sizes, and cross-species introgression probabilities), species tree estimation, and species delimitation based on Bayesian comparison of delimitation models. We found that the number of loci is the most influential factor for almost all inference problems examined. Although the number of sequences per species does not appear to be important to species tree estimation, it is very influential to species delimitation. Increasing the number of sites and the per-site mutation rate both increase the mutation rate for the whole locus and these have the same effect on estimation of parameters, but the sequence length has a greater effect than the per-site mutation rate for species tree estimation. We discuss the computational costs when the data size increases and provide guidelines concerning the subsampling of genomic data to enable the application of full-likelihood methods of inference.

Higher Germline Mutagenesis of Genes with Stronger Testis Expressions Refutes the Transcriptional Scanning Hypothesis

Wed, 08 Jul 2020 00:00:00 GMT

Abstract
Why are more genes expressed in the testis than in any other organ in mammals? The recently proposed transcriptional scanning hypothesis posits that transcription alleviates mutagenesis through transcription-coupled repair so has been selected in the testis to modulate the germline mutation rate in a gene-specific manner. Here, we show that this hypothesis is theoretically untenable because the selection would be too weak to have an effect in mammals. Furthermore, the analysis purported to support the hypothesis did not control known confounding factors and inappropriately excluded genes with no observed de novo mutations. After remedying these problems, we find the human germline mutation rate of a gene to rise with its testis expression level. This trend also exists for inferred coding strand-originated mutations, suggesting that it arises from transcription-associated mutagenesis. Furthermore, the testis expression level of a gene robustly correlates with its overall expression in other organs, nullifying the need to explain the testis silencing of a minority of genes by adaptive germline mutagenesis. Taken together, our results demonstrate that human testis transcription increases the germline mutation rate, rejecting the transcriptional scanning hypothesis of extensive gene expressions in the mammalian testis.

Bayesian Evaluation of Temporal Signal in Measurably Evolving Populations

Mon, 06 Jul 2020 00:00:00 GMT

Abstract
Phylogenetic methods can use the sampling times of molecular sequence data to calibrate the molecular clock, enabling the estimation of evolutionary rates and timescales for rapidly evolving pathogens and data sets containing ancient DNA samples. A key aspect of such calibrations is whether a sufficient amount of molecular evolution has occurred over the sampling time window, that is, whether the data can be treated as having come from a measurably evolving population. Here, we investigate the performance of a fully Bayesian evaluation of temporal signal (BETS) in sequence data. The method involves comparing the fit to the data of two models: a model in which the data are accompanied by the actual (heterochronous) sampling times, and a model in which the samples are constrained to be contemporaneous (isochronous). We conducted simulations under a wide range of conditions to demonstrate that BETS accurately classifies data sets according to whether they contain temporal signal or not, even when there is substantial among-lineage rate variation. We explore the behavior of this classification in analyses of five empirical data sets: modern samples of A/H1N1 influenza virus, the bacterium Bordetella pertussis, coronaviruses from mammalian hosts, ancient DNA from Hepatitis B virus, and mitochondrial genomes of dog species. Our results indicate that BETS is an effective alternative to other tests of temporal signal. In particular, this method has the key advantage of allowing a coherent assessment of the entire model, including the molecular clock and tree prior which are essential aspects of Bayesian phylodynamic analyses.

Detecting Signatures of Positive Selection against a Backdrop of Compensatory Processes

Mon, 06 Jul 2020 00:00:00 GMT

Abstract
There are known limitations in methods of detecting positive selection. Common methods do not enable differentiation between positive selection and compensatory covariation, a major limitation. Further, the traditional method of calculating the ratio of nonsynonymous to synonymous substitutions (dN/dS) does not take into account the 3D structure of biomacromolecules nor differences between amino acids. It also does not account for saturation of synonymous mutations (dS) over long evolutionary time that renders codon-based methods ineffective for older divergences. This work aims to address these shortcomings for detecting positive selection through the development of a statistical model that examines clusters of substitutions in clusters of variable radii. Additionally, it uses a parametric bootstrapping approach to differentiate positive selection from compensatory processes. A previously reported case of positive selection in the leptin protein of primates was reexamined using this methodology.

Broccoli: Combining Phylogenetic and Network Analyses for Orthology Assignment

Tue, 30 Jun 2020 00:00:00 GMT

Abstract
Orthology assignment is a key step of comparative genomic studies, for which many bioinformatic tools have been developed. However, all gene clustering pipelines are based on the analysis of protein distances, which are subject to many artifacts. In this article, we introduce Broccoli, a user-friendly pipeline designed to infer, with high precision, orthologous groups, and pairs of proteins using a phylogeny-based approach. Briefly, Broccoli performs ultrafast phylogenetic analyses on most proteins and builds a network of orthologous relationships. Orthologous groups are then identified from the network using a parameter-free machine learning algorithm. Broccoli is also able to detect chimeric proteins resulting from gene-fusion events and to assign these proteins to the corresponding orthologous groups. Tested on two benchmark data sets, Broccoli outperforms current orthology pipelines. In addition, Broccoli is scalable, with runtimes similar to those of recent distance-based pipelines. Given its high level of performance and efficiency, this new pipeline represents a suitable choice for comparative genomic studies. Broccoli is freely available at https://github.com/rderelle/Broccoli.

Recent Common Origin, Reduced Population Size, and Marked Admixture Have Shaped European Roma Genomes

Fri, 26 Jun 2020 00:00:00 GMT

Abstract
The Roma Diaspora—traditionally known as Gypsies—remains among the least explored population migratory events in historical times. It involved the migration of Roma ancestors out-of-India through the plateaus of Western Asia ultimately reaching Europe. The demographic effects of the Diaspora—bottlenecks, endogamy, and gene flow—might have left marked molecular traces in the Roma genomes. Here, we analyze the whole-genome sequence of 46 Roma individuals pertaining to four migrant groups in six European countries. Our analyses revealed a strong, early founder effect followed by a drastic reduction of ∼44% in effective population size. The Roma common ancestors split from the Punjabi population, from Northwest India, some generations before the Diaspora started, <2,000 years ago. The initial bottleneck and subsequent endogamy are revealed by the occurrence of extensive runs of homozygosity and identity-by-descent segments in all Roma populations. Furthermore, we provide evidence of gene flow from Armenian and Anatolian groups in present-day Roma, although the primary contribution to Roma gene pool comes from non-Roma Europeans, which accounts for >50% of their genomes. The linguistic and historical differentiation of Roma in migrant groups is confirmed by the differential proportion, but not a differential source, of European admixture in the Roma groups, which shows a westward cline. In the present study, we found that despite the strong admixture Roma had in their diaspora, the signature of the initial bottleneck and the subsequent endogamy is still present in Roma genomes.

Interspecific Gene Flow and the Evolution of Specialization in Black and White Rhinoceros

Thu, 25 Jun 2020 00:00:00 GMT

Abstract
Africa’s black (Diceros bicornis) and white (Ceratotherium simum) rhinoceros are closely related sister-taxa that evolved highly divergent obligate browsing and grazing feeding strategies. Although their precursor species Diceros praecox and Ceratotherium mauritanicum appear in the fossil record ∼5.2 Ma, by 4 Ma both were still mixed feeders, and were even spatiotemporally sympatric at several Pliocene sites in what is today Africa’s Rift Valley. Here, we ask whether or not D. praecox and C. mauritanicum were reproductively isolated when they came into Pliocene secondary contact. We sequenced and de novo assembled the first annotated black rhinoceros reference genome and compared it with available genomes of other black and white rhinoceros. We show that ancestral gene flow between D. praecox and C. mauritanicum ceased sometime between 3.3 and 4.1 Ma, despite conventional methods for the detection of gene flow from whole genome data returning false positive signatures of recent interspecific migration due to incomplete lineage sorting. We propose that ongoing Pliocene genetic exchange, for up to 2 My after initial divergence, could have potentially hindered the development of obligate feeding strategies until both species were fully reproductively isolated, but that the more severe and shifting paleoclimate of the early Pleistocene was likely the ultimate driver of ecological specialization in African rhinoceros.

ModelTeller: Model Selection for Optimal Phylogenetic Reconstruction Using Machine Learning

Thu, 25 Jun 2020 00:00:00 GMT

Abstract
Statistical criteria have long been the standard for selecting the best model for phylogenetic reconstruction and downstream statistical inference. Although model selection is regarded as a fundamental step in phylogenetics, existing methods for this task consume computational resources for long processing time, they are not always feasible, and sometimes depend on preliminary assumptions which do not hold for sequence data. Moreover, although these methods are dedicated to revealing the processes that underlie the sequence data, they do not always produce the most accurate trees. Notably, phylogeny reconstruction consists of two related tasks, topology reconstruction and branch-length estimation. It was previously shown that in many cases the most complex model, GTR+I+G, leads to topologies that are as accurate as using existing model selection criteria, but overestimates branch lengths. Here, we present ModelTeller, a computational methodology for phylogenetic model selection, devised within the machine-learning framework, optimized to predict the most accurate nucleotide substitution model for branch-length estimation. We demonstrate that ModelTeller leads to more accurate branch-length inference than current model selection criteria on data sets simulated under realistic processes. ModelTeller relies on a readily implemented machine-learning model and thus the prediction according to features extracted from the sequence data results in a substantial decrease in running time compared with existing strategies. By harnessing the machine-learning framework, we distinguish between features that mostly contribute to branch-length optimization, concerning the extent of sequence divergence, and features that are related to estimates of the model parameters that are important for the selection made by current criteria.

Interspecific Gene Flow and the Evolution of Specialization in Black and White Rhinoceros

Thu, 25 Jun 2020 00:00:00 GMT

Abstract
Africa’s black (Diceros bicornis) and white (Ceratotherium simum) rhinoceros are closely related sister-taxa that evolved highly divergent obligate browsing and grazing feeding strategies. Although their precursor species Diceros praecox and Ceratotherium mauritanicum appear in the fossil record ∼5.2 Ma, by 4 Ma both were still mixed feeders, and were even spatiotemporally sympatric at several Pliocene sites in what is today Africa’s Rift Valley. Here, we ask whether or not D. praecox and C. mauritanicum were reproductively isolated when they came into Pliocene secondary contact. We sequenced and de novo assembled the first annotated black rhinoceros reference genome and compared it with available genomes of other black and white rhinoceros. We show that ancestral gene flow between D. praecox and C. mauritanicum ceased sometime between 3.3 and 4.1 Ma, despite conventional methods for the detection of gene flow from whole genome data returning false positive signatures of recent interspecific migration due to incomplete lineage sorting. We propose that ongoing Pliocene genetic exchange, for up to 2 My after initial divergence, could have potentially hindered the development of obligate feeding strategies until both species were fully reproductively isolated, but that the more severe and shifting paleoclimate of the early Pleistocene was likely the ultimate driver of ecological specialization in African rhinoceros.

ModelTeller: Model Selection for Optimal Phylogenetic Reconstruction Using Machine Learning

Thu, 25 Jun 2020 00:00:00 GMT

Abstract
Statistical criteria have long been the standard for selecting the best model for phylogenetic reconstruction and downstream statistical inference. Although model selection is regarded as a fundamental step in phylogenetics, existing methods for this task consume computational resources for long processing time, they are not always feasible, and sometimes depend on preliminary assumptions which do not hold for sequence data. Moreover, although these methods are dedicated to revealing the processes that underlie the sequence data, they do not always produce the most accurate trees. Notably, phylogeny reconstruction consists of two related tasks, topology reconstruction and branch-length estimation. It was previously shown that in many cases the most complex model, GTR+I+G, leads to topologies that are as accurate as using existing model selection criteria, but overestimates branch lengths. Here, we present ModelTeller, a computational methodology for phylogenetic model selection, devised within the machine-learning framework, optimized to predict the most accurate nucleotide substitution model for branch-length estimation. We demonstrate that ModelTeller leads to more accurate branch-length inference than current model selection criteria on data sets simulated under realistic processes. ModelTeller relies on a readily implemented machine-learning model and thus the prediction according to features extracted from the sequence data results in a substantial decrease in running time compared with existing strategies. By harnessing the machine-learning framework, we distinguish between features that mostly contribute to branch-length optimization, concerning the extent of sequence divergence, and features that are related to estimates of the model parameters that are important for the selection made by current criteria.

A Genomic Cluster Containing Novel and Conserved Genes is Associated with Cichlid Fish Dental Developmental Convergence

Wed, 24 Jun 2020 00:00:00 GMT

Abstract
The two toothed jaws of cichlid fishes provide textbook examples of convergent evolution. Tooth phenotypes such as enlarged molar-like teeth used to process hard-shelled mollusks have evolved numerous times independently during cichlid diversification. Although the ecological benefit of molar-like teeth to crush prey is known, it is unclear whether the same molecular mechanisms underlie these convergent traits. To identify genes involved in the evolution and development of enlarged cichlid teeth, we performed RNA-seq on the serially homologous-toothed oral and pharyngeal jaws as well as the fourth toothless gill arch of Astatoreochromis alluaudi. We identified 27 genes that are highly upregulated on both tooth-bearing jaws compared with the toothless gill arch. Most of these genes have never been reported to play a role in tooth formation. Two of these genes (unk, rpfA) are not found in other vertebrate genomes but are present in all cichlid genomes. They also cluster genomically with two other highly expressed tooth genes (odam, scpp5) that exhibit conserved expression during vertebrate odontogenesis. Unk and rpfA were confirmed via in situ hybridization to be expressed in developing teeth of Astatotilapia burtoni. We then examined expression of the cluster’s four genes in six evolutionarily independent and phylogenetically disparate cichlid species pairs each with a large- and a small-toothed species. Odam and unk commonly and scpp5 and rpfA always showed higher expression in larger toothed cichlid jaws. Convergent trophic adaptations across cichlid diversity are associated with the repeated developmental deployment of this genomic cluster containing conserved and novel cichlid-specific genes.

Synteny-Guided Resolution of Gene Trees Clarifies the Functional Impact of Whole-Genome Duplications

Thu, 18 Jun 2020 00:00:00 GMT

Abstract
Whole-genome duplications (WGDs) have major impacts on the evolution of species, as they produce new gene copies contributing substantially to adaptation, isolation, phenotypic robustness, and evolvability. They result in large, complex gene families with recurrent gene losses in descendant species that sequence-based phylogenetic methods fail to reconstruct accurately. As a result, orthologs and paralogs are difficult to identify reliably in WGD-descended species, which hinders the exploration of functional consequences of WGDs. Here, we present Synteny-guided CORrection of Paralogies and Orthologies (SCORPiOs), a novel method to reconstruct gene phylogenies in the context of a known WGD event. WGDs generate large duplicated syntenic regions, which SCORPiOs systematically leverages as a complement to sequence evolution to infer the evolutionary history of genes. We applied SCORPiOs to the 320-My-old WGD at the origin of teleost fish. We find that almost one in four teleost gene phylogenies in the Ensembl database (3,394) are inconsistent with their syntenic contexts. For 70% of these gene families (2,387), we were able to propose an improved phylogenetic tree consistent with both the molecular substitution distances and the local syntenic information. We show that these synteny-guided phylogenies are more congruent with the species tree, with sequence evolution and with expected expression conservation patterns than those produced by state-of-the-art methods. Finally, we show that synteny-guided gene trees emphasize contributions of WGD paralogs to evolutionary innovations in the teleost clade.

Variable Spontaneous Mutation and Loss of Heterozygosity among Heterozygous Genomes in Yeast

Wed, 17 Jun 2020 00:00:00 GMT

Abstract
Mutation and recombination are the primary sources of genetic variation. To better understand the evolution of genetic variation, it is crucial to comprehensively investigate the processes involving mutation accumulation and recombination. In this study, we performed mutation accumulation experiments on four heterozygous diploid yeast species in the Saccharomycodaceae family to determine spontaneous mutation rates, mutation spectra, and losses of heterozygosity (LOH). We observed substantial variation in mutation rates and mutation spectra. We also observed high LOH rates (1.65–11.07 × 10⁻⁶ events per heterozygous site per cell division). Biases in spontaneous mutation and LOH together with selection ultimately shape the variable genome-wide nucleotide landscape in yeast species.

Lateral Gene Transfer Acts As an Evolutionary Shortcut to Efficient C4 Biochemistry

Wed, 10 Jun 2020 00:00:00 GMT

Abstract
The adaptation of proteins for novel functions often requires changes in their kinetics via amino acid replacement. This process can require multiple mutations, and therefore extended periods of selection. The transfer of genes among distinct species might speed up the process, by providing proteins already adapted for the novel function. However, this hypothesis remains untested in multicellular eukaryotes. The grass Alloteropsis is an ideal system to test this hypothesis due to its diversity of genes encoding phosphoenolpyruvate carboxylase, an enzyme that catalyzes one of the key reactions in the C4 pathway. Different accessions of Alloteropsis either use native isoforms relatively recently co-opted from other functions or isoforms that were laterally acquired from distantly related species that evolved the C4 trait much earlier. By comparing the enzyme kinetics, we show that native isoforms with few amino acid replacements have substrate KM values similar to the non-C4 ancestral form, but exhibit marked increases in catalytic efficiency. The co-option of native isoforms was therefore followed by rapid catalytic improvements, which appear to rely on standing genetic variation observed within one species. Native C4 isoforms with more amino acid replacements exhibit additional changes in affinities, suggesting that the initial catalytic improvements are followed by gradual modifications. Finally, laterally acquired genes show both strong increases in catalytic efficiency and important changes in substrate handling. We conclude that the transfer of genes among distant species sharing the same physiological novelty creates an evolutionary shortcut toward more efficient enzymes, effectively accelerating evolution.

Model-Based Inference of Punctuated Molecular Evolution

Wed, 10 Jun 2020 00:00:00 GMT

Abstract
In standard models of molecular evolution, DNA sequences evolve through asynchronous substitutions according to Poisson processes with a constant rate (called the molecular clock) or a rate that can vary (relaxed clock). However, DNA sequences can also undergo episodes of fast divergence that will appear as synchronous substitutions affecting several sites simultaneously at the macroevolutionary timescale. Here, we develop a model, which we call the Relaxed Clock with Spikes model, combining basal, clock-like molecular substitutions with episodes of fast divergence called spikes arising at speciation events. Given a multiple sequence alignment and its time-calibrated species phylogeny, our model is able to detect speciation events (including hidden ones) cooccurring with spike events and to estimate the probability and amplitude of these spikes on the phylogeny. We identify the conditions under which spikes can be distinguished from the natural variance of the clock-like component of molecular substitutions and from variations of the clock. We apply the method to genes underlying snake venom proteins and identify several spikes at gene-specific locations in the phylogeny. This work should pave the way for analyses relying on whole genomes to inform on modes of species diversification.
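
The idea of clock-like substitutions combined with spikes at speciation can be illustrated with a toy simulation. This is a minimal sketch under simplifying assumptions, not the authors' Relaxed Clock with Spikes implementation: the function names, the parameters (`clock_rate`, `spike_prob`, `spike_fraction`), and the choice to model spike amplitude as a fraction of sites substituted at once are all hypothetical.

```python
import math
import random

def poisson(rng: random.Random, mean: float) -> int:
    """Draw a Poisson variate with Knuth's multiplication method (suitable for small means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def branch_substitutions(rng, length_my, clock_rate, n_sites, spike_prob, spike_fraction):
    """Return (clock, spike) substitution counts for one branch.

    clock: asynchronous substitutions, Poisson with mean clock_rate * length_my * n_sites.
    spike: synchronous substitutions at the speciation event starting the branch;
           with probability spike_prob, each site is hit with probability spike_fraction.
    """
    clock = poisson(rng, clock_rate * length_my * n_sites)
    spike = 0
    if rng.random() < spike_prob:
        spike = sum(rng.random() < spike_fraction for _ in range(n_sites))
    return clock, spike

rng = random.Random(1)
# Hypothetical branch: 10 My long, 0.001 substitutions/site/My, 500 sites.
clock, spike = branch_substitutions(rng, 10.0, 0.001, 500, spike_prob=0.3, spike_fraction=0.02)
print("clock-like:", clock, "spike:", spike)
```

In this toy setting a spike is only distinguishable when its size is large relative to the Poisson variance of the clock-like count, which echoes the identifiability conditions the abstract refers to.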

High and Highly Variable Spontaneous Mutation Rates in Daphnia

Wed, 10 Jun 2020 00:00:00 GMT

Abstract
The rate and spectrum of spontaneous mutations are critical parameters in basic and applied biology because they dictate the pace and character of genetic variation introduced into populations, which is a prerequisite for evolution. We use a mutation–accumulation approach to estimate mutation parameters from whole-genome sequence data from multiple genotypes from multiple populations of Daphnia magna, an ecological and evolutionary model system. We report extremely high base substitution mutation rates (µ_n,bs = 8.96 × 10⁻⁹/bp/generation [95% CI: 6.66–11.97 × 10⁻⁹/bp/generation] in the nuclear genome and µ_m,bs = 8.7 × 10⁻⁷/bp/generation [95% CI: 4.40–15.12 × 10⁻⁷/bp/generation] in the mtDNA), the highest of any eukaryote examined using this approach. Levels of intraspecific variation based on the range of estimates from the nine genotypes collected from three populations (Finland, Germany, and Israel) span 1 and 3 orders of magnitude, respectively, resulting in up to a ∼300-fold difference in rates among genomic partitions within the same lineage. In contrast, mutation spectra exhibit very consistent patterns across genotypes and populations, suggesting the mechanisms underlying the mutational process may be similar, even when the rates at which they occur differ. We discuss the implications of high levels of intraspecific variation in rates, the importance of estimating gene conversion rates using a mutation–accumulation approach, and the interacting factors influencing the evolution of mutation parameters. Our findings deepen our knowledge about mutation and provide both challenges to and support for current theories aimed at explaining the evolution of the mutation rate, as a trait, across taxa.
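
As a worked illustration of the units used above, a per-site, per-generation rate is the number of mutations divided by the product of callable sites and generations. The sketch below assumes that standard point estimate; the function names and all input numbers are hypothetical and are not data from the study.

```python
def per_site_rate(n_mutations: int, n_sites: int, n_generations: int) -> float:
    """Point estimate of a mutation rate in events per site per generation."""
    if n_sites <= 0 or n_generations <= 0:
        raise ValueError("n_sites and n_generations must be positive")
    return n_mutations / (n_sites * n_generations)

def fold_difference(rates) -> float:
    """Ratio of the largest to the smallest rate in a collection of positive rates."""
    rates = list(rates)
    return max(rates) / min(rates)

# Hypothetical line: 9 base substitutions across 100 Mb of callable sites over 10 generations.
rate = per_site_rate(n_mutations=9, n_sites=100_000_000, n_generations=10)
print(f"{rate:.2e} per bp per generation")

# Hypothetical rates for two genomic partitions that differ by roughly 300-fold.
print(fold_difference([1e-9, 3e-7]))
```

The same arithmetic applies to LOH rates expressed per heterozygous site per cell division, with heterozygous sites and cell divisions in the denominator.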

Peptide–Oleate Complexes Create Novel Membrane-Bound Compartments

Wed, 10 Jun 2020 00:00:00 GMT

Abstract
A challenging question in evolutionary theory is the origin of cell division and plausible molecular mechanisms involved. Here, we made the surprising observation that complexes formed by short alpha-helical peptides and oleic acid can create multiple membrane-enclosed spaces from a single lipid vesicle. The findings suggest that such complexes may contain the molecular information necessary to initiate and sustain this process. Based on these observations, we propose a new molecular model to understand protocell division.

Lateral Gene Transfer Acts As an Evolutionary Shortcut to Efficient C4 Biochemistry

Wed, 10 Jun 2020 00:00:00 GMT

Abstract
The adaptation of proteins for novel functions often requires changes in their kinetics via amino acid replacement. This process can require multiple mutations, and therefore extended periods of selection. The transfer of genes among distinct species might speed up the process, by providing proteins already adapted for the novel function. However, this hypothesis remains untested in multicellular eukaryotes. The grass Alloteropsis is an ideal system to test this hypothesis due to its diversity of genes encoding phosphoenolpyruvate carboxylase, an enzyme that catalyzes one of the key reactions in the C4 pathway. Different accessions of Alloteropsis either use native isoforms relatively recently co-opted from other functions or isoforms that were laterally acquired from distantly related species that evolved the C4 trait much earlier. By comparing the enzyme kinetics, we show that native isoforms with few amino acid replacements have substrate KM values similar to the non-C4 ancestral form, but exhibit marked increases in catalytic efficiency. The co-option of native isoforms was therefore followed by rapid catalytic improvements, which appear to rely on standing genetic variation observed within one species. Native C4 isoforms with more amino acid replacements exhibit additional changes in affinities, suggesting that the initial catalytic improvements are followed by gradual modifications. Finally, laterally acquired genes show both strong increases in catalytic efficiency and important changes in substrate handling. We conclude that the transfer of genes among distant species sharing the same physiological novelty creates an evolutionary shortcut toward more efficient enzymes, effectively accelerating evolution.

High and Highly Variable Spontaneous Mutation Rates in Daphnia

Wed, 10 Jun 2020 00:00:00 GMT

Abstract
The rate and spectrum of spontaneous mutations are critical parameters in basic and applied biology because they dictate the pace and character of genetic variation introduced into populations, which is a prerequisite for evolution. We use a mutation–accumulation approach to estimate mutation parameters from whole-genome sequence data from multiple genotypes from multiple populations of Daphnia magna, an ecological and evolutionary model system. We report extremely high base substitution mutation rates (µ-n,bs = 8.96 × 10−9/bp/generation [95% CI: 6.66–11.97 × 10−9/bp/generation] in the nuclear genome and µ-m,bs = 8.7 × 10−7/bp/generation [95% CI: 4.40–15.12 × 10−7/bp/generation] in the mtDNA), the highest of any eukaryote examined using this approach. Levels of intraspecific variation based on the range of estimates from the nine genotypes collected from three populations (Finland, Germany, and Israel) span 1 and 3 orders of magnitude, respectively, resulting in up to a ∼300-fold difference in rates among genomic partitions within the same lineage. In contrast, mutation spectra exhibit very consistent patterns across genotypes and populations, suggesting the mechanisms underlying the mutational process may be similar, even when the rates at which they occur differ. We discuss the implications of high levels of intraspecific variation in rates, the importance of estimating gene conversion rates using a mutation–accumulation approach, and the interacting factors influencing the evolution of mutation parameters. Our findings deepen our knowledge about mutation and provide both challenges to and support for current theories aimed at explaining the evolution of the mutation rate, as a trait, across taxa.

Peptide–Oleate Complexes Create Novel Membrane-Bound Compartments

Wed, 10 Jun 2020 00:00:00 GMT

Abstract
A challenging question in evolutionary theory is the origin of cell division and plausible molecular mechanisms involved. Here, we made the surprising observation that complexes formed by short alpha-helical peptides and oleic acid can create multiple membrane-enclosed spaces from a single lipid vesicle. The findings suggest that such complexes may contain the molecular information necessary to initiate and sustain this process. Based on these observations, we propose a new molecular model to understand protocell division.

Lateral Gene Transfer Acts As an Evolutionary Shortcut to Efficient C4 Biochemistry

Wed, 10 Jun 2020 00:00:00 GMT

Abstract
The adaptation of proteins for novel functions often requires changes in their kinetics via amino acid replacement. This process can require multiple mutations, and therefore extended periods of selection. The transfer of genes among distinct species might speed up the process, by providing proteins already adapted for the novel function. However, this hypothesis remains untested in multicellular eukaryotes. The grass Alloteropsis is an ideal system to test this hypothesis due to its diversity of genes encoding phosphoenolpyruvate carboxylase, an enzyme that catalyzes one of the key reactions in the C4 pathway. Different accessions of Alloteropsis either use native isoforms relatively recently co-opted from other functions or isoforms that were laterally acquired from distantly related species that evolved the C4 trait much earlier. By comparing the enzyme kinetics, we show that native isoforms with few amino acid replacements have substrate KM values similar to the non-C4 ancestral form, but exhibit marked increases in catalytic efficiency. The co-option of native isoforms was therefore followed by rapid catalytic improvements, which appear to rely on standing genetic variation observed within one species. Native C4 isoforms with more amino acid replacements exhibit additional changes in affinities, suggesting that the initial catalytic improvements are followed by gradual modifications. Finally, laterally acquired genes show both strong increases in catalytic efficiency and important changes in substrate handling. We conclude that the transfer of genes among distant species sharing the same physiological novelty creates an evolutionary shortcut toward more efficient enzymes, effectively accelerating evolution.

Peptide–Oleate Complexes Create Novel Membrane-Bound Compartments

Wed, 10 Jun 2020 00:00:00 GMT

Abstract
A challenging question in evolutionary theory is the origin of cell division and plausible molecular mechanisms involved. Here, we made the surprising observation that complexes formed by short alpha-helical peptides and oleic acid can create multiple membrane-enclosed spaces from a single lipid vesicle. The findings suggest that such complexes may contain the molecular information necessary to initiate and sustain this process. Based on these observations, we propose a new molecular model to understand protocell division.

Lateral Gene Transfer Acts As an Evolutionary Shortcut to Efficient C4 Biochemistry

Wed, 10 Jun 2020 00:00:00 GMT

Abstract
The adaptation of proteins for novel functions often requires changes in their kinetics via amino acid replacement. This process can require multiple mutations, and therefore extended periods of selection. The transfer of genes among distinct species might speed up the process, by providing proteins already adapted for the novel function. However, this hypothesis remains untested in multicellular eukaryotes. The grass Alloteropsis is an ideal system to test this hypothesis due to its diversity of genes encoding phosphoenolpyruvate carboxylase, an enzyme that catalyzes one of the key reactions in the C4 pathway. Different accessions of Alloteropsis either use native isoforms relatively recently co-opted from other functions or isoforms that were laterally acquired from distantly related species that evolved the C4 trait much earlier. By comparing the enzyme kinetics, we show that native isoforms with few amino acid replacements have substrate KM values similar to the non-C4 ancestral form, but exhibit marked increases in catalytic efficiency. The co-option of native isoforms was therefore followed by rapid catalytic improvements, which appear to rely on standing genetic variation observed within one species. Native C4 isoforms with more amino acid replacements exhibit additional changes in affinities, suggesting that the initial catalytic improvements are followed by gradual modifications. Finally, laterally acquired genes show both strong increases in catalytic efficiency and important changes in substrate handling. We conclude that the transfer of genes among distant species sharing the same physiological novelty creates an evolutionary shortcut toward more efficient enzymes, effectively accelerating evolution.

GBE | Most Read

Genome Biology & Evolution

Highlight: Adaptations That Rule the Night

Fri, 16 Oct 2020 00:00:00 GMT

As the only birds with a nocturnal, predatory lifestyle, owls occupy a unique niche in the avian realm. Hunting prey in the dark comes with a number of challenges. Owls have evolved several features that leave them well suited to this task, combining raptorial traits like acute vision and sharp talons with nocturnal adaptations such as enhanced hearing and night vision. In a recent article in Genome Biology and Evolution titled “Genomic evidence for sensorial adaptations to a nocturnal predatory lifestyle in owls,” Pamela Espíndola-Hernández, a doctoral student at the Max Planck Institute for Ornithology, and colleagues report the results of a genome-wide scan to uncover the genetic and selective mechanisms that underlie the owls’ particular adaptations (Espíndola-Hernández et al. 2020). In addition to confirming the important role of the visual and auditory systems, the study, which was overseen by Dr Bart Kempenaers and Dr Jakob Mueller, in collaboration with Dr Martina Carrete at the Universidad Pablo de Olavide in Spain, suggests the existence of an unusual adaptation not yet described in birds, shedding new light on the evolutionary history of this nighttime predator. Specifically, the authors propose that selection has acted on epigenetic mechanisms to package the DNA in retinal cells in such a way that it acts as a light-channeling lens to enhance photoreception.

Genomic Evidence for Sensorial Adaptations to a Nocturnal Predatory Lifestyle in Owls

Sat, 08 Aug 2020 00:00:00 GMT

Abstract
Owls (Strigiformes) evolved specific adaptations to their nocturnal predatory lifestyle, such as asymmetrical ears, a facial disk, and a feather structure allowing silent flight. Owls also share some traits with diurnal raptors and other nocturnal birds, such as cryptic plumage patterns, reversed sexual size dimorphism, and acute vision and hearing. The genetic basis of some of these adaptations to a nocturnal predatory lifestyle has been studied by candidate gene approaches but rarely with genome-wide scans. Here, we used a genome-wide comparative analysis to test for selection in the early history of the owls. We estimated the substitution rates in the coding regions of 20 bird genomes, including 11 owls of which five were newly sequenced. Then, we tested for functional overrepresentation across the genes that showed signals of selection. In the ancestral branch of the owls, we found traces of positive selection in the evolution of genes functionally related to visual perception, especially to phototransduction, and to chromosome packaging. Several genes that have been previously linked to acoustic perception, circadian rhythm, and feather structure also showed signals of an accelerated evolution in the origin of the owls. We discuss the functions of the genes under positive selection and their putative association with the adaptation to the nocturnal predatory lifestyle of the owls.

First Complete Genome Sequences of Janthinobacterium lividum EIF1 and EIF2 and Their Comparative Genome Analysis

Mon, 13 Jul 2020 00:00:00 GMT

Abstract
We present the first two complete genomes of the Janthinobacterium lividum species, namely strains EIF1 and EIF2, which both possess the ability to synthesize violacein. The violet pigment violacein is a secondary metabolite with antibacterial, antifungal, antiviral, and antitumoral properties. Both strains were isolated from environmental oligotrophic water ponds in Göttingen. The strains were phylogenetically classified by average nucleotide identity (ANI) analysis and showed a species assignment to J. lividum with 97.72% (EIF1) and 97.66% (EIF2) identity. These are the first complete genome sequences of strains belonging to the species J. lividum. The genome of strain EIF1 consists of one circular chromosome (6,373,589 bp) with a GC-content of 61.98%. The genome contains 5,551 coding sequences, 122 rRNAs, 93 tRNAs, and 1 tm-RNA. The genome of EIF2 comprises one circular chromosome (6,399,352 bp) with a GC-content of 61.63% and a circular plasmid p356839 (356,839 bp) with a GC-content of 57.21%. The chromosome encodes 5,691 coding sequences, 122 rRNAs, 93 tRNAs, and 1 tm-RNA and the plasmid harbors 245 coding sequences. In addition to the highly conserved chromosomally encoded violacein operon, the plasmid comprises a nonribosomal peptide synthetase cluster with similarity to xenoamicin, which is a bioactive compound effective against protozoan parasites.

Two Lineages of Pseudomonas aeruginosa Filamentous Phages: Structural Uniformity over Integration Preferences

Mon, 13 Jul 2020 00:00:00 GMT

Abstract
Pseudomonas aeruginosa filamentous (Pf) bacteriophages are important factors contributing to the pathogenicity of this opportunistic bacterium, including biofilm formation and suppression of bacterial phagocytosis by macrophages. In addition, the capacity of Pf phages to form liquid crystal structures and their high negative charge density make them potent sequesters of cationic antibacterial agents, such as aminoglycoside antibiotics or host antimicrobial peptides. Therefore, Pf phages have been proposed as a potential biomarker for risk of antibiotic resistance development. The majority of studies describing biological functions of Pf viruses have been performed with only three of them: Pf1, Pf4, and Pf5. However, our analysis revealed that Pf phages exist as two evolutionary lineages (I and II), characterized by substantially different structural/morphogenesis properties, despite sharing the same integration sites in the host chromosomes. All aforementioned model Pf phages are members of lineage I. Hence, it is reasonable to speculate that their interactions with P. aeruginosa and impact on its pathogenicity may not be completely extrapolated to lineage II members. Furthermore, in order to organize the present numerical nomenclature of Pf phages, we propose a more informative approach based on the insertion sites, that is, Pf-tRNA-Gly, -Met, -Sec, -tmRNA, and -DR (direct repeats), which are fully compatible with one of five types of tyrosine integrases/recombinases XerC/D carried by these viruses. Finally, we discuss possible evolutionary mechanisms behind this division and consequences from the perspective of virus–virus, virus–bacterium, and virus–human interactions.

The Genome of the Softshell Clam Mya arenaria and the Evolution of Apoptosis

Sat, 11 Jul 2020 00:00:00 GMT

Abstract
Apoptosis is a fundamental feature of multicellular animals and is best understood in mammals, flies, and nematodes, with the invertebrate models being thought to represent a condition of ancestral simplicity. However, the existence of a leukemia-like cancer in the softshell clam Mya arenaria provides an opportunity to re-evaluate the evolution of the genetic machinery of apoptosis. Here, we report the whole-genome sequence for M. arenaria which we leverage with existing data to test evolutionary hypotheses on the origins of apoptosis in animals. We show that the ancestral bilaterian p53 locus, a master regulator of apoptosis, possessed a complex domain structure, in contrast to that of extant ecdysozoan p53s. Further, ecdysozoan taxa, but not chordates or lophotrochozoans like M. arenaria, show a widespread reduction in apoptosis gene copy number. Finally, phylogenetic exploration of apoptosis gene copy number reveals a striking linkage with p53 domain complexity across species. Our results challenge the current understanding of the evolution of apoptosis and highlight the ancestral complexity of the bilaterian apoptotic tool kit and its subsequent dismantlement during the ecdysozoan radiation.

Ultrastructural, Cytochemical, and Comparative Genomic Evidence of Peroxisomes in Three Genera of Pathogenic Free-Living Amoebae, Including the First Morphological Data for the Presence of This Organelle in Heteroloboseans

Tue, 30 Jun 2020 00:00:00 GMT

Abstract
Peroxisomes perform various metabolic processes that are primarily related to the elimination of reactive oxygen species and oxidative lipid metabolism. These organelles are present in all major eukaryotic lineages; nevertheless, information regarding the presence of peroxisomes in opportunistic parasitic protozoa is scarce, and in many cases it is still unknown whether these organisms have peroxisomes at all. Here, we performed ultrastructural, cytochemical, and bioinformatic studies to investigate the presence of peroxisomes in three genera of free-living amoebae from two different taxonomic groups that are known to cause fatal infections in humans. By transmission electron microscopy, round structures with a granular content limited by a single membrane were observed in Acanthamoeba castellanii, Acanthamoeba griffini, Acanthamoeba polyphaga, Acanthamoeba royreba, Balamuthia mandrillaris (Amoebozoa), and Naegleria fowleri (Heterolobosea). Further confirmation for the presence of peroxisomes was obtained by treating trophozoites in situ with diaminobenzidine and hydrogen peroxide, which showed positive reaction products for the presence of catalase. We then performed comparative genomic analyses to identify predicted peroxin homologues in these organisms. Our results demonstrate that a complete set of peroxins, which are essential for peroxisome biogenesis, proliferation, and protein import, is present in all of these amoebae. Likewise, our in silico analyses allowed us to identify a complete set of peroxins in Naegleria lovaniensis and three novel peroxin homologues in Naegleria gruberi. Thus, our results indicate that peroxisomes are present in these three genera of free-living amoebae and that they have a similar peroxin complement despite belonging to different evolutionary lineages.

Evolutionary History of the Globin Gene Family in Annelids

Mon, 29 Jun 2020 00:00:00 GMT

Abstract
Animals depend on the sequential oxidation of organic molecules to survive; thus, oxygen-carrying/transporting proteins play a fundamental role in aerobic metabolism. Globins are the most common and widespread group of respiratory proteins. They can be divided into three types: circulating intracellular, noncirculating intracellular, and extracellular, all of which have been reported in annelids. The diversity of oxygen transport proteins has been underestimated across metazoans. We probed 250 annelid transcriptomes in search of globin diversity in order to elucidate the evolutionary history of this gene family within this phylum. We report two new globin types in annelids, namely androglobins and cytoglobins. Although cytoglobins and myoglobins from vertebrates and from invertebrates are referred to by the same name, our data show they are not genuine orthologs. Our phylogenetic analyses show that extracellular globins from annelids are more closely related to extracellular globins from other metazoans than to the intracellular globins of annelids. Broadly, our findings indicate that multiple gene duplication and neo-functionalization events shaped the evolutionary history of the globin family.