MolluscDB: an integrated functional and evolutionary genomics database for the hyper-diverse animal phylum Mollusca
On October 23, 2020, the team of Academician Bao Zhenmin from the MoE Key Laboratory of Marine Genetics and Breeding and the Sars-Fang Centre published the first comprehensive international mollusc genome database online in the top international journal Nucleic Acids Research, in a paper titled "MolluscDB: an integrated functional and evolutionary genomics database for the hyper-diverse animal phylum Mollusca".
Mollusca, commonly known as shellfish, is the second largest phylum in the animal kingdom, with over 100 000 extant species. It also represents the largest marine phylum, containing ∼23% of all named marine organisms. Molluscs are globally distributed and play vital roles in the structure and functioning of marine, freshwater and terrestrial ecosystems. They are among the first bilaterians to appear in the fossil record and mark the extraordinary Cambrian explosion of animals ∼540 million years ago. With tremendous diversity in morphologies, behaviours and lifestyles, they have survived several mass extinction events, which makes them one of the most ancient and evolutionarily successful groups of invertebrates. Molluscs exhibit fascinating biological and evolutionary innovations, including a diversity of body plans and highly specialized structures (e.g. bivalve shells for defence and cephalopod arms for predation), adaptive life-history characters (e.g. a life span of up to 507 years for the bivalve Arctica islandica) and extraordinary developmental flexibility (e.g. an egg-brooding period of up to 4.4 years for the deep-sea octopus Graneledone boreopacifica). Molluscs have been employed as excellent models for over 100 years in studies of developmental and cell biology, neurobiology, physiology, behaviour, evolution, population genetics and materials science. Moreover, many molluscs are important fishery and aquaculture species, accounting for ∼22% of the total world aquaculture production. They are therefore an important source of food throughout the world and provide significant economic benefits to humans.
Despite their remarkable biological, evolutionary and ecological significance, molluscs have long been neglected from a genomic perspective. The rapid development of high-throughput sequencing technologies has pushed molluscan research into the genomics era. Decoding several molluscan genomes and transcriptomes has led to several major discoveries or breakthroughs, including heat shock protein and immune-related gene expansion for stressful intertidal zone and deep-sea adaptation, near-perfect preservation of bilaterian ancestor-like karyotypes, neural novelty evolution by extensive RNA editing, a single intercalation origin of metazoan larvae, and a deeply resolved molluscan phylogeny. While current molluscan genomic/transcriptomic resources have been accumulated and are rapidly increasing, the access and utilization of these scattered genomic resources pose a great challenge for the molluscan research community. There is an urgent need to establish a Mollusca genomics platform or database by integrating extensive genomic resources and developing convenient tools for comprehensive analysis of these data.
Towards this goal, Wang Shi's group constructed the first comprehensive genomics database specifically for molluscs (named MolluscDB, http://mgbase.qnlm.ac) by integrating current molluscan genomic/transcriptomic resources and providing convenient tools for multi-level integrative and comparative analyses. MolluscDB enables a systematic view of genomic and transcriptomic information from various aspects and provides highly valuable, unique custom datasets or resources that are not available elsewhere. The database is compatible with computers, tablets, and mobile devices, and all data in MolluscDB can be freely accessed and downloaded.
OVERVIEW OF DATABASE STRUCTURE AND FUNCTION
MolluscDB represents the most comprehensive collection of molluscan genomic resources to date: 558 genomic/transcriptomic datasets (including 20 high-quality assembled genomes, 314 reference genome-profiled transcriptomes and 224 de novo-profiled transcriptomes) and 409 mitochondrial genomic resources (Figure 1, Table 1). These resources provide high taxonomic coverage, spanning all seven classes and ∼87% of the 53 orders (according to the NCBI Taxonomy Database) in Mollusca. MolluscDB provides a wide range of genomic information, including genome assembly statistics, a genome phylogeny, fossil records, gene sequences, gene structures, functional annotations, expression profiles, gene families, transcription factors and transposable elements. This genomic information can be conveniently visualized in a customized genome browser. MolluscDB also offers highly valuable customized datasets or resources, including gene coexpression networks across various developmental stages and adult tissues/organs, the core gene repertoires inferred for Mollusca and descendent ancestors, and genome-by-genome macrosynteny analysis for inferring molluscan karyotype evolution. Moreover, MolluscDB provides useful and convenient tools for user-defined searches of genes of interest, BLAST- and BLAT-based sequence comparison and PCR primer design. MolluscDB runs on the Linux operating system, with J2EE as the framework, MySQL as the back-end database and Apache Tomcat as the server. The web user interfaces were developed with JavaServer Pages (JSP), HTML5 and CSS3.
Figure 1. Overview of MolluscDB database structure and web interface features
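The dataset composition quoted in the overview can be checked with a few lines of arithmetic. The sketch below is only an illustration: the dictionary keys and variable names are our own labels, not identifiers used by MolluscDB, and the order count is an estimate derived from the rounded ∼87% figure, not a number stated in the article.

```python
# Dataset counts as quoted in the overview (the labels are illustrative).
datasets = {
    "assembled genomes": 20,
    "reference genome-profiled transcriptomes": 314,
    "de novo-profiled transcriptomes": 224,
}

# The three categories should add up to the reported 558 datasets.
total = sum(datasets.values())

# Share of each category in the collection.
for name, count in datasets.items():
    print(f"{name}: {count} ({count / total:.1%})")
print("total datasets:", total)

# Assumption: ~87% of the 53 orders is a rounded figure, so the number of
# covered orders computed here is an estimate, not a value from the article.
TOTAL_ORDERS = 53
covered_orders_estimate = round(0.87 * TOTAL_ORDERS)
print("estimated orders covered:", covered_orders_estimate)
```

The three categories sum to the reported 558 datasets, and ∼87% of 53 orders corresponds to roughly 46 orders.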
Professor Wang Shi from the MoE Key Laboratory of Marine Genetics and Breeding and the Sars-Fang Centre is the corresponding author of this article, Associate Professor Li Yuli is the co-corresponding author, and PhD student Liu Fuyun is the first author. The research was funded by the National Key R&D Program of China, the National Natural Science Foundation of China, and the Taishan Scholars of Shandong Province. This work was also strongly supported by the high-performance scientific computing and system simulation platform of the Pilot National Laboratory for Marine Science and Technology (Qingdao).
The paper link: https://academic.oup.com/nar/advance-article-abstract/doi/10.1093/nar/gkaa918/5936037