Description
Collection of resources, tools and standards relevant for those interested in analysing marine (meta)genomic datasets (e.g. genomes, metagenomes, and transcriptomes).
Type of data/experiments/methods
File formats
Commonly used raw file formats for sequencing data
- FASTQ Sequence and Sequence Quality Format
- FASTA
- FASTQ Original Read Archive (ORA)
- Illumina Binary Base Call
- PacBio legacy basecall File Format (bas.h5/bax.h5)
- PacBio Alignment File Format (cmp.h5)
- POD5 File Format for Oxford Nanopore Technology (ONT) data
- Fast5 for ONT data
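As an illustration of the most common raw format above, here is a minimal Python sketch of a FASTQ reader. It assumes uncompressed, four-line-per-record FASTQ with Phred+33 quality encoding (older Illumina files may use a different offset). The function name `read_fastq` and the example read are hypothetical; for production work an established parser is likely a better choice.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (read_id, sequence, phred_scores) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality strings differ in length")
        # Assumption: Phred+33 encoding, so the score is the ASCII code minus 33.
        scores = [ord(char) - 33 for char in quality]
        # The read identifier is the first whitespace-separated token after '@'.
        read_id = header[1:].split()[0]
        yield read_id, sequence, scores


# Hypothetical single-read example held in memory instead of a file.
example = io.StringIO("@read1 sample=demo\nACGT\n+\nII5!\n")
records = list(read_fastq(example))
print(records)
```

The same generator can be pointed at a file opened with `open(path)`, or at `gzip.open(path, "rt")` for gzip-compressed FASTQ.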
Alignment Formats
Sequence Alignment Map (SAM) - FAIRsharing - Open format
Binary Alignment Map Format (BAM) - FAIRsharing - Open format
Compressed Reference-oriented Alignment Map (CRAM) - FAIRsharing - Open format
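To show what a SAM alignment record contains, the following Python sketch splits one alignment line into its eleven mandatory tab-separated fields and decodes two bits of the FLAG field. The helper name `parse_sam_line` and the example record are hypothetical; header lines (starting with `@`) and optional tags are not handled, and BAM or CRAM files need a dedicated library because they are binary.

```python
from typing import Dict, Union

# The eleven mandatory SAM columns, in order.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INTEGER_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}


def parse_sam_line(line: str) -> Dict[str, Union[str, int, bool]]:
    """Parse one SAM alignment line (not a header line) into a dictionary."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < len(SAM_FIELDS):
        raise ValueError("a SAM alignment line needs at least 11 columns")
    record: Dict[str, Union[str, int, bool]] = {}
    for name, value in zip(SAM_FIELDS, columns):
        record[name] = int(value) if name in INTEGER_FIELDS else value
    flag = int(record["FLAG"])
    record["is_unmapped"] = bool(flag & 0x4)   # bit 0x4: segment unmapped
    record["is_reverse"] = bool(flag & 0x10)   # bit 0x10: reverse-complemented
    return record


# Hypothetical record: a 4 bp read aligned to the reverse strand of contig_1.
sam_line = "read1\t16\tcontig_1\t101\t60\t4M\t*\t0\t0\tACGT\tIIII"
alignment = parse_sam_line(sam_line)
print(alignment["RNAME"], alignment["POS"], alignment["is_reverse"])
```

Note that POS in SAM is 1-based, which matters when combining alignments with 0-based formats such as BED.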
Annotation formats
GenBank Sequence Format (GB, GBK) - FAIRsharing - Open format
ENA Sequence Flat File Format (formerly EMBL Sequence Flat File Format) - FAIRsharing - Open format
Browser Extensible Data Format (BED) - FAIRsharing - Open format
Generic Feature Format Version 3 (GFF3) - FAIRsharing - Open format
Gene Transfer Format (GTF) - FAIRsharing - Open format
Variant Call Format (VCF) - FAIRsharing - Open format
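A frequent pitfall when moving between these annotation formats is the coordinate convention: BED intervals are 0-based and half-open, whereas GFF3 intervals are 1-based and inclusive. The Python sketch below converts one BED line into a GFF3 line under that rule. The function name `bed_to_gff3`, the default `source` and `feature_type` values, and the example feature are hypothetical, and reserved characters in attribute values are not escaped here.

```python
def bed_to_gff3(bed_line: str, source: str = "example",
                feature_type: str = "region") -> str:
    """Convert one BED line (3 to 6 columns) into a nine-column GFF3 line."""
    fields = bed_line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("a BED line needs at least chrom, start and end")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    name = fields[3] if len(fields) > 3 else "."
    score = fields[4] if len(fields) > 4 else "."
    strand = fields[5] if len(fields) > 5 else "."
    # BED start is 0-based, GFF3 start is 1-based: add one.
    gff_start = start + 1
    # BED end is exclusive, GFF3 end is inclusive: the number stays the same.
    gff_end = end
    attributes = f"Name={name}" if name != "." else "."
    return "\t".join([chrom, source, feature_type, str(gff_start),
                      str(gff_end), score, strand, ".", attributes])


# Hypothetical feature covering bases 100 to 200 (1-based) on contig_1.
gff_line = bed_to_gff3("contig_1\t99\t200\tgeneA\t0\t+")
print(gff_line)
```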
Metadata Standards
Standards
- Minimum Information about any (x) Sequence
The minimum information about any (x) sequence (MIxS) is an overarching framework of sequence metadata.
- Minimum Information about a (Meta)Genome Sequence (MIxS - MIGS/MIMS)
- Meta-omics Data and Collection Objects (MOD-CO)
- Genomic Contextual Data Markup Language (GCDML)
- Minimum Information about an Uncultivated Virus Genome (MIUViG)
- Minimum Information Standard for Engineered Organism Experiments (MIEO)
- Marine Microbial Biodiversity, Bioinformatics, Biotechnology Checklist (Micro B3) - FAIRsharing
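In practice, checklists such as these translate into a set of fields that must be filled in for every sample. The Python sketch below shows one way to flag missing values before submission. The field list is only an illustrative subset of MIxS-style term names chosen for this example, not a complete or authoritative checklist; the function name `missing_fields` and the sample record are hypothetical. Always consult the checklist required by the target repository.

```python
from typing import Dict, List, Sequence

# Assumption: an illustrative subset of MIxS-style terms for a marine sample.
REQUIRED_FIELDS = (
    "collection_date",
    "geo_loc_name",
    "lat_lon",
    "env_broad_scale",
    "env_local_scale",
    "env_medium",
    "depth",
)


def missing_fields(sample: Dict[str, str],
                   required: Sequence[str] = REQUIRED_FIELDS) -> List[str]:
    """Return the required fields that are absent or empty in a sample record."""
    return [field for field in required
            if not str(sample.get(field, "")).strip()]


# Hypothetical sample record with two gaps.
sample = {
    "collection_date": "2021-06-15",
    "geo_loc_name": "Norway: Tromso",
    "lat_lon": "69.65 N 18.96 E",
    "env_broad_scale": "marine biome",
    "env_local_scale": "",
    "env_medium": "sea water",
}
gaps = missing_fields(sample)
print(gaps)
```

Running such a check over a whole sample sheet before submission makes it easier to catch incomplete contextual data early.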
Ontologies
Sources for Reusable Data
MGnify
- Description: MGnify (formerly EBI Metagenomics, renamed to reflect a change in scope) is a free-to-use resource that aims to support all metagenomics researchers. The service is an automated pipeline for the analysis and archiving of metagenomic data that provides insights into the phylogenetic diversity as well as the functional and metabolic potential of a sample. All public data in the repository can be browsed freely.
- Standard License: EMBL-EBI Terms of Use
- identifiers.org: MGnify Sample, MGnify Project
- How to access:
MAR databases
- Description: The MAR databases are a collection of manually curated marine microbial contextual and sequence databases, hosted at the Marine Metagenomics Portal. They were developed as part of the ELIXIR EXCELERATE project in 2017 and are maintained by the Center for Bioinformatics (SfB) at UiT The Arctic University of Norway. SfB hosts the UiT node of ELIXIR Norway. The MarRef, MarDB, MarFun and MarCat contextual databases are built semi-automatically by compiling data from a number of publicly available sequence, taxonomy and literature databases.
- identifiers.org: MarRef, MarDB, MarFun, MarCat
- How to access:
IMG/VR
(Integrated Microbial Genomes and Microbiomes - Viral Resources)
- Description: Since 2016, the IMG/VR database has provided access to the largest collection of viral sequences obtained from (meta)genomes. The 3rd version of IMG/VR (Sept 2020) is composed of 18,373 cultivated and 2,314,329 uncultivated viral genomes (UViGs), nearly tripling the total number of sequences compared to the previous version. UViGs in IMG/VR are reported as single viral contigs, integrated proviruses, or genome bins, and are annotated with a new standardized pipeline including genome quality estimation using CheckV, taxonomic classification reflecting the latest ICTV update, and expanded host taxonomy prediction. The new IMG/VR interface enables users to efficiently browse, search, and select UViGs based on genome features and/or sequence similarity.
- identifiers.org: Integrated Microbial Genomes Taxon, Integrated Microbial Genomes Gene
- How to access:
Storage and Computing
Data Deposition Repositories
European Nucleotide Archive (ENA)
- Homepage
- DOI(FAIRsharing)
- License: refer to the Policies page
- Identifiers: Accession numbers
- How to submit data:
- General guide on data submission
- ENA checklists (i.e. supported metadata standards required for submission)
- Embargo: possible, set status to confidential upon submission
- More general RDM information about ENA on RDMguide (ELIXIR Belgium)
World Register of Marine Species (WoRMS)
- Standard License: CC BY 4.0
- identifiers.org
- How to submit data: Through WoRMS contributing databases
Ethics and Regulations
- General guidance for research ethics
- Guidelines for Research Ethics in Science and Technology
- Institutional guidelines
Services in Norway
The Marine Metagenomics Portal (MMP)
- The Marine Metagenomics Portal provides access to high-quality curated and freely accessible marine microbial genomics and metagenomics resources. It includes the MAR databases, a collection of richly annotated and manually curated contextual and sequence databases, and META-pipe, a complete workflow for the analysis of marine metagenomic data.
- Contact information
META-pipe
- META-pipe is a complete workflow for the analysis of marine metagenomic data. It provides assembly of high-throughput sequence data, functional annotation of predicted genes, and taxonomic profiling.
- Contact information
Data management planning:
The ELIXIR-NO instance of the Data Stewardship Wizard provides support for data management planning for marine metagenomics in Norway. An exemplary Data Management Plan model for marine metagenomics in Norway is available here. RDMkit offers general guidance on how to write a Data Management Plan.
Data storage
- The Norwegian e-infrastructure for life sciences (NeLS)
- ELIXIR Norway offers an infrastructure for mid-term storage of scientific data, intended for research projects with larger data sets (minimum 1 TB). We currently offer free storage of up to 10 TB of data. For larger projects, please contact us.
- Contact information
Bioinformatics
- ELIXIR Norway offers general advice and experimental design consultancy, programming and scripting assistance, and support for data analysis on marine metagenomics in Norway. ELIXIR Norway's HelpDesk can also assist you with:
- Data management planning
- Storage and computing
- Metadata standards
- Data deposition to ELIXIR databases
Useful Links
Norwegian tool assembly for marine metagenomics data management