
Cheat sheet: Marine Metagenomics

Description

A collection of resources, tools and standards relevant to anyone analysing marine (meta)genomic datasets (e.g. genomes, metagenomes and transcriptomes).

Type of data/experiments/methods

File formats

Commonly used raw file formats for sequencing data

Alignment Formats

Sequence Alignment Map (SAM) - FAIRsharing - Open format

Binary Alignment Map Format (BAM) - FAIRsharing - Open format

Compressed Reference-oriented Alignment Map (CRAM) - FAIRsharing - Open format
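SAM is the plain-text form of these alignment formats: every alignment line carries 11 mandatory tab-separated fields followed by optional tags, and BAM and CRAM store the same records in binary and reference-compressed form. The sketch below is only an illustration, assuming a single uncompressed SAM line as input; it decodes a handful of the FLAG bits defined in the SAM specification. In practice, BAM and CRAM files are read with tools such as samtools or libraries such as pysam rather than parsed by hand. The function name `parse_sam_line` and the contig name in the example are hypothetical.

```python
# A minimal sketch (hypothetical helper, not part of any listed tool):
# split one SAM alignment line into its 11 mandatory fields and decode
# a few of the bitwise FLAG values defined by the SAM specification.
from typing import Dict

SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

# Subset of FLAG bits from the SAM specification
FLAG_BITS = {
    0x1: "paired",
    0x4: "unmapped",
    0x10: "reverse_strand",
    0x100: "secondary",
    0x800: "supplementary",
}


def parse_sam_line(line: str) -> Dict[str, object]:
    """Parse one tab-separated SAM alignment line (not a '@' header line)."""
    if line.startswith("@"):
        raise ValueError("header line, not an alignment record")
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError(f"expected at least 11 columns, got {len(cols)}")
    record: Dict[str, object] = dict(zip(SAM_FIELDS, cols[:11]))
    flag = int(cols[1])
    record["FLAG"] = flag
    record["POS"] = int(cols[3])   # 1-based leftmost position; 0 if unavailable
    record["MAPQ"] = int(cols[4])
    # Keep the names of every listed bit that is set in FLAG
    record["flags"] = {name for bit, name in FLAG_BITS.items() if flag & bit}
    record["tags"] = cols[11:]     # optional TAG:TYPE:VALUE fields, left unparsed
    return record


if __name__ == "__main__":
    example = "read1\t16\tcontig_1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII\tNM:i:0"
    rec = parse_sam_line(example)
    print(rec["RNAME"], rec["POS"], sorted(rec["flags"]))
```

With FLAG 16 (0x10), the example read is reported as mapped to the reverse strand at position 100 of `contig_1`.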

Annotation formats

GenBank Sequence Format (GB, GBK) - FAIRsharing - Open format

ENA Sequence Flat File Format (formerly EMBL Sequence Flat File Format) - FAIRsharing - Open format

Browser Extensible Data Format (BED) - FAIRsharing - Open format

Generic Feature Format Version 3 (GFF3) - FAIRsharing - Open format

Gene Transfer Format (GTF) - FAIRsharing - Open format

Variant Call Format (VCF) - FAIRsharing - Open format
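A common stumbling block when moving gene annotations between these formats is the coordinate convention. GFF3 and GTF use 1-based, fully closed intervals, whereas BED uses 0-based, half-open intervals, so only the start coordinate shifts by one. The two formats also encode column 9 differently: GFF3 uses URL-escaped `key=value` pairs, and GTF uses `key "value";` pairs. The sketch below is an illustration of these rules, not a complete parser. The function names, and the choice of `ID` (GFF3) or `gene_id` (GTF) as the BED name, are assumptions made for this example.

```python
# A minimal sketch (hypothetical helpers): read one GFF3 or GTF feature
# line and convert it to BED-style coordinates. GFF3/GTF intervals are
# 1-based and closed; BED intervals are 0-based and half-open.
from typing import Dict, Tuple
from urllib.parse import unquote


def parse_gff3_attributes(field: str) -> Dict[str, str]:
    """GFF3 column 9: key=value pairs separated by ';', values URL-escaped."""
    attrs: Dict[str, str] = {}
    for pair in field.strip().strip(";").split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attrs[key.strip()] = unquote(value)
    return attrs


def parse_gtf_attributes(field: str) -> Dict[str, str]:
    """GTF column 9: key "value" pairs, each terminated by ';'."""
    attrs: Dict[str, str] = {}
    for pair in field.strip().split(";"):
        pair = pair.strip()
        if pair:
            key, _, value = pair.partition(" ")
            attrs[key] = value.strip().strip('"')
    return attrs


def feature_to_bed_interval(line: str, gtf: bool = False) -> Tuple[str, int, int, str, str]:
    """Return (chrom, chromStart, chromEnd, name, strand) in BED coordinates."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, _source, ftype, start, end, _score, strand, _phase, attr_field = cols
    attrs = parse_gtf_attributes(attr_field) if gtf else parse_gff3_attributes(attr_field)
    # Assumed naming rule: GFF3 "ID" or GTF "gene_id", else the feature type
    name = attrs.get("gene_id" if gtf else "ID", ftype)
    # 1-based closed [start, end] -> 0-based half-open [start - 1, end)
    return seqid, int(start) - 1, int(end), name, strand
```

For example, a GFF3 CDS spanning positions 101 to 400 of `contig_1` becomes the BED interval `contig_1 100 400`. Both representations describe the same 300 bases (400 - 101 + 1 in GFF3, 400 - 100 in BED).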

Metadata Standards

Standards

Ontologies

Sources for Reusable Data

MGnify

  • Description: EBI Metagenomics has been renamed MGnify to reflect a change in scope. It is a free-to-use resource aimed at supporting all metagenomics researchers. The service is an automated pipeline for the analysis and archiving of metagenomic data that provides insights into the phylogenetic diversity, as well as the functional and metabolic potential, of a sample. All public data in the repository can be browsed freely.
  • Standard License: EMBL-EBI Terms of Use
  • identifiers.org: MGnify Sample, MGnify Project
  • How to access:

MAR databases

  • Description: The MAR databases are a collection of manually curated marine microbial contextual and sequence databases, hosted at the Marine Metagenomics Portal. They were developed as part of the ELIXIR EXCELERATE project in 2017 and are maintained by the Center for Bioinformatics (SfB) at UiT The Arctic University of Norway, which hosts the UiT node of ELIXIR Norway. The MarRef, MarDB, MarFun and MarCat contextual databases are built semi-automatically by compiling data from a number of publicly available sequence, taxonomy and literature databases.
  • identifiers.org: MarRef, MarDB, MarFun, MarCat
  • How to access:

IMG/VR

(Integrated Microbial Genomes and Microbiomes - Viral Resources)

  • Description: Since 2016, the IMG/VR database has provided access to the largest collection of viral sequences obtained from (meta)genomes. The 3rd version of IMG/VR (Sept 2020) is composed of 18,373 cultivated and 2,314,329 uncultivated viral genomes (UViGs), nearly tripling the total number of sequences compared to the previous version. UViGs in IMG/VR are reported as single viral contigs, integrated proviruses, or genome bins, and are annotated with a new standardized pipeline including genome quality estimation using CheckV, taxonomic classification reflecting the latest ICTV update, and expanded host taxonomy prediction. The new IMG/VR interface enables users to efficiently browse, search, and select UViGs based on genome features and/or sequence similarity.
  • identifiers.org: Integrated Microbial Genomes Taxon, Integrated Microbial Genomes Gene
  • How to access:

Storage and Computing

Data Deposition Repositories

European Nucleotide Archive (ENA)

World Register of Marine Species (WoRMS)

Ethics and Regulations

Services in Norway

The Marine Metagenomics Portal (MMP)

  • The Marine Metagenomics Portal provides access to high-quality, curated and freely accessible marine microbial genomics and metagenomics resources. It includes the MAR databases, a collection of richly annotated and manually curated contextual and sequence databases, and META-pipe, a complete workflow for the analysis of marine metagenomic data.
  • Contact information

META-pipe

  • META-pipe is a complete workflow for the analysis of marine metagenomic data. It provides assembly of high-throughput sequence data, functional annotation of predicted genes, and taxonomic profiling.
  • Contact information

Data management planning

The ELIXIR-NO instance of the Data Stewardship Wizard supports data management planning for marine metagenomics in Norway. An exemplary Data Management Plan model for marine metagenomics in Norway is also available. General guidance on how to write a Data Management Plan is provided by RDMkit.

Data storage

  • The Norwegian e-infrastructure for life sciences (NeLS)
  • ELIXIR Norway offers infrastructure for mid-term storage of scientific data, intended for research projects with larger datasets (minimum 1 TB). Storage of up to 10 TB is currently free of charge. For larger projects, please contact ELIXIR Norway.
  • Contact information

Bioinformatics

  • ELIXIR Norway offers general advice, experimental design consultancy, programming and scripting assistance, and support for marine metagenomics data analysis in Norway. ELIXIR Norway's HelpDesk can also assist you with:
    • Data management planning
    • Storage and computing
    • Metadata standards
    • Data deposition to ELIXIR databases

Norwegian tool assembly for marine metagenomics data management

Related pages

More information

Links to other ELIXIR resources
Contributors