
Cheat sheet: High-Throughput Sequencing

Description

This page collects resources, tools, and standards relevant to short-read and long-read sequencing data.

Type of data/experiments/methods

Commonly used raw file formats for sequencing data

Most of these raw file formats are converted to BAM or FASTQ before data processing and bioinformatics analysis.
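As an illustration of what downstream tools receive after conversion, here is a minimal sketch of a FASTQ reader. It assumes the common unwrapped layout (four lines per record: `@` header, sequence, `+` separator, quality string) and Phred+33 quality encoding. The names `FastqRecord`, `read_fastq`, and `phred33_scores` are hypothetical helpers for this example, not part of any tool listed on this page. For real analyses, an established parser is the safer choice.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str       # header line without the leading "@"
    sequence: str   # base calls
    quality: str    # per-base quality characters


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped (four-line) FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        header = header.rstrip("\n")
        if not header:          # tolerate blank lines between records
            continue
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed FASTQ record near: {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence/quality length mismatch in: {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def phred33_scores(quality: str) -> List[int]:
    """Decode a quality string, assuming Phred+33 encoding."""
    return [ord(char) - 33 for char in quality]


if __name__ == "__main__":
    example = io.StringIO("@read1\nACGT\n+\nIIII\n")
    for record in read_fastq(example):
        print(record.name, record.sequence, phred33_scores(record.quality))
```

The length check catches truncated files early, which is a frequent symptom of an interrupted transfer or conversion.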

Alignment Formats

Sequence Alignment Map (SAM) - FAIRsharing - Open Format

Binary Alignment Map Format (BAM) - FAIRsharing - Open Format

Compressed Reference-oriented Alignment Map (CRAM) - FAIRsharing - Open Format

Annotation formats

GenBank Sequence Format (GB, GBK) - FAIRsharing - Open Format

ENA Sequence Flat File Format (formerly EMBL Sequence Flat File Format) - FAIRsharing - Open format

Browser Extensible Data Format (BED) - FAIRsharing - Open format

Generic Feature Format Version 3 (GFF3) - FAIRsharing - Open format

Gene Transfer Format (GTF) - FAIRsharing - Open format

Variant Call Format (VCF) - FAIRsharing - Open format
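When moving intervals between the annotation formats above, note that BED uses 0-based, half-open coordinates while GFF3 uses 1-based, inclusive coordinates. The sketch below shows the conversion. The function names are hypothetical, and zero-length features are deliberately not handled.

```python
from typing import Tuple


def bed_to_gff3(start: int, end: int) -> Tuple[int, int]:
    """Convert a 0-based, half-open BED interval to 1-based, inclusive GFF3."""
    if start < 0 or end <= start:
        raise ValueError("Expected 0 <= start < end for a BED interval")
    # Only the start shifts: the half-open end equals the inclusive end.
    return start + 1, end


def gff3_to_bed(start: int, end: int) -> Tuple[int, int]:
    """Convert a 1-based, inclusive GFF3 interval to 0-based, half-open BED."""
    if start < 1 or end < start:
        raise ValueError("Expected 1 <= start <= end for a GFF3 interval")
    return start - 1, end


if __name__ == "__main__":
    # The first 100 bases of a sequence in both conventions.
    print(bed_to_gff3(0, 100))
    print(gff3_to_bed(1, 100))
```

In both conventions the feature length stays the same: `end - start` for BED and `end - start + 1` for GFF3.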

Metadata Standards

The Minimum Information about any (x) Sequence (MIxS) is an overarching framework of sequence metadata.

European Variation Archive metadata template

This template is required when submitting genetic variant data to EVA.

Sources for Reusable Data

Ensembl

  • Ensembl creates, integrates, and distributes reference datasets and analysis tools that enable genomics.
  • Data is open-access and can be downloaded free of charge (disclaimer)
  • Identifiers: Ensembl stable ID
  • Free to access, more features with account (username and password)

Storage and Computing

ELIXIR Norway infrastructures

  • The Norwegian e-infrastructure for Life Sciences (NeLS)
    • Free of charge allocation of 1–10 TB disk space
    • Storage of active research data for analysis and processing
    • Granular data sharing with collaborators
  • National instance of Galaxy
    • Provides ~2000 tools for data processing and analysis
    • Create, customise and reuse workflows
    • Data stored in NeLS is readily available for processing on Galaxy
      • Data redundancy is avoided
  • StoreBioInfo
    • Access through the NeLS portal
    • Long-term storage (until the end of a project) of non-active data
    • Store up to 10 GB of data free of charge

If your data is produced by the NorSeq core facilities, it can be uploaded directly to the ELIXIR Norway storage infrastructures. Non-sensitive data will be uploaded to NeLS following these procedures. If your data has been produced by another sequencing provider, follow these instructions to request a project on NeLS.

Sigma2 (Sikt) infrastructures

Data Deposition Repository

European Nucleotide Archive (ENA)

European Variation Archive (EVA)

Ethics and Regulations in Norway

Services in Norway

RDM Services

ELIXIR Norway’s HelpDesk can assist you with:

  • Data management planning
  • Storage and computing
  • Metadata standards
  • Data deposition to ELIXIR databases
  • Data analysis for various sequencing methods

Scientific Services

Related pages

More information
