Description
We provide here a collection of resources, tools, and standards relevant for short-read and long-read sequencing data.
Type of data/experiments/methods
Commonly used raw file formats for sequencing data
Most of these raw file formats are converted to BAM or FASTQ for data processing and bioinformatics analysis.
- FASTA
- FASTQ Sequence and Sequence Quality Format
- FASTQ Original Read Archive (ORA)
- Illumina Binary Base Call
- PacBio platforms generate PacBio legacy basecall File Format (bas.h5/bax.h5) and PacBio Alignment File Format (cmp.h5)
- Oxford Nanopore Technologies (ONT) platforms generate POD5 File Format and Fast5 for nanopore data
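To illustrate the FASTQ layout listed above, the following minimal sketch reads four-line records and decodes each quality string into Phred scores. It assumes unwrapped records (one line each for sequence and quality) and the Phred+33 ASCII offset. The names `FastqRecord` and `read_fastq` are hypothetical and used only for illustration; an established parsing library is preferable for real analyses.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    """One FASTQ record (hypothetical helper type for this sketch)."""
    name: str
    sequence: str
    qualities: List[int]


def read_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped, four-line-per-record FASTQ stream.

    Assumption: qualities are encoded as Phred+33 (offset=33).
    Reading stops at end of file or at the first empty line.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence/quality length mismatch in {header!r}")
        # Each quality character encodes one Phred score as ASCII code minus offset.
        scores = [ord(char) - offset for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


# Usage with an in-memory record instead of a file on disk.
example = io.StringIO("@read1\nACGT\n+\nII5!\n")
for record in read_fastq(example):
    print(record.name, record.sequence, record.qualities)
```

The same function accepts any text file handle, for example one returned by `open(path)` or `gzip.open(path, "rt")` for compressed files.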
Alignment Formats
Sequence Alignment Map (SAM) - FAIRsharing - Open Format
Binary Alignment Map Format (BAM) - FAIRsharing - Open Format
Compressed Reference-oriented Alignment Map (CRAM) - FAIRsharing - Open Format
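Of the three alignment formats above, SAM is tab-delimited text, while BAM and CRAM are binary or compressed encodings that require a dedicated library or tool to read. As a rough sketch, the code below parses the mandatory columns of a single SAM alignment line and interprets two common FLAG bits. The names `SamRecord` and `parse_sam_line` are hypothetical, and optional tag fields are ignored.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """Selected mandatory SAM columns (hypothetical helper type)."""
    qname: str   # read name
    flag: int    # bitwise FLAG
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string
    seq: str     # read sequence

    @property
    def is_unmapped(self) -> bool:
        # FLAG bit 0x4 marks a read as unmapped.
        return bool(self.flag & 0x4)

    @property
    def is_reverse(self) -> bool:
        # FLAG bit 0x10 marks a read aligned to the reverse strand.
        return bool(self.flag & 0x10)


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line; header lines (starting with '@') are rejected."""
    if line.startswith("@"):
        raise ValueError("Header line, not an alignment record")
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("A SAM alignment line needs 11 mandatory fields")
    return SamRecord(
        qname=fields[0],
        flag=int(fields[1]),
        rname=fields[2],
        pos=int(fields[3]),
        mapq=int(fields[4]),
        cigar=fields[5],
        seq=fields[9],
    )


# Usage with a made-up alignment line.
line = "read1\t16\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII"
record = parse_sam_line(line)
print(record.rname, record.pos, record.is_reverse, record.is_unmapped)
```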
Annotation formats
GenBank Sequence Format (GB, GBK) - FAIRsharing - Open Format
ENA Sequence Flat File Format (formerly EMBL Sequence Flat File Format) - FAIRsharing - Open Format
Browser Extensible Data Format (BED) - FAIRsharing - Open Format
Generic Feature Format Version 3 (GFF3) - FAIRsharing - Open Format
Gene Transfer Format (GTF) - FAIRsharing - Open Format
Variant Call Format (VCF) - FAIRsharing - Open Format
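A frequent source of off-by-one errors when moving between the annotation formats above is their coordinate convention: BED intervals are 0-based and half-open, whereas GFF3 and GTF intervals are 1-based and inclusive. The sketch below converts between the two conventions. The function names are hypothetical, and zero-length BED features are deliberately rejected to keep the example simple.

```python
from typing import Tuple


def bed_to_gff3_interval(chrom_start: int, chrom_end: int) -> Tuple[int, int]:
    """Convert a 0-based, half-open BED interval to 1-based, inclusive coordinates."""
    if chrom_start < 0 or chrom_end <= chrom_start:
        raise ValueError("Expected 0 <= chrom_start < chrom_end")
    # Only the start shifts: the exclusive BED end equals the inclusive 1-based end.
    return chrom_start + 1, chrom_end


def gff3_to_bed_interval(start: int, end: int) -> Tuple[int, int]:
    """Convert a 1-based, inclusive GFF3/GTF interval to 0-based, half-open coordinates."""
    if start < 1 or end < start:
        raise ValueError("Expected 1 <= start <= end")
    return start - 1, end


# The first 100 bases of a sequence in both conventions.
print(bed_to_gff3_interval(0, 100))   # (1, 100)
print(gff3_to_bed_interval(1, 100))   # (0, 100)
```

In both conventions the feature length stays the same: `chrom_end - chrom_start` for BED and `end - start + 1` for GFF3 or GTF.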
Metadata Standards
The Minimum Information about any (x) Sequence (MIxS) is an overarching framework of sequence metadata.
- Ontologies:
- Controlled vocabularies:
European Variation Archive metadata template
This standard is required for submitting genetic variant data to the EVA.
Sources for Reusable Data
Ensembl
- Ensembl creates, integrates, and distributes reference datasets and analysis tools that enable genomics.
- Data is open-access and can be downloaded free of charge (disclaimer)
- Identifiers: Ensembl stable ID
- Free to access; more features are available with an account (username and password)
Storage and Computing
ELIXIR Norway infrastructures
- The Norwegian e-infrastructure for Life Sciences (NeLS)
- Free of charge allocation of 1–10 TB disk space
- Storing active research data for analysing and processing
- Granular data sharing with collaborators
- National instance of Galaxy
- Provides ~2000 tools for data processing and analysis
- Create, customise and reuse workflows
- Data stored in NeLS is readily available for processing on Galaxy
- Data redundancy is avoided
- StoreBioInfo
- Access through the NeLS portal
- Long-term (until the end of a project) storage of non-active data
- Store up to 10 GB of data free of charge
If your data is produced by the NorSeq core facilities, direct upload to the ELIXIR Norway storage infrastructures is possible. Non-sensitive data will be uploaded to NeLS following these procedures. If your data has been produced by another sequencing provider, follow these instructions to request a project on NeLS.
Sigma2 (Sikt) infrastructures
- High-performance computing
- Overview of available machines
- NIRD data storage
- Storage of active data for processing and analysis
- Granular data sharing with collaborators
- NIRD Service Platform
- Run cloud services including tools for processing and visualisation.
- The services can be used to consume data without moving it from the NIRD storage location.
Data Deposition Repository
European Nucleotide Archive (ENA)
- Homepage
- DOI (FAIRsharing)
- License: refer to the Policies page
- Identifiers: Accession numbers
- How to submit data:
- General guide on data submission
- ENA checklists (i.e. supported metadata standards required for submission)
- Embargo: possible; set the status to confidential upon submission
- More general RDM information about ENA on RDMguide (ELIXIR Belgium)
European Variation Archive (EVA)
- Homepage
- DOI (FAIRsharing)
- License: EMBL-EBI terms of use
- Identifiers: Accession numbers
- Submit data
- Embargo: Data submitted to the EVA can be held privately for up to two years. The date of publication is set by the submitter using the “Hold Date” field of the EVA metadata template (see the help page).
Ethics and Regulations in Norway
- General guidance for research ethics
- Guidelines for Research Ethics in Science and Technology
- Institutional guidelines
Services in Norway
RDM Services
ELIXIR Norway’s HelpDesk can assist you with:
- Data management planning
- Storage and computing
- Metadata standards
- Data deposition to ELIXIR databases
- Data analysis for various sequencing methods
Scientific Services
- The National Consortium for Sequencing and Personalized Medicine (NorSeq)
- Equipment and Services
- Contact: get in touch directly with the NorSeq site that best suits your requirements:
- NorSeq-Oslo, Oslo University Hospital and University of Oslo
- NorSeq-Cancer, Oslo University Hospital
- NorSeq-Bergen, University of Bergen and Haukeland University Hospital
- NorSeq-Trondheim, Norwegian University of Science and Technology (NTNU) and St. Olav’s Hospital
- NorSeq-Tromsø, UiT - The Arctic University of Norway and University Hospital of Northern Norway