
Cheat sheet: Species occurrence data

Description

Biodiversity data records which species were observed, where, and when. Such data is useful for modelling future species distributions, assessing the impact of climate change, and similar analyses. Many natural science studies collect biodiversity data incidentally and in varying shapes and forms, so it is often challenging to reshape the data into a standardised form.

Type of data/experiments/methods

All publishable biodiversity data should follow the Darwin Core data standard. Data held in spreadsheets or databases can be converted to Darwin Core using the IPT (Integrated Publishing Toolkit). The standard supports the following data structures:

Occurrence data

Sampling event data

Checklist data
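
As a rough illustration of reshaping spreadsheet-style records into occurrence data, the sketch below renames columns to Darwin Core terms and reports which terms are missing. The input column names (`record_id`, `species`, `date`, `lat`, `lon`), the sample rows, and the helper functions are hypothetical. The list of required terms reflects what GBIF asks for in occurrence datasets as far as we are aware, so check the current GBIF guidance before relying on it. This is not how the IPT itself performs the mapping.

```python
import csv
import io

# Terms assumed to be required for an occurrence dataset (verify against GBIF guidance).
REQUIRED_TERMS = ("occurrenceID", "basisOfRecord", "scientificName", "eventDate")

# Hypothetical spreadsheet columns mapped to Darwin Core terms.
COLUMN_MAP = {
    "record_id": "occurrenceID",
    "species": "scientificName",
    "date": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
}


def to_darwin_core(row, basis_of_record="HumanObservation"):
    """Rename the columns of one spreadsheet row to Darwin Core terms, skipping empty cells."""
    record = {
        dwc_term: row[column].strip()
        for column, dwc_term in COLUMN_MAP.items()
        if row.get(column)
    }
    record["basisOfRecord"] = basis_of_record
    return record


def missing_terms(record):
    """Return the required Darwin Core terms that are absent or empty in a record."""
    return [term for term in REQUIRED_TERMS if not record.get(term)]


# Made-up sample data standing in for a field spreadsheet.
FIELD_DATA = """record_id,species,date,lat,lon
obs-001,Vulpes lagopus,2021-07-14,69.65,18.96
obs-002,Rangifer tarandus,,60.39,5.32
"""

records = [to_darwin_core(row) for row in csv.DictReader(io.StringIO(FIELD_DATA))]
for record in records:
    print(record["occurrenceID"], "missing:", missing_terms(record))
```

Running a check like this before upload makes gaps visible early, for example the second sample row, which has no date.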

Metadata Standards

Ecological Metadata Language (EML)

The metadata standard used is EML. You create the metadata for your dataset by filling in a form during the IPT dataset publication process.

Ontologies

EML accepts various vocabularies. Some examples include:

Sources for Reusable Data

GBIF

  • All published biodiversity data, from different sources, is available through the GBIF portal.
  • Data user guidelines
  • Identifiers:
    • Filtered datasets are provided with unique DOIs for tracking data use.
  • It is also possible to use the GBIF API with R or Python to retrieve data.
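
A minimal sketch of retrieving occurrence records from the GBIF API with Python, using only the standard library. It assumes the public occurrence search endpoint at `https://api.gbif.org/v1/occurrence/search` and its `scientificName`, `country`, and `limit` query parameters. The function names are hypothetical, and dedicated client libraries may be more convenient for larger jobs.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the public GBIF API (assumed; check the GBIF API documentation).
GBIF_API = "https://api.gbif.org/v1"


def occurrence_search_url(scientific_name, country=None, limit=20):
    """Build an occurrence search URL for a species, optionally limited to one country."""
    params = {"scientificName": scientific_name, "limit": limit}
    if country:
        # Two-letter ISO 3166-1 country code, for example "NO" for Norway.
        params["country"] = country
    return f"{GBIF_API}/occurrence/search?{urlencode(params)}"


def fetch_occurrences(scientific_name, country=None, limit=20):
    """Request one page of occurrence records and return them as a list of dicts."""
    url = occurrence_search_url(scientific_name, country=country, limit=limit)
    with urlopen(url, timeout=30) as response:
        payload = json.load(response)
    # The search response is expected to wrap the records in a "results" list.
    return payload.get("results", [])
```

For example, `fetch_occurrences("Vulpes lagopus", country="NO", limit=5)` would request up to five records for the Arctic fox in Norway. Each record is a dictionary keyed by Darwin Core terms such as `scientificName`, `eventDate`, `decimalLatitude`, and `decimalLongitude`. The search endpoint is paged, so bulk retrieval is usually better served by GBIF's download service, which also issues the DOIs mentioned above.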

Storage and Computing

Storage is provided by individual Integrated Publishing Toolkit (IPT) providers.

Data Deposition Repository

File formats that are supported by IPT are:

GBIF Norway IPT

More information

Links to other ELIXIR resources