Cheat sheet: Species occurrence data

Description

Biodiversity data is data about species recorded in space and time. Such data is useful for modelling future species distributions, assessing the impact of climate change, and similar analyses. Many studies of the natural world collect biodiversity data incidentally and in different shapes and forms, so it is often challenging to reshape the data into a standardised form.

Type of data/experiments/methods

All publishable biodiversity data should follow the Darwin Core data standard. Data in spreadsheets or databases can easily be converted to the Darwin Core standard using the IPT (Integrated Publishing Toolkit). This standard supports the following data structures:

Metadata Standards

Ecological Metadata Language (EML)

The metadata standard used is EML. To create metadata for your dataset, you fill in a form during the IPT dataset publication process.

Ontologies

EML accepts various vocabularies. Some examples include:

Sources for Reusable Data

GBIF

All published biodiversity data, from different sources, is available through the GBIF portal.
- Data user guidelines
- Identifiers: filtered datasets are provided with unique DOIs for tracking data use.
- It is also possible to use the GBIF API with R or Python to retrieve data.
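As a rough illustration of retrieving data from Python, the sketch below queries the public GBIF occurrence search endpoint using only the standard library. The helper names (`build_occurrence_url`, `fetch_occurrences`, `summarise`) are hypothetical and chosen for this example, and the selection of fields in `summarise` is an assumption about what a typical user wants to see. Dedicated clients such as pygbif (Python) or rgbif (R) are likely more convenient for real work.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public GBIF REST endpoint for occurrence search.
GBIF_API = "https://api.gbif.org/v1/occurrence/search"


def build_occurrence_url(scientific_name, country=None, limit=5):
    """Build an occurrence search URL for one scientific name."""
    params = {"scientificName": scientific_name, "limit": limit}
    if country:
        params["country"] = country  # ISO 3166-1 alpha-2 code, e.g. "NO"
    return f"{GBIF_API}?{urlencode(params)}"


def fetch_occurrences(scientific_name, country=None, limit=5):
    """Query the API and return the parsed JSON response (needs network access)."""
    url = build_occurrence_url(scientific_name, country, limit)
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def summarise(response):
    """Reduce each record in a search response to a few commonly used fields."""
    fields = ("key", "scientificName", "eventDate",
              "decimalLatitude", "decimalLongitude")
    return [{f: rec.get(f) for f in fields}
            for rec in response.get("results", [])]
```

With network access, `summarise(fetch_occurrences("Vulpes lagopus", country="NO"))` would return a short list of records. Large downloads are better requested through the portal, where the filtered dataset receives its own DOI.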

Storage and Computing

Storage is provided by individual Integrated Publishing Toolkit (IPT) providers.

Data Deposition Repository

File formats that are supported by IPT are:

- Plain text formats
  - Tab-separated values (TSV)
  - Comma-separated values (CSV)
- Open formats
- XLSX and XLS
- SQL databases (e.g. MariaDB, PostgreSQL)
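Before uploading a plain text file, it may help to rename local column headers to Darwin Core terms so that the mapping step in the IPT is simpler. The sketch below shows one way to do that in Python. The source headers (`record_id`, `species`, `date`, `lat`, `lon`), the `COLUMN_MAP` table, and the `to_darwin_core` function are all hypothetical. The target names are standard Darwin Core terms. The IPT can also map columns itself, so this step is optional.

```python
import csv
import io

# Hypothetical local headers (left) mapped to Darwin Core terms (right).
COLUMN_MAP = {
    "record_id": "occurrenceID",
    "species": "scientificName",
    "date": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
}


def to_darwin_core(source_text, basis_of_record="HumanObservation"):
    """Convert CSV text with local headers into TSV text with Darwin Core headers.

    Columns not listed in COLUMN_MAP are dropped; a constant basisOfRecord
    column is appended (assumed here to be the same for every row).
    """
    reader = csv.DictReader(io.StringIO(source_text))
    fieldnames = [COLUMN_MAP[c] for c in reader.fieldnames if c in COLUMN_MAP]
    fieldnames.append("basisOfRecord")

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        record = {COLUMN_MAP[k]: v for k, v in row.items() if k in COLUMN_MAP}
        record["basisOfRecord"] = basis_of_record
        writer.writerow(record)
    return out.getvalue()
```

Calling `to_darwin_core` on the contents of a CSV file returns tab-separated text that can be saved as a TSV file for upload.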

GBIF Norway IPT

GBIF Norway’s IPT Homepage
Contact GBIF to get user credentials
Post on the GBIF Norway GitHub issue list for general questions about data publication. You can also see questions others have posted here.