Software features
Here is a list of the features available in the metadata catalogue.
Metadata module
- Automatic ingestion of DWCA datasets published on IPT
- CSW support to explore/query dataset metadata
- Metadata exposed in ISO 19139
Dataset module
Current:
- Expose vector datasets with OGC API - Features (via PyGEOAPI)
- Different GDAL providers supported (gpkg, shp, parquet, postgresql, csv)
- Support DWCA dataset
- Support on-the-fly operations on vectors
Not supported:
- Other OGC API standards (Coverage, Maps, Tiles, Processes, Records, Environmental Data Retrieval, STAC)
Maps module
NOTE: The software is intended as a solution for displaying datasets on the web using cloud-optimized formats that don't require GIS servers.
Current:
- Display vector datasets (via PMTiles)
- Display raster datasets (via Cloud-Optimized GeoTIFF)
- Organize layers in hierarchy/groups
- Layer legends
- Download of the original dataset
- Style rendering of raster (via TiTiler)
- Style rendering of vectors (via Maplibre JS)
- Zoom to bounding box
- Description of each layer
- Description of each group
- Description of the map
- Custom logo, title
- Basemaps
- APIs to create and edit maps
- Limited UI for simple edits
- Basic info popup
Not supported:
- Query/filter datasets features
- Dynamic style change
- Dynamic datasets that change frequently
- Clusters
- Analysis
It's possible to (re)use the Maps module as a base and customize it, or to develop features on top of it for projects with different requirements.
Catalogue Architecture
NINA catalogue uses Django and PostgreSQL.
The software is made of different parts:
- Django, the application server: it handles all the logic of answering browser requests
- Queue, which provides long-running tasks, scheduled tasks and asynchronous execution
- PostgreSQL, the database server: it stores the data
- NGINX, the web server: it serves user-uploaded files and static assets
- Varnish, which provides a caching layer
Features inside the catalogue are organized as modules; each module aims to be as independent as possible:
Datasets
The Catalogue allows defining Datasets; each dataset is a set of data in a specific format with a set of metadata.
The Catalogue aims to support different types of Dataset from different sources. Right now the following are supported:
- Dataset Types:
- Sources:
IMPORTANT: source services must disable streaming responses, because GDAL needs the `Content-Length` header to be present. This can be achieved using Varnish.
Import
Datasets are retrieved from sources using harvesters. Harvesters are Python functions that extract data from a source type and populate the database with the corresponding dataset and metadata.
Harvester functions should be provided as Django commands, a set of CLI commands that can be executed with `python manage.py ...`.
Implemented:
- fetch_ipt http://my-ipt-server.com
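The first step of any harvester is turning the source's machine-readable listing into dataset records. As an illustration only (a minimal sketch, not the actual fetch_ipt code; the feed structure and element names here are simplified assumptions), an RSS-style feed published by an IPT instance could be reduced to records like this:

```python
import xml.etree.ElementTree as ET

def parse_feed(rss_xml: str) -> list[dict]:
    """Extract dataset title and link from an RSS-style feed.

    A real harvester would then create/update Dataset rows and
    store the associated metadata in the database.
    """
    root = ET.fromstring(rss_xml)
    return [
        {"title": item.findtext("title"), "link": item.findtext("link")}
        for item in root.iter("item")
    ]

# Hypothetical, simplified feed content for illustration.
SAMPLE = """<rss version="2.0"><channel>
  <item>
    <title>Basidiomycetes</title>
    <link>https://ipt.example.org/resource?r=basidiomycetes</link>
  </item>
</channel></rss>"""

print(parse_feed(SAMPLE))
```

In the real command this parsing would live inside a Django `BaseCommand.handle()` so it can be invoked as `python manage.py fetch_ipt <url>`.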
Metadata
Along with the dataset a set of metadata is stored in the database in a normalized way.
Services and Protocols
To explore and navigate the datasets two services are provided:
- PyCSW (csw protocol)
- PyGeoAPI (OGC API)
PyCSW
The metadata stored in the database are converted to XML in the ISO 19139 format using pygeometa.
PyCSW is integrated in Django using a custom mapping, so the metadata are read through the Django ORM instead of raw SQL.
NOTE: this implies that complex queries may not work as expected.
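A client queries this service with standard CSW requests. For example, a GetRecords request asking for ISO 19139 records can be built as a plain KVP URL (the endpoint host/path below is hypothetical; the parameters are standard CSW 2.0.2):

```python
from urllib.parse import urlencode

# Standard CSW 2.0.2 KVP parameters for a GetRecords request.
params = {
    "service": "CSW",
    "version": "2.0.2",
    "request": "GetRecords",
    "typenames": "csw:Record",
    "elementsetname": "brief",
    "resulttype": "results",
    # Ask the server to return records in the ISO 19139 schema.
    "outputschema": "http://www.isotc211.org/2005/gmd",
}

url = "https://catalogue.example.org/csw?" + urlencode(params)
print(url)
```

Sending this URL to the PyCSW endpoint returns an XML response listing matching metadata records.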
PyGeoAPI
Datasets are shared through PyGeoAPI using GDAL as the resource provider; this is implemented using GDAL VRT for vectors.
NOTE: raster support is missing.
Each dataset should provide a valid vrt definition to open the file; this allows, for example, serving CSV files as spatial datasets.
This diagram explains the flow that a dataset request follows:
DarwinCORE Archives
Darwincore archives are zip files that contain a specific set of files:
- eml.xml, contains the metadata
- meta.xml, contains info about all the other files inside the zip
This page explains the code in datasets/libs/darwincore.
The meta XML has a core and multiple optional extensions, each of which relates to a file in the zip.
The ID of each extension row is the foreign key to the core.
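The core/extension relationship is therefore a plain foreign-key join on the ID column. A minimal sketch (the column layout and values are invented for the example; real archives declare their columns in meta.xml):

```python
import csv
import io
from collections import defaultdict

# Hypothetical core (occurrences) and extension (measurements) files.
# The first column of each extension row is the ID pointing to the core.
core_txt = "1\tPuma concolor\n2\tLynx lynx\n"
ext_txt = "1\tweight\t53\n1\tlength\t2.1\n2\tweight\t20\n"

core = {row[0]: row[1] for row in csv.reader(io.StringIO(core_txt), delimiter="\t")}

# Group extension rows by the core ID they reference.
by_core_id = defaultdict(list)
for row in csv.reader(io.StringIO(ext_txt), delimiter="\t"):
    by_core_id[row[0]].append(row[1:])

for core_id, name in core.items():
    print(name, by_core_id[core_id])
```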
Since DarwinCORE files are CSV, we have to identify which fields contain the geometry data. Right now the following are supported:
- footprintWKT
- decimalLatitude, decimalLongitude
The dataset import should read the content of meta.xml to generate a valid vrt. Here is an example; the actual template can be found in metadata_catalogue/templates/vrt/definition.xml.
<OGRVRTDataSource>
<OGRVRTLayer name="data">
<SrcDataSource><![CDATA[
<OGRVRTDataSource>
<OGRVRTLayer name="occurrence">
<SrcDataSource>CSV:/vsizip/{/vsicurl/https://ipt.nina.no/archive.do?r=5912basidiomycetes}/occurrence.txt</SrcDataSource>
<LayerSRS>WGS84</LayerSRS>
</OGRVRTLayer>
</OGRVRTDataSource>]]>
</SrcDataSource>
<SrcSQL>select * from occurrence</SrcSQL>
<GeometryField encoding="PointFromColumns" x="decimalLongitude" y="decimalLatitude" reportSrcColumn="false">
<GeometryType>wkbPoint</GeometryType>
<SRS>WGS84</SRS>
</GeometryField>
<LayerSRS>WGS84</LayerSRS>
</OGRVRTLayer>
</OGRVRTDataSource>
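The generation step can be sketched as: parse meta.xml, locate the core file, and pick the geometry encoding from the declared Darwin Core terms. This is a simplified illustration, not the actual import code (the real template lives in metadata_catalogue/templates/vrt/definition.xml):

```python
import xml.etree.ElementTree as ET

DWC = "{http://rs.tdwg.org/dwc/text/}"

# A trimmed-down meta.xml for illustration.
META = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/decimalLatitude"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/decimalLongitude"/>
  </core>
</archive>"""

root = ET.fromstring(META)
core = root.find(f"{DWC}core")
location = core.findtext(f"{DWC}files/{DWC}location")
terms = {f.get("term").rsplit("/", 1)[-1] for f in core.findall(f"{DWC}field")}

# Choose the GeometryField exactly as described above: lat/lon columns
# take precedence here; footprintWKT is the fallback.
if {"decimalLatitude", "decimalLongitude"} <= terms:
    geometry = ('<GeometryField encoding="PointFromColumns" '
                'x="decimalLongitude" y="decimalLatitude"/>')
elif "footprintWKT" in terms:
    geometry = '<GeometryField encoding="WKT" field="footprintWKT"/>'
else:
    geometry = None  # non-spatial dataset

print(location, geometry)
```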
Notes about GDAL:
- CSV: means that what follows must be treated as a CSV file
- /vsizip/{}/occurrence.txt means that the file we are looking for is inside a zip
- /vsicurl/https://ipt.nina.no/archive.do?r=5912basidiomycetes means that the zipfile itself is a remote zipfile, downloadable from that URL
- SrcSQL allows joins between data sources
- SrcDataSource allows multiple sources to be loaded using CDATA. NOTE: this behaviour is not documented but is present in the GDAL test suite.
IMPORTANT: when using /vsicurl/ it's necessary that streaming responses are disabled, because GDAL needs the `Content-Length` header to be present.
Maps module
The Maps module provides a REST backend for displaying static maps. It implements the Maplibre spec and provides REST endpoints to:
- Get a portal
- List maps in a portal
- Get a map metadata
- Get a map style
REST
A Swagger endpoint is available at /api/docs; it provides full documentation of the data structures returned by the backend.
Terms
- Portal, a set of maps; it represents a frontend implementing that specific portal
- Map, a single Map entity: a set of layers sorted in a specific order with some styling. See: Maplibre Root Spec
- Group, a group of Layers, used as a building block to create a hierarchical legend
- Layer, a single layer that will be shown in a map. See: Maplibre Layer Spec
- Source, the source dataset itself, for example a WMS remote service or a geojson endpoint. See: Maplibre Sources Spec
Entity Relationships
Data Sources
Vector
Vector data sources can be uploaded as PMTiles, a spatial file format that allows serving cloud-optimized vectors as single files that are dynamically fetched by the browser using HTTP Range requests. See the PMTiles Docs for more info about them.
NOTE: the frontend map should add the pmtiles protocol.
PMTiles files must be pre-processed before uploading to the Maps module. Along with the PMTiles it's possible to also upload the original dataset in a different file format; this will be used when a user asks for a download, while the PMTiles file is used to display the dataset.
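Because uploads must already be valid PMTiles, a cheap server-side sanity check is possible: the PMTiles v3 spec fixes the first 7 bytes of the file to the ASCII string "PMTiles". This is only a sketch of such a check (real validation should parse the full header, e.g. with the pmtiles library):

```python
def looks_like_pmtiles(first_bytes: bytes) -> bool:
    # PMTiles v3 header: 7 magic bytes "PMTiles", then a version byte.
    return first_bytes[:7] == b"PMTiles"

print(looks_like_pmtiles(b"PMTiles\x03" + b"\x00" * 120))  # a v3 header
print(looks_like_pmtiles(b"PK\x03\x04"))                   # a zip file
```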
Raster
Raster data sources can be uploaded as Cloud Optimized GeoTIFF (COG), a spatial file format that allows serving cloud-optimized TIFFs as single files that are dynamically fetched by the browser using HTTP Range requests. See the COG Docs for more info about them.
NOTE: the frontend map should add the cog protocol.
COG files must be pre-processed before uploading to the Maps module. Along with the COG it's possible to also upload the original dataset in a different file format; this will be used when a user asks for a download, while the COG is used to display the dataset.
- See cog-tools for tools to create a valid COG
- See maplibre-gl-cog for displaying COG on maplibre
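As with PMTiles, a minimal upload-time sanity check is possible because a COG is a regular TIFF and starts with one of the two TIFF byte-order magic sequences. This sketch only detects "some TIFF"; checking that the file is actually cloud-optimized (tiled, with overviews) needs a real validator such as `rio cogeo validate` from the listed tooling:

```python
def looks_like_tiff(first_bytes: bytes) -> bool:
    # Classic TIFF magic: "II*\0" (little-endian) or "MM\0*" (big-endian).
    return first_bytes[:4] in (b"II*\x00", b"MM\x00*")

print(looks_like_tiff(b"II*\x00" + b"\x00" * 16))  # little-endian TIFF
print(looks_like_tiff(b"PMTiles\x03"))             # not a TIFF
```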
Portal
A list of portals that use the Maps module: