Software features
Here is a list of the features available in the metadata catalogue.
Metadata module
- Automatic ingestion of DWCA datasets published on IPT
- CSW support to explore/query dataset metadata
- Metadata exposed in ISO 19139
Dataset module
Current:
- Expose vector datasets with OGC API - Features (via PyGEOAPI)
- Different GDAL providers supported (gpkg, shp, parquet, postgresql, csv)
- Support DWCA dataset
- Support on-the-fly operations on vectors
Not supported:
- Other OGC API standards (Coverage, Maps, Tiles, Processes, Records, Environmental Data Retrieval, STAC)
Maps module
NOTE: The software is intended as a solution for displaying datasets on the web using cloud-optimized formats that don't require GIS servers.
Current:
- Display vector datasets (via PMTiles)
- Display raster datasets (via Cloud-Optimized GeoTIFF)
- Organize layers in hierarchy/groups
- Layer legends
- Download of the original dataset
- Style rendering of raster (via TiTiler)
- Style rendering of vectors (via Maplibre JS)
- Zoom to bounding box
- Description of each layer
- Description of each group
- Description of the map
- Custom logo, title
- Basemaps
- APIs to create and edit maps
- Limited UI for simple edits
- Basic info popup
Not supported:
- Query/filter datasets features
- Dynamic style change
- Dynamic datasets that change frequently
- Clusters
- Analysis
It's possible to (re)use the Maps module as a base and customize it, or to develop features on top of it for projects with different requirements.
Catalogue Architecture
NINA catalogue uses Django and PostgreSQL.
The software is made of different parts:
- Django, the application server: it handles all the logic of answering browser requests
- Queue, which provides long-running tasks, scheduled tasks and asynchronous execution
- PostgreSQL, the database server: it stores the data
- NGINX, the web server: it serves user-uploaded files and static assets
- Varnish, which provides a caching layer
Features inside the catalogue are organized as modules; each module aims to be as independent as possible:
Datasets
The Catalogue allows defining Datasets; each dataset is a set of data in a specific format with a set of metadata.
The Catalogue aims to support different types of Dataset from different sources. Right now the following are supported:
- Dataset Types:
- Sources:
IMPORTANT: source services must disable streaming responses, because GDAL needs the `Content-Length` header to be present. This can be achieved using Varnish.
Import
Datasets are retrieved from sources using harvesters. Harvesters are Python functions that extract data from a source type and populate the database with the corresponding dataset and metadata.
Harvester functions should be provided as Django commands, a set of CLI commands that can be executed with `python manage.py ...`.
Implemented:
- fetch_ipt http://my-ipt-server.com
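The first step of any harvester is turning the source's machine-readable listing into dataset records. As an illustration only (a minimal sketch, not the actual fetch_ipt code; the feed structure and element names here are simplified assumptions), an RSS-style feed published by an IPT instance could be reduced to records like this:

```python
import xml.etree.ElementTree as ET

def parse_feed(rss_xml: str) -> list[dict]:
    """Extract dataset title and link from an RSS-style feed.

    A real harvester would then create/update Dataset rows and
    store the associated metadata in the database.
    """
    root = ET.fromstring(rss_xml)
    return [
        {"title": item.findtext("title"), "link": item.findtext("link")}
        for item in root.iter("item")
    ]

# Hypothetical, simplified feed content for illustration.
SAMPLE = """<rss version="2.0"><channel>
  <item>
    <title>Basidiomycetes</title>
    <link>https://ipt.example.org/resource?r=basidiomycetes</link>
  </item>
</channel></rss>"""

print(parse_feed(SAMPLE))
```

In the real command this parsing would live inside a Django `BaseCommand.handle()` so it can be invoked as `python manage.py fetch_ipt <url>`.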
Metadata
Along with the dataset a set of metadata is stored in the database in a normalized way.
Services and Protocols
To explore and navigate the datasets two services are provided:
- PyCSW (csw protocol)
- PyGeoAPI (OGC API)
PyCSW
The metadata stored in the database are converted to XML in the ISO 19139 format using pygeometa.
PyCSW is integrated in Django using a custom mapping, so the metadata are read through the Django ORM instead of raw SQL.
NOTE: this implies that complex queries may not work as expected.
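A client queries this service with standard CSW requests. For example, a GetRecords request asking for ISO 19139 records can be built as a plain KVP URL (the endpoint host/path below is hypothetical; the parameters are standard CSW 2.0.2):

```python
from urllib.parse import urlencode

# Standard CSW 2.0.2 KVP parameters for a GetRecords request.
params = {
    "service": "CSW",
    "version": "2.0.2",
    "request": "GetRecords",
    "typenames": "csw:Record",
    "elementsetname": "brief",
    "resulttype": "results",
    # Ask the server to return records in the ISO 19139 schema.
    "outputschema": "http://www.isotc211.org/2005/gmd",
}

url = "https://catalogue.example.org/csw?" + urlencode(params)
print(url)
```

Sending this URL to the PyCSW endpoint returns an XML response listing matching metadata records.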
PyGeoAPI
Datasets are shared through PyGeoAPI using GDAL as the resource provider; this is implemented using GDAL VRT for vectors.
NOTE: raster support is missing.
Each dataset should provide a valid vrt definition to open the file; this allows, for example, serving CSV files as spatial datasets.
This diagram explains the flow that a dataset request follows:
DarwinCORE Archives
Darwincore archives are zip files that contain a specific set of files:
- eml.xml, contains the metadata
- meta.xml, contains info about all the other files inside the zip
This page explains the code in datasets/libs/darwincore.
The meta XML has a core and multiple optional extensions, each of which relates to a file in the zip.
The ID of each extension row is the foreign key to the core.
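The core/extension relationship is therefore a plain foreign-key join on the ID column. A minimal sketch (the column layout and values are invented for the example; real archives declare their columns in meta.xml):

```python
import csv
import io
from collections import defaultdict

# Hypothetical core (occurrences) and extension (measurements) files.
# The first column of each extension row is the ID pointing to the core.
core_txt = "1\tPuma concolor\n2\tLynx lynx\n"
ext_txt = "1\tweight\t53\n1\tlength\t2.1\n2\tweight\t20\n"

core = {row[0]: row[1] for row in csv.reader(io.StringIO(core_txt), delimiter="\t")}

# Group extension rows by the core ID they reference.
by_core_id = defaultdict(list)
for row in csv.reader(io.StringIO(ext_txt), delimiter="\t"):
    by_core_id[row[0]].append(row[1:])

for core_id, name in core.items():
    print(name, by_core_id[core_id])
```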
Since DarwinCORE files are CSV, we have to identify which fields contain the geometry data. Right now the following are supported:
- footprintWKT
- decimalLatitude, decimalLongitude
The dataset import should read the content of meta.xml to generate a valid vrt. Here is an example; the actual template can be found in metadata_catalogue/templates/vrt/definition.xml.
<OGRVRTDataSource>
<OGRVRTLayer name="data">
<SrcDataSource><![CDATA[
<OGRVRTDataSource>
<OGRVRTLayer name="occurrence">
<SrcDataSource>CSV:/vsizip/{/vsicurl/https://ipt.nina.no/archive.do?r=5912basidiomycetes}/occurrence.txt</SrcDataSource>
<LayerSRS>WGS84</LayerSRS>
</OGRVRTLayer>
</OGRVRTDataSource>]]>
</SrcDataSource>
<SrcSQL>select * from occurrence</SrcSQL>
<GeometryField encoding="PointFromColumns" x="decimalLongitude" y="decimalLatitude" reportSrcColumn="false">
<GeometryType>wkbPoint</GeometryType>
<SRS>WGS84</SRS>
</GeometryField>
<LayerSRS>WGS84</LayerSRS>
</OGRVRTLayer>
</OGRVRTDataSource>
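The generation step can be sketched as: parse meta.xml, locate the core file, and pick the geometry encoding from the declared Darwin Core terms. This is a simplified illustration, not the actual import code (the real template lives in metadata_catalogue/templates/vrt/definition.xml):

```python
import xml.etree.ElementTree as ET

DWC = "{http://rs.tdwg.org/dwc/text/}"

# A trimmed-down meta.xml for illustration.
META = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/decimalLatitude"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/decimalLongitude"/>
  </core>
</archive>"""

root = ET.fromstring(META)
core = root.find(f"{DWC}core")
location = core.findtext(f"{DWC}files/{DWC}location")
terms = {f.get("term").rsplit("/", 1)[-1] for f in core.findall(f"{DWC}field")}

# Choose the GeometryField exactly as described above: lat/lon columns
# take precedence here; footprintWKT is the fallback.
if {"decimalLatitude", "decimalLongitude"} <= terms:
    geometry = ('<GeometryField encoding="PointFromColumns" '
                'x="decimalLongitude" y="decimalLatitude"/>')
elif "footprintWKT" in terms:
    geometry = '<GeometryField encoding="WKT" field="footprintWKT"/>'
else:
    geometry = None  # non-spatial dataset

print(location, geometry)
```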
Notes about GDAL:
- CSV: means that what follows must be treated as a CSV file
- /vsizip/{}/occurrence.txt means that the file we are looking for is inside a zip
- /vsicurl/https://ipt.nina.no/archive.do?r=5912basidiomycetes means that the zipfile itself is a remote zipfile, downloadable from that URL
- SrcSQL allows joins between data sources
- SrcDataSource allows multiple sources to be loaded using CDATA. NOTE: this behaviour is not documented but is present in the GDAL test suite.
IMPORTANT: when using /vsicurl/ it's necessary that streaming responses are disabled, because GDAL needs the `Content-Length` header to be present.
Maps module
The Maps module provides a REST backend for displaying static maps. It implements the Maplibre spec and provides REST endpoints to:
- Get a portal
- List maps in a portal
- Get a map metadata
- Get a map style
REST
A Swagger endpoint is available at /api/docs; it provides full documentation of the data structures returned by the backend.
Terms
- Portal, a set of maps; it represents a frontend implementing that specific portal
- Map, a single Map entity: a set of layers sorted in a specific order with some styling. See: Maplibre Root Spec
- Group, a group of Layers, used as a building block to create a hierarchical legend
- Layer, a single layer that will be shown in a map. See: Maplibre Layer Spec
- Source, the source dataset itself, for example a WMS remote service or a geojson endpoint. See: Maplibre Sources Spec
Entity Relationships
Data Sources
Vector
Vector data sources can be uploaded as PMTiles, a spatial file format that allows serving cloud-optimized vectors as single files that are dynamically fetched by the browser using HTTP Range requests. See the PMTiles Docs for more info about them.
NOTE: the frontend map should add the pmtiles protocol.
PMTiles files must be pre-processed before uploading to the Maps module. Along with the PMTiles it's possible to also upload the original dataset in a different file format; this will be used when a user asks for a download, while the PMTiles file is used to display the dataset.
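Because uploads must already be valid PMTiles, a cheap server-side sanity check is possible: the PMTiles v3 spec fixes the first 7 bytes of the file to the ASCII string "PMTiles". This is only a sketch of such a check (real validation should parse the full header, e.g. with the pmtiles library):

```python
def looks_like_pmtiles(first_bytes: bytes) -> bool:
    # PMTiles v3 header: 7 magic bytes "PMTiles", then a version byte.
    return first_bytes[:7] == b"PMTiles"

print(looks_like_pmtiles(b"PMTiles\x03" + b"\x00" * 120))  # a v3 header
print(looks_like_pmtiles(b"PK\x03\x04"))                   # a zip file
```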
Raster
Raster data sources can be uploaded as Cloud Optimized GeoTIFF (COG), a spatial file format that allows serving cloud-optimized TIFFs as single files that are dynamically fetched by the browser using HTTP Range requests. See the COG Docs for more info about them.
NOTE: the frontend map should add the cog protocol.
COG files must be pre-processed before uploading to the Maps module. Along with the COG it's possible to also upload the original dataset in a different file format; this will be used when a user asks for a download, while the COG is used to display the dataset.
- See cog-tools for tools to create a valid COG
- See maplibre-gl-cog for displaying COG on maplibre
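As with PMTiles, a minimal upload-time sanity check is possible because a COG is a regular TIFF and starts with one of the two TIFF byte-order magic sequences. This sketch only detects "some TIFF"; checking that the file is actually cloud-optimized (tiled, with overviews) needs a real validator such as `rio cogeo validate` from the listed tooling:

```python
def looks_like_tiff(first_bytes: bytes) -> bool:
    # Classic TIFF magic: "II*\0" (little-endian) or "MM\0*" (big-endian).
    return first_bytes[:4] in (b"II*\x00", b"MM\x00*")

print(looks_like_tiff(b"II*\x00" + b"\x00" * 16))  # little-endian TIFF
print(looks_like_tiff(b"PMTiles\x03"))             # not a TIFF
```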
Portal
A list of portals that use the Maps module: