Software features

Here is a list of the features available in the metadata catalogue:

Metadata module

  • Automatic ingestion of DWCA datasets published on IPT
  • CSW support to explore/query dataset metadata
  • Metadata exposed in ISO 19139

Dataset module

Current:

  • Expose vector datasets with OGC API - Features (via PyGEOAPI)
  • Different GDAL providers supported (gpkg, shp, parquet, postgresql, csv)
  • Support for DWCA datasets
  • Support for on-the-fly operations on vectors

Not supported:

  • Other OGC API standards (Coverage, Maps, Tiles, Processes, Records, Environmental Data Retrieval, STAC)

Maps module

NOTE: The software is intended as a solution for displaying datasets on the web using cloud-optimized formats that don't require GIS servers.

Current:

  • Display vector datasets (via PMTiles)
  • Display raster datasets (via Cloud-Optimized GeoTIFF)
  • Organize layers in hierarchy/groups
  • Layer legends
  • Download of the original dataset
  • Style rendering of raster (via TiTiler)
  • Style rendering of vectors (via Maplibre JS)
  • Zoom to bounding box
  • Description of each layer
  • Description of each group
  • Description of the map
  • Custom logo, title
  • Basemaps
  • APIs to create and edit the map
  • Limited UI for simple edits
  • Basic info popup

Not supported:

  • Query/filter dataset features
  • Dynamic style change
  • Dynamic datasets that change frequently
  • Clusters
  • Analysis

The Maps module can be (re)used as a base and customized, or extended with new features, for projects with different requirements.

Catalogue Architecture

NINA catalogue uses Django and PostgreSQL.

The software is made of different parts:

  • Django is the application server; it handles all the logic of answering browser requests
  • Queue runs long-running tasks, scheduled tasks and asynchronous executions
  • PostgreSQL is the database server; it stores the data
  • NGINX is the web server; it serves user-uploaded files and static assets
  • Varnish provides a caching layer

Features inside the catalogue are organized as modules; each module aims to be as independent as possible.

Datasets

The Catalogue lets you define Datasets. Each dataset is a set of data in a specific format, together with a set of metadata. The Catalogue aims to support different types of Dataset from different sources; right now the following are supported:

IMPORTANT: source services must disable streaming responses, because GDAL needs the Content-Length header to be present. This can be achieved using Varnish.
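
As a rough illustration of the requirement above, the following sketch checks whether a set of response headers would satisfy it. The function name is hypothetical and not part of the catalogue code; it only inspects headers you have already obtained (for example from a HEAD request).

```python
from typing import Mapping


def has_usable_content_length(headers: Mapping[str, str]) -> bool:
    """Return True when the headers describe a fixed-length (non-streamed) response."""
    # Header names are case-insensitive, so normalise them first.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    # A chunked response is streamed and carries no fixed length.
    if "chunked" in normalized.get("transfer-encoding", "").lower():
        return False
    # Content-Length must be present and be a plain non-negative integer.
    return normalized.get("content-length", "").isdigit()
```

A source that fails this check would need a caching layer such as Varnish in front of it before GDAL can read it remotely.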

Import

Datasets are retrieved from sources using harvesters. Harvesters are Python functions that extract data from a source type and populate the database with the corresponding dataset and metadata.

Harvester functions should be provided as Django commands: CLI commands executed as python manage.py followed by the command name. Implemented:

  • fetch_ipt http://my-ipt-server.com

Metadata

Along with the dataset a set of metadata is stored in the database in a normalized way.

Model diagram (entity: fields):

  • Citation: cited_by_dataset (ManyToOneRel), in_dataset_bibliography (ManyToManyRel), id (BigAutoField), identifier (CharField), text (TextField)
  • Dataset: metadata (OneToOneRel), content (OneToOneRel), id (BigAutoField), name (CharField), uuid (UUIDField), source (TextField), fetch_url (TextField), fetch_type (IntegerField), created_at (DateTimeField), last_modified_at (DateTimeField), owner (ForeignKey), validated_at (DateTimeField), validated_by (ForeignKey), fetch_success (BooleanField), fetch_message (TextField), last_fetch_at (DateTimeField), public (BooleanField)
  • Keyword: metadatas (ManyToManyRel), id (BigAutoField), name (CharField), definition (URLField), description (TextField)
  • License: metadata (ManyToOneRel), id (BigAutoField), name (CharField), url (URLField)
  • Metadata: people (ManyToOneRel), organizations (ManyToOneRel), metadataidentifier (ManyToOneRel), id (BigAutoField), dataset (OneToOneField), title (CharField), date_created (DateTimeField), logo_url (URLField), date_publication (DateField), language (ForeignKey), abstract (TextField), license (ForeignKey), maintenance_update_frequency (TextField), maintenance_update_description (TextField), geographic_description (TextField), bounding_box (GeometryField), citation (ForeignKey), formation_period_start (DateField), formation_period_end (DateField), formation_period_description (TextField), project_id (CharField), project_title (CharField), project_abstract (TextField), project_study_area_description (TextField), project_design_description (TextField), xml (TextField), fts (TextField), keywords (ManyToManyField), taxonomies (ManyToManyField), bibliography (ManyToManyField)
  • MetadataIdentifier: id (BigAutoField), identifier (CharField), metadata (ForeignKey), source (CharField)
  • MethodStep: id (BigAutoField), order (IntegerField), description (TextField)
  • Organization: person (ManyToOneRel), roles (ManyToOneRel), id (BigAutoField), name (TextField)
  • OrganizationRole: id (BigAutoField), organization (ForeignKey), metadata (ForeignKey), role (CharField)
  • Person: personidentifier (ManyToOneRel), roles (ManyToOneRel), id (BigAutoField), first_name (CharField), last_name (CharField), belongs_to (ForeignKey), position (CharField), country (ForeignKey), email (EmailField), phone (CharField), city (TextField), delivery_point (TextField), postal_code (IntegerField)
  • PersonIdentifier: id (BigAutoField), person (ForeignKey), type (CharField), value (CharField)
  • PersonRole: id (BigAutoField), person (ForeignKey), metadata (ForeignKey), role (CharField), description (CharField)
  • Taxonomy: metadata (ManyToManyRel), id (BigAutoField), type (ForeignKey), name (CharField), common (CharField)
  • TaxonomyType: taxonomy (ManyToOneRel), name (CharField)

Services and Protocols

To explore and navigate the datasets two services are provided:

  • PyCSW (CSW protocol)
  • PyGeoAPI (OGC API)

PyCSW

The metadata stored in the database are converted to XML in the ISO 19139 format using pygeometa. PyCSW is integrated in Django using a custom mapping, so that the metadata are read using the Django ORM instead of SQL.

NOTE: this implies that complex queries may not work as expected.
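
For orientation, here is a minimal sketch of how a client might build a CSW 2.0.2 GetRecords request asking for ISO 19139 output. The endpoint URL is a placeholder and the function name is hypothetical; whether the server accepts this exact combination of parameters depends on how PyCSW is configured in the catalogue.

```python
from urllib.parse import urlencode

# Namespace commonly used as outputSchema for ISO 19139 (gmd) records.
ISO_19139_SCHEMA = "http://www.isotc211.org/2005/gmd"


def csw_get_records_url(endpoint: str, max_records: int = 10) -> str:
    """Build a KVP-encoded CSW 2.0.2 GetRecords URL (endpoint is an assumption)."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "elementSetName": "full",
        "resultType": "results",
        "outputSchema": ISO_19139_SCHEMA,
        "maxRecords": max_records,
    }
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder host; replace with the real CSW endpoint of the deployment.
    print(csw_get_records_url("https://catalogue.example.org/csw", max_records=5))
```

Simple requests like this one are the safest choice given the note above about complex queries.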

PyGeoAPI

Datasets are shared through PyGeoAPI using GDAL as resource provider: this is implemented using GDAL VRT for vectors.

NOTE: raster support is missing

Each dataset should provide a valid VRT definition to open the file. This makes it possible, for example, to serve CSV files as spatial datasets.
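
Once a dataset is exposed this way, clients read it through the standard OGC API - Features "items" endpoint. The sketch below builds such a request URL; the base URL and collection id are placeholders, and the helper name is hypothetical. Only the `/collections/{id}/items` path and the `bbox`, `limit` and `f` query parameters are taken from the standard and PyGeoAPI conventions.

```python
from typing import Optional, Sequence
from urllib.parse import quote, urlencode


def items_url(
    base_url: str,
    collection_id: str,
    bbox: Optional[Sequence[float]] = None,
    limit: int = 10,
) -> str:
    """Build an OGC API - Features items URL returning JSON."""
    params = {"f": "json", "limit": limit}
    if bbox is not None:
        if len(bbox) != 4:
            raise ValueError("bbox must be (minx, miny, maxx, maxy)")
        params["bbox"] = ",".join(str(value) for value in bbox)
    path = f"/collections/{quote(collection_id, safe='')}/items"
    # Keep commas readable in the bbox parameter.
    return f"{base_url.rstrip('/')}{path}?{urlencode(params, safe=',')}"
```

For example, `items_url("https://catalogue.example.org/oapi", "my-dataset", bbox=(5.0, 58.0, 11.0, 62.0))` requests the first ten features inside that bounding box.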

This diagram explains the flow that a dataset request follows:

Sequence diagram. Participants: user, django, pygeoapi, gdal. Messages and notes:

  • Serve PyGeoAPI
  • Serve VRT definition from db
  • Resource source is a remote VRT: /vsicurl/http://django/dataset/<id>/definition.vrt
  • VRT points to the actual source and describes how to open it
  • Convert to geojson
  • Read resource
  • Send response
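
The resource path used in this flow can be sketched as a small helper. The function name is hypothetical; only the `/vsicurl/` prefix and the `/dataset/<id>/definition.vrt` path come from the diagram, and the Django base URL is deployment-specific.

```python
def vrt_resource_source(django_base_url: str, dataset_id: int) -> str:
    """Return the GDAL path PyGeoAPI would open for a given dataset."""
    base = django_base_url.rstrip("/")
    # /vsicurl/ tells GDAL to fetch the VRT definition over HTTP.
    return f"/vsicurl/{base}/dataset/{dataset_id}/definition.vrt"
```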

DarwinCORE Archives

DarwinCORE archives are zip files that contain certain files:

  • eml.xml, contains the metadata
  • meta.xml, contains info about all the other files inside the zip
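
As a minimal sketch of working with these two files, the following reads an archive from memory, checks that both are present, and extracts the dataset title from eml.xml. It is not the code in datasets/libs/darwincore; the function name is hypothetical, and the `dataset/title` path assumes the usual EML layout with both files at the root of the zip.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Optional

REQUIRED_FILES = {"eml.xml", "meta.xml"}


def read_archive_title(archive_bytes: bytes) -> Optional[str]:
    """Return the dataset title stored in eml.xml, or None if it is absent."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        missing = REQUIRED_FILES - set(archive.namelist())
        if missing:
            raise ValueError(f"archive is missing: {sorted(missing)}")
        root = ET.fromstring(archive.read("eml.xml"))
    # Assumption: <dataset> and <title> are unqualified children of the root.
    title = root.find("./dataset/title")
    if title is None or not title.text:
        return None
    return title.text.strip()
```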

This page explains the code in datasets/libs/darwincore.

meta.xml has a core and multiple optional extensions, each of which relates to a file in the zip. The ID of each extension row is the foreign key to the core.
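
The core/extension layout can be read with the standard library. The sketch below is an illustration under the assumption that meta.xml follows the Darwin Core text schema (namespace `http://rs.tdwg.org/dwc/text/`, with `<id>` in the core and `<coreid>` in extensions); the function name and the sample document are made up for the example.

```python
import xml.etree.ElementTree as ET

NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}

SAMPLE_META = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/decimalLatitude"/>
  </core>
  <extension rowType="http://rs.tdwg.org/dwc/terms/MeasurementOrFact">
    <files><location>measurementorfact.txt</location></files>
    <coreid index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/measurementType"/>
  </extension>
</archive>"""


def _describe_table(element: ET.Element, key_tag: str) -> dict:
    """Collect file name, key column index and indexed fields of one table."""
    location = element.find("dwc:files/dwc:location", NS)
    key = element.find(f"dwc:{key_tag}", NS)
    fields = {}
    for field in element.findall("dwc:field", NS):
        index = field.get("index")
        if index is not None:  # fields holding only a default value have no index
            fields[int(index)] = field.get("term", "").rsplit("/", 1)[-1]
    return {
        "file": location.text.strip(),
        "key_index": int(key.get("index")),
        "fields": fields,
    }


def describe_archive(meta_xml: str) -> dict:
    """Return the core table and the extension tables declared in meta.xml."""
    root = ET.fromstring(meta_xml)
    core = root.find("dwc:core", NS)
    extensions = root.findall("dwc:extension", NS)
    return {
        "core": _describe_table(core, "id"),
        "extensions": [_describe_table(ext, "coreid") for ext in extensions],
    }
```

The `key_index` of each extension is the column to join against the core `key_index`, which is the foreign-key relation described above.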

Since DarwinCORE files are CSV, we have to identify which fields contain the geometry data. The following are currently supported:

  • footprintWKT
  • decimalLatitude, decimalLongitude
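
A possible way to choose between the two is sketched below. The function name is hypothetical, and the precedence (footprintWKT first, then the latitude/longitude pair) is an assumption made for the example, not something the catalogue is known to do. The returned keys mirror the attributes of the OGR VRT GeometryField element.

```python
from typing import Iterable, Optional


def detect_geometry(columns: Iterable[str]) -> Optional[dict]:
    """Pick the GeometryField settings matching the available columns."""
    names = set(columns)
    if "footprintWKT" in names:
        # Geometry stored as WKT text in a single column.
        return {"encoding": "WKT", "field": "footprintWKT"}
    if {"decimalLatitude", "decimalLongitude"} <= names:
        # Point geometry built from two numeric columns.
        return {
            "encoding": "PointFromColumns",
            "x": "decimalLongitude",
            "y": "decimalLatitude",
        }
    return None  # no supported geometry columns found
```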

The dataset import should read the content of meta.xml to generate a valid VRT. Here is an example; the actual template can be found in metadata_catalogue/templates/vrt/definition.xml.

<OGRVRTDataSource>
  <OGRVRTLayer name="data">
    <SrcDataSource><![CDATA[
      <OGRVRTDataSource>
        <OGRVRTLayer name="occurrence">
          <SrcDataSource>CSV:/vsizip/{/vsicurl/https://ipt.nina.no/archive.do?r=5912basidiomycetes}/occurrence.txt</SrcDataSource>
          <LayerSRS>WGS84</LayerSRS>
        </OGRVRTLayer>
      </OGRVRTDataSource>]]>
    </SrcDataSource>
    <SrcSQL>select * from occurrence</SrcSQL>
    <GeometryField encoding="PointFromColumns" x="decimalLongitude" y="decimalLatitude" reportSrcColumn="false">
      <GeometryType>wkbPoint</GeometryType>
      <SRS>WGS84</SRS>
    </GeometryField>
    <LayerSRS>WGS84</LayerSRS>
  </OGRVRTLayer>
</OGRVRTDataSource>

Notes about GDAL:

  • CSV: means that what follows must be treated as a CSV file
  • /vsizip/{}/occurrence.txt means that the file we are looking for is inside a zip archive
  • /vsicurl/https://ipt.nina.no/archive.do?r=5912basidiomycetes means that the zip file itself is remote and downloadable from that URL
  • SrcSQL allows joins between data sources
  • SrcDataSource allows multiple sources to be loaded using CDATA. NOTE: this behaviour is not documented, but it is exercised in the GDAL test suite.

IMPORTANT: when using /vsicurl/, streaming responses must be disabled, because GDAL needs the Content-Length header to be present.
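
To make the path syntax concrete, here is a small sketch that composes the chained GDAL source string and wraps it in a simple, single-level VRT. It is not the project's template: both function names are hypothetical, and the output is a flat VRT rather than the nested CDATA form shown in the example.

```python
import xml.etree.ElementTree as ET


def remote_csv_in_zip(archive_url: str, member: str) -> str:
    """Compose CSV: + /vsizip/ + /vsicurl/ for a CSV inside a remote zip."""
    return f"CSV:/vsizip/{{/vsicurl/{archive_url}}}/{member}"


def point_vrt(
    source: str,
    layer: str,
    x: str = "decimalLongitude",
    y: str = "decimalLatitude",
) -> str:
    """Return a flat OGR VRT that builds WGS84 points from two columns."""
    root = ET.Element("OGRVRTDataSource")
    # The layer name must match the layer exposed by the source (for CSV, the file stem).
    vrt_layer = ET.SubElement(root, "OGRVRTLayer", name=layer)
    ET.SubElement(vrt_layer, "SrcDataSource").text = source
    geometry = ET.SubElement(
        vrt_layer, "GeometryField", encoding="PointFromColumns", x=x, y=y
    )
    ET.SubElement(geometry, "GeometryType").text = "wkbPoint"
    ET.SubElement(geometry, "SRS").text = "WGS84"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    src = remote_csv_in_zip(
        "https://ipt.nina.no/archive.do?r=5912basidiomycetes", "occurrence.txt"
    )
    print(point_vrt(src, "occurrence"))
```

Building the document with ElementTree rather than string formatting means characters such as `&` in the archive URL are escaped automatically.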

Maps module

The Maps module provides a REST backend for displaying static maps. It implements the Maplibre spec and provides REST endpoints to:

  • Get a portal
  • List maps in a portal
  • Get a map metadata
  • Get a map style

REST

A Swagger endpoint is available at /api/docs. It provides full documentation of the data structures returned by the backend.

Terms

  • Portal: a set of maps; it represents a frontend implementing that specific portal
  • Map: a single Map entity, i.e. a set of layers sorted in a specific order with some styling. See: Maplibre Root Spec
  • Group: a group of Layers, used as a building block to create a hierarchical legend
  • Layer: a single layer that will be shown in a map. See: Maplibre Layer Spec
  • Source: the source dataset itself, for example a remote WMS service or a geojson endpoint. See: Maplibre Sources Spec
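
To show how these terms line up with a Maplibre style document, here is a minimal sketch that assembles one by hand. It is not the backend's serializer: the function name, source id, layer id and paint values are all illustrative. A Map corresponds to the style root, a Source to an entry under `sources`, and a Layer to an entry in `layers`; Groups have no direct counterpart in the style spec.

```python
import json


def build_style(name: str, pmtiles_url: str, source_layer: str) -> dict:
    """Return a minimal style with one vector source and one fill layer."""
    return {
        "version": 8,  # the style spec version
        "name": name,
        "sources": {
            # Assumption: the frontend has registered the pmtiles protocol.
            "data": {"type": "vector", "url": f"pmtiles://{pmtiles_url}"},
        },
        "layers": [
            {
                "id": "data-fill",
                "type": "fill",
                "source": "data",
                "source-layer": source_layer,
                "paint": {"fill-color": "#3388ff", "fill-opacity": 0.5},
            }
        ],
    }


if __name__ == "__main__":
    style = build_style("Example map", "https://example.org/data.pmtiles", "areas")
    print(json.dumps(style, indent=2))
```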

Entity Relationships

Model diagram (entity: fields):

  • Layer: id (BigAutoField), name (CharField), slug (SlugField), map (ForeignKey), source (ForeignKey), source_layer (CharField), style (JSONField), map_order (IntegerField), group (ForeignKey), group_order (IntegerField)
  • LayerGroup: id (BigAutoField), name (CharField), order (IntegerField), map (ForeignKey), download_url (URLField)
  • Map: id (BigAutoField), title (CharField), slug (SlugField), subtitle (CharField), description (TextField), zoom (IntegerField), extra (JSONField), owner (ForeignKey), visibility (CharField)
  • Portal: id (BigAutoField), uuid (UUIDField), title (CharField), visibility (CharField), owner (ForeignKey), extra (JSONField)
  • PortalMap: id (BigAutoField), map (ForeignKey), portal (ForeignKey), order (IntegerField), extra (JSONField)
  • RasterSource: id (BigAutoField), name (CharField), slug (SlugField), extra (JSONField), owner (ForeignKey), style (JSONField), source (FileField), original_data (FileField), protocol (CharField), url (URLField), attribution (CharField)
  • Source: id (BigAutoField), name (CharField), slug (SlugField), extra (JSONField), owner (ForeignKey), style (JSONField)
  • VectorSource: id (BigAutoField), name (CharField), slug (SlugField), extra (JSONField), owner (ForeignKey), style (JSONField), source (FileField), original_data (FileField), protocol (CharField), url (URLField), attribution (CharField), default_layer (CharField)

Data Sources

Vector

Vector data sources can be uploaded as PMTiles, a spatial file format that serves cloud-optimized vectors as single files, which the browser fetches dynamically using HTTP Range requests. See PMTiles Docs for more info.

NOTE: the frontend map should add the pmtiles protocol

PMTiles files must be pre-processed before uploading to the maps module. Along with the PMTiles file, the original dataset can also be uploaded in a different file format. The original is used when a user asks for a download, while the PMTiles file is used to display the dataset.
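
The Range-request mechanism can be illustrated with the very first read a PMTiles client performs: fetching the fixed-size header. The sketch below assumes PMTiles version 3, where the header is 127 bytes long and starts with the ASCII magic "PMTiles" followed by a version byte. Both helper names are hypothetical, and the check could equally serve as a sanity test on a file before upload.

```python
PMTILES_HEADER_SIZE = 127  # fixed header length in PMTiles v3
PMTILES_MAGIC = b"PMTiles"


def header_range() -> dict:
    """HTTP header asking a server for just the PMTiles header bytes."""
    # Range offsets are inclusive, so the last byte is size - 1.
    return {"Range": f"bytes=0-{PMTILES_HEADER_SIZE - 1}"}


def looks_like_pmtiles_v3(header: bytes) -> bool:
    """Check the magic number and version byte of a PMTiles header."""
    return (
        len(header) >= len(PMTILES_MAGIC) + 1
        and header[: len(PMTILES_MAGIC)] == PMTILES_MAGIC
        and header[len(PMTILES_MAGIC)] == 3
    )
```

A server that ignores the Range header and returns the whole file would defeat the purpose of the format, so it is worth confirming that the file host answers such requests with a partial response.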

Raster

Raster data sources can be uploaded as Cloud Optimized GeoTIFF (COG), a spatial file format that serves cloud-optimized TIFFs as single files, which the browser fetches dynamically using HTTP Range requests. See COG Docs for more info.

NOTE: the frontend map should add the cog protocol

COG files must be pre-processed before uploading to the maps module. Along with the COG file, the original dataset can also be uploaded in a different file format. The original is used when a user asks for a download, while the COG file is used to display the dataset.

Portal

A list of portals that use the map module: