The Catalogue lets you define Datasets. Each dataset is a set of data in a specific format, together with a set of metadata.
The Catalogue aims to support different types of Dataset from different sources. Right now the following are supported:
IMPORTANT: source services must disable streaming responses, because GDAL needs the `Content-Length` header to be present. This can be achieved using Varnish.
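As a quick way to verify the requirement above, the following sketch checks whether a set of response headers announces a fixed-size body. The function names `has_fixed_length` and `check_source` are hypothetical and not part of the Catalogue; treating `Transfer-Encoding: chunked` as the marker of a streamed response is an assumption.

```python
import urllib.request
from typing import Mapping


def has_fixed_length(headers: Mapping[str, str]) -> bool:
    """Return True when the headers describe a non-streamed, sized body."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value.strip().lower() for name, value in headers.items()}
    # Assumption: a chunked transfer encoding means the response is streamed.
    if "chunked" in normalized.get("transfer-encoding", ""):
        return False
    # GDAL needs Content-Length, so require a numeric value.
    return normalized.get("content-length", "").isdigit()


def check_source(url: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request to a source service and inspect its headers."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return has_fixed_length(dict(response.headers.items()))
```

Calling `check_source` against a source URL before registering it gives an early warning when the service would need a proxy such as Varnish in front of it.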
Datasets are retrieved from sources using harvesters. Harvesters are Python functions that extract data from a given source type and populate the database with the corresponding dataset and metadata.
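The shape of a harvester can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `DatasetRecord`, `harvest`, `list_resources` and `save` are hypothetical names, the record only borrows a few field names from the Dataset and Metadata models, and database access is replaced by an injected `save` callable so the example stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Mapping


@dataclass
class DatasetRecord:
    # Hypothetical stand-in for a Dataset row plus part of its Metadata.
    name: str
    source: str
    fetch_url: str
    title: str
    abstract: str = ""


def harvest(
    source_url: str,
    list_resources: Callable[[str], Iterable[Mapping[str, str]]],
    save: Callable[[DatasetRecord], None],
) -> List[DatasetRecord]:
    """Extract every resource from a source and store one record per resource."""
    records = []
    for resource in list_resources(source_url):
        record = DatasetRecord(
            name=resource["name"],
            source=source_url,
            fetch_url=resource["url"],
            # Fall back to the resource name when the source gives no title.
            title=resource.get("title", resource["name"]),
            abstract=resource.get("abstract", ""),
        )
        save(record)  # in the real project this would write to the database
        records.append(record)
    return records
```

Keeping the source-specific listing logic in `list_resources` means one such function can be written per source type while the storage step stays the same.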
Harvester functions should be provided as Django commands, a set of CLI commands that can be executed with python manage.py ...
Implemented:
fetch_ipt http://my-ipt-server.com
Along with the dataset, a set of metadata is stored in the database in a normalized way.
Data model (each model with its fields and field types):
Citation: cited_by_dataset (ManyToOneRel), in_dataset_bibliography (ManyToManyRel), id (BigAutoField), identifier (CharField), text (TextField)
Dataset: metadata (OneToOneRel), content (OneToOneRel), id (BigAutoField), name (CharField), uuid (UUIDField), source (TextField), fetch_url (TextField), fetch_type (IntegerField), created_at (DateTimeField), last_modified_at (DateTimeField), owner (ForeignKey), validated_at (DateTimeField), validated_by (ForeignKey), fetch_success (BooleanField), fetch_message (TextField), last_fetch_at (DateTimeField), public (BooleanField)
Keyword: metadatas (ManyToManyRel), id (BigAutoField), name (CharField), definition (URLField), description (TextField)
License: metadata (ManyToOneRel), id (BigAutoField), name (CharField), url (URLField)
Metadata: people (ManyToOneRel), organizations (ManyToOneRel), metadataidentifier (ManyToOneRel), id (BigAutoField), dataset (OneToOneField), title (CharField), date_created (DateTimeField), logo_url (URLField), date_publication (DateField), language (ForeignKey), abstract (TextField), license (ForeignKey), maintenance_update_frequency (TextField), maintenance_update_description (TextField), geographic_description (TextField), bounding_box (GeometryField), citation (ForeignKey), formation_period_start (DateField), formation_period_end (DateField), formation_period_description (TextField), project_id (CharField), project_title (CharField), project_abstract (TextField), project_study_area_description (TextField), project_design_description (TextField), xml (TextField), fts (TextField), keywords (ManyToManyField), taxonomies (ManyToManyField), bibliography (ManyToManyField)
MetadataIdentifier: id (BigAutoField), identifier (CharField), metadata (ForeignKey), source (CharField)
MethodStep: id (BigAutoField), order (IntegerField), description (TextField)
Organization: person (ManyToOneRel), roles (ManyToOneRel), id (BigAutoField), name (TextField)
OrganizationRole: id (BigAutoField), organization (ForeignKey), metadata (ForeignKey), role (CharField)
Person: personidentifier (ManyToOneRel), roles (ManyToOneRel), id (BigAutoField), first_name (CharField), last_name (CharField), belongs_to (ForeignKey), position (CharField), country (ForeignKey), email (EmailField), phone (CharField), city (TextField), delivery_point (TextField), postal_code (IntegerField)
PersonIdentifier: id (BigAutoField), person (ForeignKey), type (CharField), value (CharField)
PersonRole: id (BigAutoField), person (ForeignKey), metadata (ForeignKey), role (CharField), description (CharField)
Taxonomy: metadata (ManyToManyRel), id (BigAutoField), type (ForeignKey), name (CharField), common (CharField)
TaxonomyType: taxonomy (ManyToOneRel), name (CharField)
To explore and navigate the datasets, two services are provided:
PyCSW (CSW protocol)
PyGeoAPI (OGC API)
The metadata stored in the database are converted to XML in the ISO 19139 format using pygeometa.
PyCSW is integrated in Django using a custom mapping, so that the metadata are read using the Django ORM instead of SQL.
NOTE: this implies that complex queries may not work as expected.
Datasets are shared through PyGeoAPI using GDAL as the resource provider. For vectors, this is implemented using GDAL VRT.
NOTE: raster support is missing.
Each dataset should provide a valid VRT definition that describes how to open the file. This makes it possible, for example, to serve CSV files as spatial datasets.
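To make the CSV case concrete, here is a minimal sketch that generates an OGR VRT definition exposing a CSV file with coordinate columns as a point layer. The helper name `csv_point_vrt`, the column names and the example URL are illustrative assumptions; the XML elements follow the GDAL virtual format for vector data.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def csv_point_vrt(
    layer_name: str,
    csv_source: str,
    x_field: str,
    y_field: str,
    srs: str = "EPSG:4326",
    src_layer: Optional[str] = None,
) -> str:
    """Build an OGR VRT document that reads points from two CSV columns."""
    root = ET.Element("OGRVRTDataSource")
    layer = ET.SubElement(root, "OGRVRTLayer", name=layer_name)
    ET.SubElement(layer, "SrcDataSource").text = csv_source
    if src_layer is not None:
        # Only needed when the source layer name differs from layer_name;
        # for CSV files the source layer is the file name without extension.
        ET.SubElement(layer, "SrcLayer").text = src_layer
    ET.SubElement(layer, "GeometryType").text = "wkbPoint"
    ET.SubElement(layer, "LayerSRS").text = srs
    # PointFromColumns tells OGR to build each point from the x and y columns.
    ET.SubElement(layer, "GeometryField", encoding="PointFromColumns", x=x_field, y=y_field)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical CSV location and column names, for illustration only.
    print(csv_point_vrt(
        "observations",
        "/vsicurl/http://example.org/observations.csv",
        x_field="longitude",
        y_field="latitude",
    ))
```

The resulting string is what a dataset could store as its definition, so that GDAL knows both where the file lives and how to interpret its columns as geometry.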
This diagram explains the flow that a dataset request follows:
[Sequence diagram. Participants: user, pygeoapi, gdal, django. Labels: Serve PyGeoAPI; Serve VRT definition from db; Resource source is a remote VRT, /vsicurl/http://django/dataset/<id>/definition.vrt; VRT points to the actual source and describes how to open it; Convert to GeoJSON; Read resource; Send response.]
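The remote VRT source named in the diagram can be assembled as in the sketch below. Both helper names are hypothetical. The provider dictionary mirrors the general layout of a PyGeoAPI OGR feature provider entry, but the `OGR_VRT` driver name, the `id_field` default and the way the Catalogue actually registers resources are assumptions rather than details taken from the project.

```python
from typing import Any, Dict, Union


def definition_vrt_source(django_base_url: str, dataset_id: Union[int, str]) -> str:
    """Return the /vsicurl/ path GDAL uses to fetch a dataset's VRT definition."""
    base = django_base_url.rstrip("/")
    return f"/vsicurl/{base}/dataset/{dataset_id}/definition.vrt"


def ogr_provider(
    django_base_url: str,
    dataset_id: Union[int, str],
    layer: str,
    id_field: str = "id",
) -> Dict[str, Any]:
    """Sketch of a provider entry whose source is the remote VRT definition."""
    return {
        "type": "feature",
        "name": "OGR",
        "data": {
            # Assumption: the GDAL vector VRT driver is selected by name.
            "source_type": "OGR_VRT",
            "source": definition_vrt_source(django_base_url, dataset_id),
        },
        "id_field": id_field,  # assumption: depends on the dataset's columns
        "layer": layer,
    }
```

With this indirection PyGeoAPI only ever sees the VRT URL served by Django, while the VRT itself carries the location of the actual data and the instructions for opening it.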