Datasets

The Catalogue allows users to define Datasets. Each dataset is a set of data in a specific format, together with a set of metadata. The Catalogue aims to support different types of Dataset from different sources; right now the following are supported:

IMPORTANT: source services must disable streaming responses, because GDAL needs the `Content-Length` header to be present. This can be achieved using Varnish.
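A quick way to see whether a source meets this requirement is to inspect its response headers. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and a HEAD request is assumed to return the same headers the source would send for a GET, which is not guaranteed for every server.

```python
from urllib.request import Request, urlopen


def has_content_length(headers):
    """Return True if the headers carry a usable Content-Length value."""
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    # A chunked (streaming) response does not announce its total size.
    if "chunked" in lowered.get("transfer-encoding", "").lower():
        return False
    return lowered.get("content-length", "").strip().isdigit()


def check_source(url, timeout=10):
    """Send a HEAD request to `url` and report whether Content-Length is set."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return has_content_length(response.headers)
```

Calling `check_source` on a source URL before registering it gives an early warning that the source still streams its responses.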

Import

Datasets are retrieved from sources using harvesters. Harvesters are Python functions that extract data from a given source type and populate the database with the corresponding dataset and metadata.

Harvester functions should be provided as Django commands, that is, CLI commands that can be executed with python manage.py .... Implemented:

  • fetch_ipt http://my-ipt-server.com
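The shape of a harvester can be sketched as a plain function that lists the resources of a source and stores one dataset per resource. This is an illustration only, not the project's implementation: `HarvestedDataset`, `harvest`, `list_resources` and `save` are hypothetical names, and the source-specific and database code is injected as callables so the sketch stays free of network and ORM details. In the Catalogue, logic like this would be called from a Django command.

```python
from dataclasses import dataclass


@dataclass
class HarvestedDataset:
    """Hypothetical container for what a harvester extracts from a source."""
    name: str
    fetch_url: str
    metadata: dict


def harvest(source_url, list_resources, save):
    """List the resources found at `source_url` and store each as a dataset.

    `list_resources(source_url)` must return dicts with "title" and "url";
    `save(dataset)` persists one HarvestedDataset. Both are assumptions.
    """
    stored = 0
    for resource in list_resources(source_url):
        dataset = HarvestedDataset(
            name=resource["title"],
            fetch_url=resource["url"],
            metadata={"title": resource["title"]},
        )
        save(dataset)
        stored += 1
    # Return the number of stored datasets so the caller can report it.
    return stored
```

Keeping the extraction step separate from the storage step makes it possible to exercise a harvester with a fake source and an in-memory list instead of a live server and database.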

Metadata

Along with the dataset, a set of metadata is stored in the database in a normalized way.

Models and fields (from the data model diagram):

  • Citation: cited_by_dataset (ManyToOneRel), in_dataset_bibliography (ManyToManyRel), id (BigAutoField), identifier (CharField), text (TextField)
  • Dataset: metadata (OneToOneRel), content (OneToOneRel), id (BigAutoField), name (CharField), uuid (UUIDField), source (TextField), fetch_url (TextField), fetch_type (IntegerField), created_at (DateTimeField), last_modified_at (DateTimeField), owner (ForeignKey), validated_at (DateTimeField), validated_by (ForeignKey), fetch_success (BooleanField), fetch_message (TextField), last_fetch_at (DateTimeField), public (BooleanField)
  • Keyword: metadatas (ManyToManyRel), id (BigAutoField), name (CharField), definition (URLField), description (TextField)
  • License: metadata (ManyToOneRel), id (BigAutoField), name (CharField), url (URLField)
  • Metadata: people (ManyToOneRel), organizations (ManyToOneRel), metadataidentifier (ManyToOneRel), id (BigAutoField), dataset (OneToOneField), title (CharField), date_created (DateTimeField), logo_url (URLField), date_publication (DateField), language (ForeignKey), abstract (TextField), license (ForeignKey), maintenance_update_frequency (TextField), maintenance_update_description (TextField), geographic_description (TextField), bounding_box (GeometryField), citation (ForeignKey), formation_period_start (DateField), formation_period_end (DateField), formation_period_description (TextField), project_id (CharField), project_title (CharField), project_abstract (TextField), project_study_area_description (TextField), project_design_description (TextField), xml (TextField), fts (TextField), keywords (ManyToManyField), taxonomies (ManyToManyField), bibliography (ManyToManyField)
  • MetadataIdentifier: id (BigAutoField), identifier (CharField), metadata (ForeignKey), source (CharField)
  • MethodStep: id (BigAutoField), order (IntegerField), description (TextField)
  • Organization: person (ManyToOneRel), roles (ManyToOneRel), id (BigAutoField), name (TextField)
  • OrganizationRole: id (BigAutoField), organization (ForeignKey), metadata (ForeignKey), role (CharField)
  • Person: personidentifier (ManyToOneRel), roles (ManyToOneRel), id (BigAutoField), first_name (CharField), last_name (CharField), belongs_to (ForeignKey), position (CharField), country (ForeignKey), email (EmailField), phone (CharField), city (TextField), delivery_point (TextField), postal_code (IntegerField)
  • PersonIdentifier: id (BigAutoField), person (ForeignKey), type (CharField), value (CharField)
  • PersonRole: id (BigAutoField), person (ForeignKey), metadata (ForeignKey), role (CharField), description (CharField)
  • Taxonomy: metadata (ManyToManyRel), id (BigAutoField), type (ForeignKey), name (CharField), common (CharField)
  • TaxonomyType: taxonomy (ManyToOneRel), name (CharField)

Services and Protocols

To explore and navigate the datasets two services are provided:

  • PyCSW (CSW protocol)
  • PyGeoAPI (OGC API)

PyCSW

The metadata stored in the database is converted to XML in the ISO 19139 format using pygeometa. PyCSW is integrated into Django using a custom mapping, so that the metadata is read through the Django ORM instead of raw SQL.
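pygeometa works from a metadata control file (MCF), so the conversion starts by mapping the stored fields to an MCF-like dictionary. The sketch below shows that mapping for a handful of fields. It is an assumption about how such a mapping could look, not the project's code: `metadata_to_mcf` and the flat `record` input are hypothetical, and the dictionary is deliberately partial (a complete MCF also needs sections such as contact and distribution).

```python
def metadata_to_mcf(record):
    """Map a flat metadata record to a partial pygeometa-style MCF dictionary."""
    return {
        "mcf": {"version": 1.0},
        "metadata": {
            # The dataset UUID is used as the metadata identifier (assumption).
            "identifier": record["uuid"],
            "language": record.get("language", "en"),
            "hierarchylevel": "dataset",
        },
        "identification": {
            "title": record["title"],
            "abstract": record.get("abstract", ""),
            "keywords": {
                "default": {
                    "keywords": list(record.get("keywords", [])),
                    "keywords_type": "theme",
                }
            },
        },
    }
```

With pygeometa installed, a completed dictionary of this kind could be handed to its ISO 19139 output schema (`ISO19139OutputSchema().write(...)` in `pygeometa.schemas.iso19139`) to obtain the XML string.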

NOTE: this implies that complex queries may not work as expected.

PyGeoAPI

Datasets are shared through PyGeoAPI using GDAL as resource provider: this is implemented using GDAL VRT for vectors.

NOTE: raster support is missing

Each dataset should provide a valid VRT definition that describes how to open the file. This makes it possible, for example, to serve CSV files as spatial datasets.
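As an illustration of the CSV case, the following sketch generates an OGR VRT definition that turns two numeric CSV columns into point geometries. The function name, the example URL and the column names are assumptions; the element names (`OGRVRTDataSource`, `OGRVRTLayer`, `SrcDataSource`, `GeometryField` with `encoding="PointFromColumns"`) come from the GDAL virtual format.

```python
import xml.etree.ElementTree as ET


def csv_point_vrt(csv_url, layer_name, x_field, y_field, srs="EPSG:4326"):
    """Build an OGR VRT that exposes a remote CSV file as a point layer."""
    root = ET.Element("OGRVRTDataSource")
    # For CSV sources the layer name normally matches the file name
    # without its extension, e.g. "sites" for sites.csv.
    layer = ET.SubElement(root, "OGRVRTLayer", name=layer_name)
    # /vsicurl/ lets GDAL read the file over HTTP.
    ET.SubElement(layer, "SrcDataSource").text = "/vsicurl/" + csv_url
    ET.SubElement(layer, "GeometryType").text = "wkbPoint"
    ET.SubElement(layer, "LayerSRS").text = srs
    # Build each point from the given x and y columns.
    ET.SubElement(
        layer, "GeometryField",
        encoding="PointFromColumns", x=x_field, y=y_field,
    )
    return ET.tostring(root, encoding="unicode")
```

For example, `csv_point_vrt("http://example.org/sites.csv", "sites", "lon", "lat")` returns a definition that could be stored for a dataset whose CSV has `lon` and `lat` columns.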

This diagram explains the flow that a dataset request follows:

[Sequence diagram. Participants: user, django, gdal, pygeoapi. Labels: Serve PyGeoAPI; Serve VRT definition from db; Resource source is a remote VRT (/vsicurl/http://django/dataset/<id>/definition.vrt); VRT points to the actual source and describes how to open it; Convert to geojson; Read resource; Send response.]
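The key indirection in this flow is that the resource source is not the data file itself but the VRT definition served by Django, read through GDAL's `/vsicurl/` handler. A minimal sketch of how that path is composed follows; the function name and the `django_base` default are assumptions, while the path layout is the one given in the diagram.

```python
def definition_vrt_path(dataset_id, django_base="http://django"):
    """Build the GDAL path under which a dataset's VRT definition is read."""
    # Strip a trailing slash so the joined URL has no double slash.
    url = f"{django_base.rstrip('/')}/dataset/{dataset_id}/definition.vrt"
    # /vsicurl/ tells GDAL to fetch the definition over HTTP.
    return "/vsicurl/" + url
```

With the GDAL Python bindings available, a path built this way could be passed to `ogr.Open(...)`, which would first fetch the VRT from Django and then follow it to the actual source.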