DarwinCORE Archives
DarwinCORE archives are zip files that contain certain files:
- eml.xml, contains the metadata
- meta.xml, contains info about all the other files inside the zip
This page explains the code in datasets/libs/darwincore
meta.xml describes a core and multiple optional extensions, each of which is related to a file in the zip.
The ID of each extension is the foreign key to the core.
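As a rough illustration of the core/extension layout described above, the following sketch reads a meta.xml document with the standard library and lists, for the core and for each extension, the file it points to and the column holding the ID. It is not the code in datasets/libs/darwincore; the function name `describe_archive` and the `SAMPLE_META` document are made up for this example, and the sketch assumes the usual Darwin Core text namespace and the `id` / `coreid` elements.

```python
import xml.etree.ElementTree as ET

# Namespace used by Darwin Core Archive descriptors (meta.xml).
DWC_TEXT_NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}

# Hypothetical, hand-written meta.xml used only for this example.
SAMPLE_META = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence" fieldsTerminatedBy=",">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/decimalLatitude"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/decimalLongitude"/>
  </core>
  <extension rowType="http://rs.gbif.org/terms/1.0/Multimedia" fieldsTerminatedBy=",">
    <files><location>multimedia.txt</location></files>
    <coreid index="0"/>
    <field index="1" term="http://purl.org/dc/terms/identifier"/>
  </extension>
</archive>"""


def _describe(element, id_tag):
    """Summarise one <core> or <extension> element."""
    location = element.find("dwc:files/dwc:location", DWC_TEXT_NS)
    id_element = element.find(f"dwc:{id_tag}", DWC_TEXT_NS)
    has_id = id_element is not None and id_element.get("index") is not None
    return {
        "row_type": element.get("rowType"),
        "file": location.text.strip() if location is not None and location.text else None,
        # Column index of the ID (core) or of the foreign key to the core (extension).
        "id_index": int(id_element.get("index")) if has_id else None,
        # Map column index -> term URI for every indexed field.
        "fields": {
            int(field.get("index")): field.get("term")
            for field in element.findall("dwc:field", DWC_TEXT_NS)
            if field.get("index") is not None
        },
    }


def describe_archive(meta_xml):
    """Return the core and the list of extensions declared in meta.xml."""
    root = ET.fromstring(meta_xml)
    core = root.find("dwc:core", DWC_TEXT_NS)
    return {
        "core": _describe(core, "id") if core is not None else None,
        "extensions": [
            _describe(extension, "coreid")
            for extension in root.findall("dwc:extension", DWC_TEXT_NS)
        ],
    }


info = describe_archive(SAMPLE_META)
```

With the sample document, `info["core"]["file"]` is the core file name and each entry of `info["extensions"]` carries the column index that points back to the core ID.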
Since DarwinCORE files are CSV, we have to identify which fields contain the geometry data. Currently supported:
The dataset import should read the content of meta.xml to generate a valid VRT. Here is an example, but the specific code can be found in metadata_catalogue/templates/vrt/definition.xml
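<OGRVRTLayer name="data">
    <SrcDataSource><![CDATA[<OGRVRTDataSource>
        <OGRVRTLayer name="occurrence">
            <SrcDataSource>CSV:/vsizip/{}/occurrence.txt</SrcDataSource>
        </OGRVRTLayer>
    </OGRVRTDataSource>]]></SrcDataSource>
    <SrcSQL>select * from occurrence</SrcSQL>
    <GeometryField encoding="PointFromColumns" x="decimalLongitude" y="decimalLatitude" reportSrcColumn="false">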
Notes about GDAL:
- CSV: means that what follows must be treated as a CSV file
- /vsizip/{}/occurrence.txt means that the file we are looking for is inside a zip
- /vsicurl/https://ipt.nina.no/archive.do?r=5912basidiomycetes means that the zip file itself is remote, downloadable from that URL
- SrcSQL allows joins between data sources
- SrcDataSource allows multiple sources to be loaded using CDATA. NOTE: this behaviour is not documented, but it is present in the GDAL test suite.
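To make the notes above more concrete, here is a minimal sketch that composes the GDAL data source string and wraps it in a VRT similar to the example. It is not the template in metadata_catalogue/templates/vrt/definition.xml: the helper names `gdal_csv_path` and `build_vrt` are hypothetical, and the sketch assumes that the `{}` placeholder is filled with the /vsicurl/ path of the remote archive.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def gdal_csv_path(archive_url, member):
    """Compose CSV: + /vsizip/{...} + /vsicurl/ for a file inside a remote zip."""
    # Assumption: the braces wrap the archive path, which lets /vsizip/ work
    # even when the URL does not end in ".zip".
    return f"CSV:/vsizip/{{/vsicurl/{archive_url}}}/{member}"


def build_vrt(archive_url, member, layer, x="decimalLongitude", y="decimalLatitude"):
    """Return a VRT whose outer layer selects from an inner, CDATA-wrapped source."""
    # The inner data source is itself XML, so the path is XML-escaped there.
    inner = (
        "<OGRVRTDataSource>"
        f"<OGRVRTLayer name={quoteattr(layer)}>"
        f"<SrcDataSource>{escape(gdal_csv_path(archive_url, member))}</SrcDataSource>"
        "</OGRVRTLayer>"
        "</OGRVRTDataSource>"
    )
    return (
        "<OGRVRTDataSource>\n"
        '  <OGRVRTLayer name="data">\n'
        f"    <SrcDataSource><![CDATA[{inner}]]></SrcDataSource>\n"
        f"    <SrcSQL>select * from {layer}</SrcSQL>\n"
        f'    <GeometryField encoding="PointFromColumns" x={quoteattr(x)} '
        f'y={quoteattr(y)} reportSrcColumn="false"/>\n'
        "  </OGRVRTLayer>\n"
        "</OGRVRTDataSource>\n"
    )


vrt = build_vrt(
    "https://ipt.nina.no/archive.do?r=5912basidiomycetes",
    "occurrence.txt",
    "occurrence",
)

# Sanity check: both the outer document and the CDATA payload parse as XML.
outer = ET.fromstring(vrt)
inner = ET.fromstring(outer.findtext("OGRVRTLayer/SrcDataSource"))
```

The sketch only builds and parses the text; whether GDAL accepts the result still depends on the undocumented CDATA behaviour mentioned in the note.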
IMPORTANT: when using /vsicurl/, streaming responses must be disabled, because GDAL needs the Content-Length header to be present.
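One way to catch this early is to look at the response headers before handing the URL to GDAL. The sketch below is only a rough pre-check under stated assumptions: the function names are hypothetical, it uses a HEAD request from the standard library, and a server may answer HEAD differently from the requests GDAL actually sends.

```python
from urllib.request import Request, urlopen


def has_content_length(headers):
    """Return True when the headers carry a numeric Content-Length value."""
    value = headers.get("Content-Length")
    return value is not None and value.strip().isdigit()


def url_has_content_length(url, timeout=10.0):
    """Send a HEAD request and report whether Content-Length is present."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return has_content_length(response.headers)
```

For example, `url_has_content_length("https://ipt.nina.no/archive.do?r=5912basidiomycetes")` would perform the request against the archive URL used above; a `False` result suggests the response is streamed and /vsicurl/ is likely to fail.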