DarwinCORE Archives
DarwinCORE archives are zip files that contain certain files:
- eml.xml, which contains the metadata
- meta.xml, which contains information about all the other files inside the zip
This page explains the code in datasets/libs/darwincore.
meta.xml has a core and multiple optional extensions, each of which relates to files in the zip.
The ID of each extension is the foreign key to the core.
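The core/extension layout can be illustrated with a short Python sketch. It is not the code in datasets/libs/darwincore: the sample descriptor, the function names and the returned dictionary shape are hypothetical. The element and attribute names (core, extension, files/location, id, coreid, field) follow the Darwin Core text guidelines as commonly published.

```python
import xml.etree.ElementTree as ET

# Namespace used by Darwin Core Archive descriptors (meta.xml).
DWC_TEXT_NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}

# Hypothetical sample descriptor, for illustration only.
SAMPLE_META = """<archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="eml.xml">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence" ignoreHeaderLines="1">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/decimalLatitude"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/decimalLongitude"/>
  </core>
  <extension rowType="http://rs.tdwg.org/dwc/terms/MeasurementOrFact" ignoreHeaderLines="1">
    <files><location>measurementorfact.txt</location></files>
    <coreid index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/measurementType"/>
  </extension>
</archive>
"""


def describe_table(element):
    """Summarise a <core> or <extension> element as a plain dictionary."""
    location = element.find("dwc:files/dwc:location", DWC_TEXT_NS)
    # The core declares its key column with <id>, extensions with <coreid>.
    key = element.find("dwc:id", DWC_TEXT_NS)
    if key is None:
        key = element.find("dwc:coreid", DWC_TEXT_NS)
    # Keep only the last URI segment of each term, e.g. "decimalLatitude".
    fields = [
        field.get("term").rsplit("/", 1)[-1]
        for field in element.findall("dwc:field", DWC_TEXT_NS)
    ]
    return {
        "file": location.text.strip(),
        "key_index": int(key.get("index")),
        "fields": fields,
    }


def read_meta(xml_text):
    """Return (core, extensions) described by a meta.xml document."""
    root = ET.fromstring(xml_text)
    core = describe_table(root.find("dwc:core", DWC_TEXT_NS))
    extensions = [
        describe_table(extension)
        for extension in root.findall("dwc:extension", DWC_TEXT_NS)
    ]
    return core, extensions


core, extensions = read_meta(SAMPLE_META)
```

In this sample, column 0 of measurementorfact.txt holds the value that matches column 0 of occurrence.txt, which is the foreign key relation described above.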
Since DarwinCORE files are CSV, we have to identify which fields contain the geometry data. The following are currently supported:
- footprintWKT
- decimalLatitude, decimalLongitude
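A minimal Python sketch of that detection step follows. The function name and the returned dictionary are hypothetical, and preferring footprintWKT when both kinds of fields are present is an assumption, not something this page specifies. The encoding names WKT and PointFromColumns are the ones used by the GDAL VRT GeometryField element.

```python
def detect_geometry(field_names):
    """Pick a geometry encoding from the field names of a DarwinCORE table.

    Returns None when no supported geometry field is present.
    """
    names = set(field_names)
    # Assumption: a WKT footprint takes precedence over point coordinates.
    if "footprintWKT" in names:
        return {"encoding": "WKT", "field": "footprintWKT"}
    # Both coordinates are required to build a point.
    if {"decimalLatitude", "decimalLongitude"} <= names:
        return {
            "encoding": "PointFromColumns",
            "x": "decimalLongitude",
            "y": "decimalLatitude",
        }
    return None


point = detect_geometry(["id", "decimalLatitude", "decimalLongitude"])
footprint = detect_geometry(["id", "footprintWKT"])
missing = detect_geometry(["id", "decimalLatitude"])
```

Note that x maps to the longitude and y to the latitude, matching the GeometryField attributes used in the VRT.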
The dataset import should read the content of meta.xml to generate a valid VRT. Here is an example; the specific code can be found in metadata_catalogue/templates/vrt/definition.xml.
<OGRVRTDataSource>
  <OGRVRTLayer name="data">
    <SrcDataSource><![CDATA[
      <OGRVRTDataSource>
        <OGRVRTLayer name="occurrence">
          <SrcDataSource>CSV:/vsizip/{/vsicurl/https://ipt.nina.no/archive.do?r=5912basidiomycetes}/occurrence.txt</SrcDataSource>
          <LayerSRS>WGS84</LayerSRS>
        </OGRVRTLayer>
      </OGRVRTDataSource>]]>
    </SrcDataSource>
    <SrcSQL>select * from occurrence</SrcSQL>
    <GeometryField encoding="PointFromColumns" x="decimalLongitude" y="decimalLatitude" reportSrcColumn="false">
      <GeometryType>wkbPoint</GeometryType>
      <SRS>WGS84</SRS>
    </GeometryField>
    <LayerSRS>WGS84</LayerSRS>
  </OGRVRTLayer>
</OGRVRTDataSource>
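A VRT like the one above could be rendered from Python roughly as follows. This is a sketch using plain string formatting, not the template in metadata_catalogue/templates/vrt/definition.xml; the names VRT_TEMPLATE and build_vrt are hypothetical, and only the point geometry case is covered.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical template mirroring the example VRT on this page.
VRT_TEMPLATE = """<OGRVRTDataSource>
  <OGRVRTLayer name="data">
    <SrcDataSource><![CDATA[
      <OGRVRTDataSource>
        <OGRVRTLayer name={layer_attr}>
          <SrcDataSource>{source}</SrcDataSource>
          <LayerSRS>WGS84</LayerSRS>
        </OGRVRTLayer>
      </OGRVRTDataSource>]]>
    </SrcDataSource>
    <SrcSQL>select * from {layer}</SrcSQL>
    <GeometryField encoding="PointFromColumns" x={x} y={y} reportSrcColumn="false">
      <GeometryType>wkbPoint</GeometryType>
      <SRS>WGS84</SRS>
    </GeometryField>
    <LayerSRS>WGS84</LayerSRS>
  </OGRVRTLayer>
</OGRVRTDataSource>
"""


def build_vrt(source, layer, x="decimalLongitude", y="decimalLatitude"):
    """Render a VRT that exposes a CSV source as a point layer."""
    return VRT_TEMPLATE.format(
        # quoteattr() returns the value already wrapped in quotes.
        layer_attr=quoteattr(layer),
        layer=escape(layer),
        # The inner document is parsed as XML on its own, so characters
        # such as "&" in the source path must be escaped once.
        source=escape(source),
        x=quoteattr(x),
        y=quoteattr(y),
    )


SOURCE = (
    "CSV:/vsizip/{/vsicurl/https://ipt.nina.no/archive.do"
    "?r=5912basidiomycetes}/occurrence.txt"
)
vrt = build_vrt(SOURCE, "occurrence")
```

The curly braces inside the source path are safe here because str.format does not re-interpret substituted values.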
Notes about GDAL:
- CSV: means that what follows must be treated as a CSV file
- /vsizip/{}/occurrence.txt means that the file we are looking for is inside a zip
- /vsicurl/https://ipt.nina.no/archive.do?r=5912basidiomycetes means that the zip file itself is remote, downloadable from that URL
- SrcSQL allows a join between data sources
- SrcDataSource allows multiple sources to be loaded using CDATA. NOTE: this behaviour is not documented but is present in the GDAL test suite.
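The way the three prefixes nest in the source path can be shown with a small helper. The function name is hypothetical; the resulting string follows the pattern of the example on this page.

```python
def csv_in_remote_zip(url, member):
    """Build a GDAL source path for a CSV file inside a remote zip.

    The path is read from the inside out:
    /vsicurl/<url>      the zip is fetched over HTTP(S)
    /vsizip/{...}/<m>   the member <m> is read from inside that zip
    CSV:                the member is opened with the CSV driver
    """
    remote_zip = "/vsicurl/" + url
    # Curly braces delimit the archive path, which is useful here because
    # the URL does not end with a ".zip" extension.
    inside_zip = "/vsizip/{" + remote_zip + "}/" + member.lstrip("/")
    return "CSV:" + inside_zip


path = csv_in_remote_zip(
    "https://ipt.nina.no/archive.do?r=5912basidiomycetes",
    "occurrence.txt",
)
```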
IMPORTANT: when using /vsicurl/, streaming responses must be disabled on the server, because GDAL needs the Content-Length header to be present.
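One way to check a server before handing its URL to GDAL is to look at the response headers. The sketch below uses only the Python standard library; both function names are hypothetical. It assumes that a HEAD request reports the same headers as the request GDAL will make, which is not guaranteed for every server.

```python
import urllib.request


def has_usable_length(headers):
    """Return True if the headers carry a numeric Content-Length."""
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("content-length")
    return value is not None and value.strip().isdigit()


def remote_archive_has_length(url, timeout=10):
    """Send a HEAD request and report whether Content-Length is present."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return has_usable_length(response.headers)
```

A streamed (chunked) response typically has no Content-Length header, so has_usable_length returns False for it.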