6. Practical Guides

This chapter includes how-to’s and other practical guidance for data producers.

6.1 Create a Data Management Plan (DMP)


The funding agency of your project will usually provide requirements, guidelines or a template for the DMP. If it does not, or if the dataset is not part of a project, use the template provided by your institution or the template based on the recommendations by Science Europe.

6.1.1 Using easyDMP

  1. Log in to easyDMP. Use Dataporten if your institution supports it; otherwise pick one of the other login methods.

  2. Click on + Create a new plan and pick a template.

  3. From page two onwards, use the Summary button to get an overview of all the questions.

6.1.2 Publishing the plan

Currently, you can use the export function in easyDMP to download an HTML or PDF version of the DMP for further use. This might change if "Hosted DMP" is implemented.

6.2 Submitting data as NetCDF-CF

6.2.1 Workflow

  1. Define your dataset (see dataset and ???)

  2. Create a NetCDF-CF file (see Creating NetCDF-CF files)

  3. Store the NetCDF-CF file in a suitable location, and distribute it via thredds or another dap server (see, e.g., How to add NetCDF-CF data to thredds)

  4. Register your dataset in a searchable catalog (see How to register your data in the catalog service)

6.2.2 Creating NetCDF-CF files

If you document and format your data as NetCDF following the CF conventions and the Attribute Convention for Data Discovery (ACDD), MMD files can be generated automatically from the NetCDF files. The CF conventions provide a controlled vocabulary that gives a definitive description of what the data in each variable represents, and of the spatial and temporal properties of the data. The ACDD vocabulary describes attributes recommended for describing a NetCDF dataset to data discovery systems. See, e.g., the netCDF4-python docs or the xarray docs for documentation on how to create NetCDF files.

The ACDD recommendations should be followed in order to properly document your NetCDF-CF files. The tables below summarize the required and recommended ACDD attributes, plus some additional attributes that are needed to properly populate a discovery metadata catalog which fulfills the requirements of international standards (e.g., GCMD/DIF, the INSPIRE and WMO profiles of ISO19115, etc.).

6.2.2.1 Notes

Keywords describe the content of your dataset following a given vocabulary. You may use any vocabulary to define your keywords, but a link to the keyword definitions should be provided in the ``keywords_vocabulary`` attribute, which identifies the vocabularies used in the ``keywords`` attribute. Example:

:keywords_vocabulary = "GCMDSK:GCMD Science Keywords:https://gcmd.earthdata.nasa.gov/kms/concepts/concept_scheme/sciencekeywords, GEMET:INSPIRE Themes:http://inspire.ec.europa.eu/theme, NORTHEMES:GeoNorge Themes:https://register.geonorge.no/metadata-kodelister/nasjonal-temainndeling" ;

Note that the GCMDSK, GEMET and NORTHEMES vocabularies are required for indexing in S-ENDA and Geonorge. You may find appropriate keywords at the following links:

The keywords should be provided by the ``keywords`` attribute as a comma separated list with a short name defining the vocabulary used, followed by the actual keyword, i.e., ``short_name:keyword``. Example:

:keywords = "GCMDSK:Earth Science > Atmosphere > Atmospheric radiation, GEMET:Meteorological geographical features, GEMET:Atmospheric conditions, NORTHEMES:Weather and climate" ;

See https://adc.met.no/node/96 for more information about how to define the ACDD keywords.
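The two attribute strings shown above can be assembled programmatically, which helps keep the short names in ``keywords`` consistent with those declared in ``keywords_vocabulary``. The following is a minimal sketch; the helper names are hypothetical and not part of any library, and the values are copied from the examples above.

```python
# Hypothetical helpers (illustrative names, not part of py-mmd-tools or ACDD).

def format_keywords(keywords):
    """Join (short_name, keyword) pairs into the ``keywords`` string."""
    return ", ".join(f"{short_name}:{keyword}" for short_name, keyword in keywords)


def format_keywords_vocabulary(vocabularies):
    """Join (short_name, long_name, url) triples into ``keywords_vocabulary``."""
    return ", ".join(
        f"{short_name}:{long_name}:{url}" for short_name, long_name, url in vocabularies
    )


keywords = [
    ("GCMDSK", "Earth Science > Atmosphere > Atmospheric radiation"),
    ("GEMET", "Meteorological geographical features"),
    ("GEMET", "Atmospheric conditions"),
    ("NORTHEMES", "Weather and climate"),
]
vocabularies = [
    ("GCMDSK", "GCMD Science Keywords",
     "https://gcmd.earthdata.nasa.gov/kms/concepts/concept_scheme/sciencekeywords"),
    ("GEMET", "INSPIRE Themes", "http://inspire.ec.europa.eu/theme"),
    ("NORTHEMES", "GeoNorge Themes",
     "https://register.geonorge.no/metadata-kodelister/nasjonal-temainndeling"),
]

keywords_attr = format_keywords(keywords)
keywords_vocabulary_attr = format_keywords_vocabulary(vocabularies)

# Short names used in keywords but not declared in the vocabulary list.
declared = {short_name for short_name, _, _ in vocabularies}
undeclared = {short_name for short_name, _ in keywords} - declared
```

The resulting strings can then be assigned to the global attributes of the file, for example with the ``setncattr`` method of a netCDF4-python ``Dataset``.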

A data license provides information about any restrictions on the use of the dataset. To support a linked data approach, the ``license`` element should be supported by a ``license_resource`` element, providing a link to the license definition. Example:

:license = "CC-BY-4.0" ;
:license_resource = "http://spdx.org/licenses/CC-BY-4.0" ;

6.2.2.2 List of Attributes

This section provides lists of ACDD elements that are required and recommended, as well as some extra elements that are needed to fully support our data management needs. The right columns of these tables provide the MET Norway Metadata Specification (MMD) fields that map to the ACDD (and our extension to ACDD) elements. Please refer to MMD for definitions of these elements, as well as controlled vocabularies that should be used. Note that the below tables are automatically generated - check https://github.com/metno/py-mmd-tools/blob/master/py_mmd_tools/mmd_elements.yaml if anything is unclear.

In order to check your netCDF-CF files, and to create MMD xml files, you can use the nc2mmd.py script in the py-mmd-tools Python package.

The following ACDD elements are required:

| ACDD Attribute | MMD equivalent | Comment |
| --- | --- | --- |
| id | metadata_identifier | Required, and should be UUID. No repetition allowed. |
| naming_authority | metadata_identifier | Required. We recommend using reverse-DNS naming. No repetition allowed. |
| date_created | last_metadata_update>update>datetime | Format as ISO8601. |
| title | title>title | Use ACDD extension "title_no" for Norwegian translation. |
| summary | abstract>abstract | Use ACDD extension "summary_no" for Norwegian translation. |
| time_coverage_start | temporal_extent>start_date | Comma separated list. |
| geospatial_lat_max | geographic_extent>rectangle>north | No repetition allowed. |
| geospatial_lat_min | geographic_extent>rectangle>south | No repetition allowed. |
| geospatial_lon_max | geographic_extent>rectangle>east | No repetition allowed. |
| geospatial_lon_min | geographic_extent>rectangle>west | No repetition allowed. |
| keywords | keywords>keyword | Comma separated list. |
| keywords_vocabulary | keywords>vocabulary | Comma separated list. |
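
Before running a full conversion, it can be useful to check that none of the required attributes listed above is missing. The following is a minimal sketch using only the Python standard library; it is not a replacement for nc2mmd.py, the function name is hypothetical, and the attribute values (including the naming authority) are made up for illustration.

```python
import uuid
from datetime import datetime, timezone

# Required ACDD attributes, copied from the list above.
REQUIRED_ACDD = (
    "id", "naming_authority", "date_created", "title", "summary",
    "time_coverage_start",
    "geospatial_lat_max", "geospatial_lat_min",
    "geospatial_lon_max", "geospatial_lon_min",
    "keywords", "keywords_vocabulary",
)


def missing_required(attrs):
    """Return the required ACDD attribute names that are absent or empty."""
    return [name for name in REQUIRED_ACDD if attrs.get(name) in (None, "")]


# Illustrative global attributes; all values are hypothetical.
attrs = {
    "id": str(uuid.uuid4()),            # should be a UUID
    "naming_authority": "no.example",   # reverse-DNS style, made up
    "date_created": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    "title": "Example dataset",
    "summary": "Short description of the example dataset.",
    "time_coverage_start": "2020-01-01T00:00:00Z",
    "geospatial_lat_max": 60.0,
    "geospatial_lat_min": 40.0,
    "geospatial_lon_max": 10.0,
    "geospatial_lon_min": 0.0,
    # keywords and keywords_vocabulary are left out on purpose
}

missing = missing_required(attrs)
print(missing)
```

For a real file, the dictionary could be read from the dataset itself, e.g., with netCDF4-python as ``{name: ds.getncattr(name) for name in ds.ncattrs()}``.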

The following ACDD elements are (highly) recommended:

| ACDD Attribute | Default | MMD equivalent | Comment |
| --- | --- | --- | --- |
| date_metadata_modified | | last_metadata_update>update>datetime | Format as ISO8601. Comma separated list if more than once. |
| time_coverage_end | | temporal_extent>end_date | Comma separated list. |
| geospatial_bounds | | geographic_extent>polygon | No repetition allowed. |
| processing_level | | operational_status | No repetition allowed. See the MMD docs for valid keywords. |
| license | | use_constraint>identifier | No repetition allowed. |
| creator_role | Investigator | personnel>role | Comma separated list. |
| contributor_role | Investigator | personnel>role | Comma separated list. |
| creator_name | Not available | personnel>name | Comma separated list. |
| contributor_name | Not available | personnel>name | Comma separated list. |
| creator_email | Not available | personnel>email | Comma separated list. |
| creator_institution | Not available | personnel>organisation | Comma separated list. |
| institution | | data_center>data_center_name>long_name | Comma separated list. |
| publisher_url | | data_center>data_center_url | Comma separated list. |
| project | | project>long_name | Semicolon separated list. |
| platform | | platform>long_name | Comma separated list. |
| platform_vocabulary | | platform>resource | Comma separated list. |
| instrument | | platform>instrument>long_name | Comma separated list. |
| instrument_vocabulary | | platform>instrument>resource | Comma separated list. |
| source | | activity_type | Semicolon separated list. |
| creator_name | | dataset_citation>author | Comma separated list. |
| date_created | | dataset_citation>publication_date | Comma separated list. |
| title | | dataset_citation>title | |
| publisher_name | | dataset_citation>publisher | Comma separated list. |
| metadata_link | | dataset_citation>url | Comma separated list. |
| references | | dataset_citation>other | Comma separated list. |

The following elements are ACDD extensions that are needed to improve (meta)data interoperability. Please refer to the documentation of MMD for more details:

| Necessary non-ACDD Attribute | Default | MMD equivalent | Comment |
| --- | --- | --- | --- |
| spatial_representation | | spatial_representation | No repetition allowed. |
| alternate_identifier | | alternate_identifier>alternate_identifier | Alternative identifier for the dataset (but not DOI). Comma separated list. |
| alternate_identifier_type | | alternate_identifier>type | Identification of the type of identifier used. Comma separated list. |
| date_metadata_modified_type | | last_metadata_update>update>type | E.g., major or minor modification. Comma separated list. |
| date_created_type | Created | last_metadata_update>update>type | |
| title_no | | title>title | Used for Norwegian version of the title. |
| title_lang | en | title>lang | ISO language code. |
| summary_no | | abstract>abstract | Used for Norwegian version of the abstract. |
| summary_lang | en | abstract>lang | ISO language code. |
| dataset_production_status | Complete | dataset_production_status | No repetition allowed. |
| access_constraint | | access_constraint | No repetition allowed. |
| license_resource | | use_constraint>resource | No repetition allowed. |
| contributor_email | Not available | personnel>email | Comma separated list. |
| contributor_institution | | personnel>organisation | |
| contributor_organisation | | personnel>organisation | |
| institution_short_name | | data_center>data_center_name>short_name | Comma separated list. |
| related_dataset_id | | related_dataset>related_dataset | Comma separated list. |
| related_dataset_relation_type | | related_dataset>relation_type | Comma separated list. |
| iso_topic_category | | iso_topic_category | Comma separated list. |
| project_short_name | | project>short_name | Semicolon separated list. |
| quality_control | | quality_control | No repetition allowed. |
| doi | | dataset_citation>doi | |

6.2.3 How to add NetCDF-CF data to thredds

This section should contain institution specific information about how to add netcdf-cf files to thredds.

6.2.4 How to register your data in the catalog service

To make a dataset findable, it must be registered in a searchable catalog with appropriate metadata. The (meta)data catalog is indexed and exposed through CSW.

The following needs to be done:

  1. Generate an MMD xml file from your NetCDF-CF file (see Generation of MMD xml file from NetCDF-CF)

  2. Test your mmd xml metadata file (see Test the MMD xml file)

  3. Push the MMD xml file to the discovery metadata catalog (see Push the MMD xml file to the discovery metadata catalog)

6.2.4.1 Generation of MMD xml file from NetCDF-CF

Clone the py-mmd-tools repo and make a local installation with, e.g., pip install . (we recommend using a virtual environment). This should bring in all needed dependencies.

Then, generate your mmd xml file as follows:

cd script
./nc2mmd.py -i <your netcdf file> -o <your xml output directory>

See ./nc2mmd.py --help for documentation and extra options.
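
When many files need converting, the command above can be generated once per file. The following is a minimal sketch, assuming all NetCDF files sit in one directory with a .nc suffix; the function names are hypothetical, and only the -i and -o options shown above are used. The demonstration at the end only builds the commands on an empty temporary directory and does not run nc2mmd.py.

```python
import subprocess
import tempfile
from pathlib import Path


def nc2mmd_commands(netcdf_dir, output_dir):
    """Build one nc2mmd.py command per *.nc file found in netcdf_dir."""
    output_dir = Path(output_dir).resolve()
    return [
        ["./nc2mmd.py", "-i", str(nc_file.resolve()), "-o", str(output_dir)]
        for nc_file in sorted(Path(netcdf_dir).glob("*.nc"))
    ]


def run_all(commands, script_dir):
    """Run the commands from the 'script' directory of a py-mmd-tools checkout."""
    for command in commands:
        # check=True raises CalledProcessError if a conversion fails.
        subprocess.run(command, cwd=script_dir, check=True)


# Demonstration with placeholder files; nothing is executed here.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.nc", "b.nc", "notes.txt"):
        (Path(tmp) / name).touch()
    commands = nc2mmd_commands(tmp, Path(tmp) / "mmd")

print(len(commands))
```

Absolute paths are used so that the commands still point at the right files after changing into the script directory.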

You will find Extensible Stylesheet Language Transformations (XSLT) documents in the MMD repository. These can be used to translate the metadata documents from MMD to other vocabularies, such as ISO19115:

./bin/convert_from_mmd -i <your mmd xml file> -f iso -o <your iso output file name>

Note that the discovery metadata catalog ingestion tool will take care of translations from MMD, so you don’t need to worry about that unless you have special interest in it.

6.2.4.2 Test the MMD xml file

Install the dmci app, and run the usage example locally. This will return an error message if anything is wrong with your MMD file.

6.2.4.3 Push the MMD xml file to the discovery metadata catalog

For development and verification purposes:

curl --data-binary "@<PATH_TO_MMD_FILE>" https://dmci-*.s-enda.k8s.met.no/v1/insert

where * should be either dev or staging.

For production (the official catalog):

curl --data-binary "@<PATH_TO_MMD_FILE>" https://dmci.s-enda.k8s.met.no/v1/insert
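
The same upload can be scripted in Python. The following is a minimal sketch using only the standard library, assuming the endpoints behave as in the curl examples above; the function names and the environment labels used as dictionary keys are hypothetical. The last lines only prepare a request object with a placeholder body and do not contact any server.

```python
import urllib.request

# Hosts taken from the curl examples above; the keys are illustrative labels.
DMCI_HOSTS = {
    "dev": "https://dmci-dev.s-enda.k8s.met.no",
    "staging": "https://dmci-staging.s-enda.k8s.met.no",
    "production": "https://dmci.s-enda.k8s.met.no",
}


def build_insert_request(mmd_xml, environment="dev"):
    """Prepare (but do not send) a POST of raw MMD bytes to /v1/insert."""
    url = DMCI_HOSTS[environment] + "/v1/insert"
    return urllib.request.Request(url, data=mmd_xml, method="POST")


def push_mmd(path, environment="dev"):
    """Send the MMD file at path; returns (HTTP status, response body)."""
    with open(path, "rb") as mmd_file:
        request = build_insert_request(mmd_file.read(), environment)
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode()


# Placeholder body for illustration only; it is not a valid MMD document.
request = build_insert_request(b"<mmd/>", "staging")
print(request.full_url)
```

Keeping the request construction separate from the network call makes it easy to verify the target URL before pushing to the production catalog.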

6.3 Searching data in the Catalog Service for the Web (CSW) interface

6.3.1 Using OpenSearch

6.3.1.1 Local test machines

The vagrant-s-enda environment provides OpenSearch support through PyCSW. To test OpenSearch via the browser, start the vagrant-s-enda VM (vagrant up) and go to the following address:

  • http://10.10.10.10/pycsw/csw.py?mode=opensearch&service=CSW&version=2.0.2&request=GetCapabilities

6.3.1.2 Online catalog

To search the online metadata catalog, replace the local base URL (http://10.10.10.10/pycsw/csw.py) with https://csw.s-enda.k8s.met.no/:

  • https://csw.s-enda.k8s.met.no/?mode=opensearch&service=CSW&version=2.0.2&request=GetRecords&elementsetname=full&typenames=csw:Record&resulttype=results

6.3.1.3 OpenSearch examples

To find all datasets in the catalog:

  • https://csw.s-enda.k8s.met.no/?mode=opensearch&service=CSW&version=2.0.2&request=GetRecords&elementsetname=full&typenames=csw:Record&resulttype=results

Or datasets within a given time span:

  • https://csw.s-enda.k8s.met.no/?mode=opensearch&service=CSW&version=2.0.2&request=GetRecords&elementsetname=full&typenames=csw:Record&resulttype=results&time=2000-01-01/2020-09-01

Or datasets within a geographical domain (defined as a box with parameters min_longitude, min_latitude, max_longitude, max_latitude):

  • https://csw.s-enda.k8s.met.no/?mode=opensearch&service=CSW&version=2.0.2&request=GetRecords&elementsetname=full&typenames=csw:Record&resulttype=results&bbox=0,40,10,60

Or, datasets from any of the Sentinel satellites:

  • https://csw.s-enda.k8s.met.no/?mode=opensearch&service=CSW&version=2.0.2&request=GetRecords&elementsetname=full&typenames=csw:Record&resulttype=results&q=sentinel
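
The example URLs above share the same base parameters and differ only in the search filter, so they can be built from a small helper. The following is a minimal sketch using only the standard library; the function name is hypothetical, while the parameter names (q, bbox, time) are those used in the examples above.

```python
from urllib.parse import urlencode

CSW_URL = "https://csw.s-enda.k8s.met.no/"

# Parameters common to all the GetRecords examples above.
BASE_PARAMS = {
    "mode": "opensearch",
    "service": "CSW",
    "version": "2.0.2",
    "request": "GetRecords",
    "elementsetname": "full",
    "typenames": "csw:Record",
    "resulttype": "results",
}


def opensearch_url(q=None, bbox=None, time=None):
    """Build a GetRecords URL with optional free-text, box and time filters.

    bbox is (min_longitude, min_latitude, max_longitude, max_latitude);
    time is a (start, end) pair of date strings.
    """
    params = dict(BASE_PARAMS)
    if q is not None:
        params["q"] = q
    if bbox is not None:
        params["bbox"] = ",".join(str(value) for value in bbox)
    if time is not None:
        params["time"] = "/".join(time)
    # Keep ':', ',' and '/' unescaped so the URLs match the examples above.
    return CSW_URL + "?" + urlencode(params, safe=":,/")


print(opensearch_url(bbox=(0, 40, 10, 60)))
print(opensearch_url(time=("2000-01-01", "2020-09-01")))
print(opensearch_url(q="sentinel"))
```

The filters can also be combined in one call, although how the catalog treats combined filters is not covered by the examples above.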

PyCSW OpenSearch supports only bounding-box geographical searches. More advanced geographical searches require a custom XML request. For example:

  • To find all datasets containing a point:
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<csw:GetRecords
    xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:ogc="http://www.opengis.net/ogc"
    xmlns:gml="http://www.opengis.net/gml"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    service="CSW"
    version="2.0.2"
    resultType="results"
    maxRecords="10"
    outputFormat="application/xml"
    outputSchema="http://www.opengis.net/cat/csw/2.0.2"
    xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd" >
  <csw:Query typeNames="csw:Record">
    <csw:ElementSetName>full</csw:ElementSetName>
    <csw:Constraint version="1.1.0">
      <ogc:Filter>
        <ogc:Contains>
          <ogc:PropertyName>ows:BoundingBox</ogc:PropertyName>
          <gml:Point>
            <gml:pos srsDimension="2">59.0 4.0</gml:pos>
          </gml:Point>
        </ogc:Contains>
      </ogc:Filter>
    </csw:Constraint>
  </csw:Query>
</csw:GetRecords>
  • To find all datasets intersecting a polygon:
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<csw:GetRecords
    xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:gml="http://www.opengis.net/gml"
    xmlns:ogc="http://www.opengis.net/ogc"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    service="CSW"
    version="2.0.2"
    resultType="results"
    maxRecords="10"
    outputFormat="application/xml"
    outputSchema="http://www.opengis.net/cat/csw/2.0.2"
    xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd" >
  <csw:Query typeNames="csw:Record">
    <csw:ElementSetName>full</csw:ElementSetName>
    <csw:Constraint version="1.1.0">
      <ogc:Filter>
        <ogc:Intersects>
          <ogc:PropertyName>ows:BoundingBox</ogc:PropertyName>
          <gml:Polygon>
            <gml:exterior>
              <gml:LinearRing>
                <gml:posList>
                  47.00 -5.00 55.00 -5.00 55.00 20.00 47.00 20.00 47.00 -5.00
                </gml:posList>
              </gml:LinearRing>
            </gml:exterior>
          </gml:Polygon>
        </ogc:Intersects>
      </ogc:Filter>
    </csw:Constraint>
  </csw:Query>
</csw:GetRecords>
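  • Then, you can query the CSW endpoint with, e.g., Python:
import requests
# my_xml_request is the path to a file holding one of the XML requests above
with open(my_xml_request, 'rb') as f:
    print(requests.post('https://csw.s-enda.k8s.met.no', data=f.read()).text)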
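
The point query above can also be generated for arbitrary coordinates instead of being edited by hand. The following is a minimal sketch, assuming the coordinate order in gml:pos follows the example above (which appears to be latitude followed by longitude); the function name is hypothetical, and the XML declaration and schemaLocation attribute are omitted for brevity. The result is parsed with the standard library only to confirm that it is well-formed.

```python
import xml.etree.ElementTree as ET

# Template based on the "contains a point" request shown above.
POINT_REQUEST_TEMPLATE = """<csw:GetRecords
    xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:ogc="http://www.opengis.net/ogc"
    xmlns:gml="http://www.opengis.net/gml"
    service="CSW"
    version="2.0.2"
    resultType="results"
    maxRecords="{max_records}"
    outputFormat="application/xml"
    outputSchema="http://www.opengis.net/cat/csw/2.0.2">
  <csw:Query typeNames="csw:Record">
    <csw:ElementSetName>full</csw:ElementSetName>
    <csw:Constraint version="1.1.0">
      <ogc:Filter>
        <ogc:Contains>
          <ogc:PropertyName>ows:BoundingBox</ogc:PropertyName>
          <gml:Point>
            <gml:pos srsDimension="2">{lat} {lon}</gml:pos>
          </gml:Point>
        </ogc:Contains>
      </ogc:Filter>
    </csw:Constraint>
  </csw:Query>
</csw:GetRecords>"""


def contains_point_request(lat, lon, max_records=10):
    """Return a GetRecords request for datasets whose bounding box contains a point."""
    return POINT_REQUEST_TEMPLATE.format(lat=lat, lon=lon, max_records=max_records)


xml_request = contains_point_request(59.0, 4.0)

# Parse the request to confirm it is well-formed before sending it.
root = ET.fromstring(xml_request)
pos = root.find(".//{http://www.opengis.net/gml}pos")
print(pos.text)
```

The returned string can be encoded and posted to the CSW endpoint in the same way as a request read from a file.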

6.3.3 Web portals

GeoNorge.no

TODO: describe how to search in geonorge, possibly with screenshots

6.3.4 QGIS

MET Norway’s S-ENDA CSW catalog service is available at https://csw.s-enda.k8s.met.no. This can be used from QGIS as follows:

  1. Select Web > MetaSearch > MetaSearch menu item

  2. Select Services > New

  3. Type, e.g., csw.s-enda.k8s.met.no for the name

  4. Type https://csw.s-enda.k8s.met.no for the URL

Under the Search tab, you can then add search parameters, click Search, and get a list of available datasets.