# Data Sync Scripts

## datasync

Provide subcommands for synchronizing different resources; see the subcommands below.

**Usage**:

```
datasync [OPTIONS] COMMAND [ARGS]...
```

**Arguments**: no arguments available.

**Options**:

| Name | Description | Required | Default |
|---|---|---|---|
| `--install-completion` | Install completion for the current shell. | No | - |
| `--show-completion` | Show completion for the current shell, to copy it or customize the installation. | No | - |

**Commands**:

| Name | Description |
|---|---|
| `nva` | Commands to handle NVA tasks |
| `ubw` | Export UBW APIs to Parquet in an S3 bucket |
| `dms` | - |
| `ninagen` | Commands to handle NINAGEN tasks |
| `pit-registering-salmon` | - |
| `grass-gis` | - |
| `services` | Miljødata Infrastructure as Code pipelines |
| `gbif-backbone` | Export GBIF Backbone data to a DuckDB database |
| `ipt` | Provide commands to deal with IPT |
## Subcommands

### nva

Commands to handle NVA tasks.

**Usage**:

```
datasync nva [OPTIONS] COMMAND [ARGS]...
```

**Arguments**: no arguments available.

**Options**: no options available.
**Subcommands**:

#### run

Sync NVA data from the REST API to the target.

**Usage**:

```
datasync nva run [OPTIONS]
```

**Arguments**: no arguments available.

**Options**:

| Name | Description | Required | Default |
|---|---|---|---|
| `--resources / --no-resources` | - | No | `no-resources` |
| `--projects / --no-projects` | - | No | `no-projects` |
| `--persons / --no-persons` | - | No | `no-persons` |
| `--categories / --no-categories` | - | No | `no-categories` |
| `--funding-sources / --no-funding-sources` | - | No | `no-funding-sources` |
| `--base-url` | - | No | `https://api.nva.unit.no/` |
| `--duckdb-name` | - | No | `nva_sync` |
| `--institution-code` | - | No | `7511.0.0.0` |
| `--endpoint-url` | - | No | - |
| `--access-key` | - | No | - |
| `--secret-key` | - | No | - |
| `--bucket` | - | No | - |
| `--prefix` | - | No | `nva` |
| `--region` | - | No | `us-east-1` |
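As an illustration, the flags above combine like this. This is a hypothetical sketch: the endpoint, credentials, and bucket name are placeholders, not real values.

```shell
# Hypothetical example: sync resources and persons from the default NVA API
# into a local DuckDB file and upload the result to an S3-compatible bucket.
# Endpoint, credentials, and bucket name below are placeholders.
datasync nva run \
  --resources \
  --persons \
  --duckdb-name nva_sync \
  --endpoint-url "https://s3.example.org" \
  --access-key "$S3_ACCESS_KEY" \
  --secret-key "$S3_SECRET_KEY" \
  --bucket "my-nva-bucket" \
  --prefix nva \
  --region us-east-1
```

The remaining options (`--base-url`, `--institution-code`) keep their defaults here.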
#### filter-data

Filter NVA data and write the resulting tables to Parquet.

**Usage**:

```
datasync nva filter-data [OPTIONS]
```

**Arguments**: no arguments available.

**Options**:

| Name | Description | Required | Default |
|---|---|---|---|
| `--data-s3-path` | - | No | - |
| `--storage-s3-path` | - | No | - |
| `--storage-access-key` | - | No | - |
| `--storage-secret-key` | - | No | - |
| `--storage-bucket` | - | No | - |
| `--storage-prefix` | - | No | `nva-filtered` |
| `--storage-url-style` | - | No | `path` |
### ubw

Export UBW APIs to Parquet in an S3 bucket.

**Usage**:

```
datasync ubw [OPTIONS] COMMAND [ARGS]...
```

**Arguments**: no arguments available.

**Options**: no options available.
**Subcommands**:

#### run

No description available.

**Usage**:

```
datasync ubw run [OPTIONS]
```

**Arguments**: no arguments available.

**Options**:

| Name | Description | Required | Default |
|---|---|---|---|
| `--access-key` | - | No | - |
| `--secret-key` | - | No | - |
| `--endpoint-url` | - | No | - |
| `--bucket` | - | No | - |
| `--prefix` | - | No | - |
| `--base-url` | - | No | - |
| `--auth` | - | No | - |
### dms

No description available.

**Usage**:

```
datasync dms [OPTIONS] COMMAND [ARGS]...
```

**Arguments**: no arguments available.

**Options**: no options available.

**Subcommands**:

#### generate-csw-metadata

No description available.

**Usage**:

```
datasync dms generate-csw-metadata [OPTIONS]
```

**Arguments**: no arguments available.

**Options**:

| Name | Description | Required | Default |
|---|---|---|---|
| `--base-url` | - | No | - |
| `--access-key` | - | No | - |
| `--secret-key` | - | No | - |
| `--endpoint` | - | No | - |
| `--bucket` | - | No | - |
| `--publish-url` | - | No | - |
| `--limit` | - | No | - |
| `--search` | Filter resources by title using a LIKE expression | No | - |
#### generate-geoapi-config

No description available.

**Usage**:

```
datasync dms generate-geoapi-config [OPTIONS]
```

**Arguments**: no arguments available.

**Options**:

| Name | Description | Required | Default |
|---|---|---|---|
| `--base-url` | - | No | - |
| `--publish-url` | - | No | - |
| `--search` | Filter resources by title using a LIKE expression | No | - |
#### generate-maps-json

Generate a `maps.json` file from map resources in the DMS Parquet files. The output follows the format used by the NINA map-editor. The URL for each map is read from the `uri` field of the resource. The file is written to S3 as a publicly accessible file.

**Usage**:

```
datasync dms generate-maps-json [OPTIONS]
```

**Arguments**: no arguments available.

**Options**:

| Name | Description | Required | Default |
|---|---|---|---|
| `--base-url` | - | No | - |
| `--access-key` | - | No | - |
| `--secret-key` | - | No | - |
| `--endpoint` | - | No | - |
| `--where` | Provide an additional SQL filter | No | `1=1` |
| `--bucket` | S3 bucket for output (e.g. `my-bucket`) | No | - |
| `--output` | S3 key path for the output JSON file | No | `/dms/maps/maps.json` |
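For example, the `--where` option narrows the export with an SQL predicate appended to the default `1=1` filter. A hypothetical invocation (base URL, endpoint, credentials, and bucket are placeholders) might look like:

```shell
# Hypothetical example: write maps.json only for map resources whose title
# starts with "Nat", to a public S3 location. All values are placeholders.
datasync dms generate-maps-json \
  --base-url "https://dms.example.org" \
  --endpoint "https://s3.example.org" \
  --access-key "$S3_ACCESS_KEY" \
  --secret-key "$S3_SECRET_KEY" \
  --where "title LIKE 'Nat%'" \
  --bucket "my-public-bucket" \
  --output /dms/maps/maps.json
```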
### ninagen

Commands to handle NINAGEN tasks.

**Usage**:

```
datasync ninagen [OPTIONS] COMMAND [ARGS]...
```

**Arguments**: no arguments available.

**Options**: no options available.

**Subcommands**:

#### snp-database-normalize

Convert an SNP Excel sheet to Parquet.

**Usage**:

```
datasync ninagen snp-database-normalize [OPTIONS] FILE [SHEET]
```

**Arguments**:

| Name | Description | Required |
|---|---|---|
| `FILE` | Path to the file | Yes |
| `SHEET` | Name of the Excel sheet to use | No |

**Options**:

| Name | Description | Required | Default |
|---|---|---|---|
| `--header-row` | XLSX row number that contains the header | No | `1` |
| `--allele-start-column` | XLSX column that contains the first allele | No | `F` |
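A hypothetical invocation, for a workbook whose header sits on row 2 and whose allele columns start at column G (the file and sheet names are made up):

```shell
# Hypothetical example: normalize the "Samples" sheet of a local workbook.
# The header is on XLSX row 2; allele columns start at column G.
datasync ninagen snp-database-normalize \
  --header-row 2 \
  --allele-start-column G \
  ./snp_database.xlsx "Samples"
```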
#### snp-analysis-to-parquet

Convert an SNP analysis CSV to a Parquet file.

**Usage**:

```
datasync ninagen snp-analysis-to-parquet [OPTIONS] FILE
```

**Arguments**:

| Name | Description | Required |
|---|---|---|
| `FILE` | Path to the CSV file | Yes |

**Options**: no options available.
### pit-registering-salmon

No description available.

**Usage**:

```
datasync pit-registering-salmon [OPTIONS] COMMAND [ARGS]...
```

**Arguments**: no arguments available.

**Options**: no options available.

**Subcommands**:

#### run

Download PIT data from BioMark's API to a `.duckdb` file.

**Usage**:

```
datasync pit-registering-salmon run [OPTIONS]
```

**Arguments**: no arguments available.

**Options**:

| Name | Description | Required | Default |
|---|---|---|---|
| `--duckdb-path` | - | No | `biomark_pit_registering_salmon_v1.duckdb` |
| `--place` | Site location (kongsfjord, sylte, vigda, agdenes, vatne) | No | - |
| `--begin-date` | Start date for the data download, in YYYY-MM-DD format | No | - |
| `--end-date` | End date for the data download, in YYYY-MM-DD format | No | - |
| `--tags / --no-tags` | Download tags data | No | `no-tags` |
| `--readers / --no-readers` | Download reader voltage data | No | `no-readers` |
| `--environment / --no-environment` | Download environment data | No | `no-environment` |
| `--all-locations / --no-all-locations` | Download data from all accessible locations | No | `no-all-locations` |
| `--base-url` | - | No | `https://data3.biomark.com/api/v1/` |
| `--yesterday / --no-yesterday` | Set the date range to yesterday only | No | `no-yesterday` |
| `--dataset-name` | - | No | `main` |
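As a sketch, a daily-style run for a single site could combine the flags like this (the choice of site and data types is illustrative):

```shell
# Hypothetical example: download yesterday's tag and reader-voltage data
# for the kongsfjord site into the default DuckDB file.
datasync pit-registering-salmon run \
  --place kongsfjord \
  --yesterday \
  --tags \
  --readers
```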
#### replicate

Upload data from the `.duckdb` file to an S3 bucket.

**Usage**:

```
datasync pit-registering-salmon replicate [OPTIONS]
```

**Arguments**: no arguments available.

**Options**:

| Name | Description | Required | Default |
|---|---|---|---|
| `--bucket` | - | No | - |
| `--endpoint-url` | - | No | - |
| `--access-key` | - | No | - |
| `--secret-key` | - | No | - |
| `--duckdb-path` | - | No | `biomark_pit_registering_salmon_v1.duckdb` |
| `--region` | - | No | `us-east-1` |
| `--dataset-name` | - | No | `main` |
| `--tags / --no-tags` | Add tags data to S3 | No | `no-tags` |
| `--readers / --no-readers` | Add reader voltage data to S3 | No | `no-readers` |
| `--environment / --no-environment` | Add environment data to S3 | No | `no-environment` |
### grass-gis

No description available.

**Usage**:

```
datasync grass-gis [OPTIONS] COMMAND [ARGS]...
```

**Arguments**: no arguments available.

**Options**: no options available.

**Subcommands**:

#### register-layers

No description available.

**Usage**:

```
datasync grass-gis register-layers [OPTIONS] PARQUET_FILE_PATH PROJECT_NUMBER GISBASE
```

**Arguments**:

| Name | Description | Required |
|---|---|---|
| `PARQUET_FILE_PATH` | - | Yes |
| `PROJECT_NUMBER` | - | Yes |
| `GISBASE` | - | Yes |

**Options**: no options available.
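Since all three arguments are positional and required, a call would pass them in order. This sketch is entirely hypothetical: the Parquet path, project number, and GISBASE path (assumed here to be the GRASS GIS installation directory) are placeholders.

```shell
# Hypothetical example: register the layers described in a Parquet file
# into a GRASS GIS project. All three values below are placeholders.
datasync grass-gis register-layers \
  ./layers.parquet \
  12345 \
  /usr/lib/grass83
```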
### services

Miljødata Infrastructure as Code pipelines.

**Usage**:

```
datasync services [OPTIONS] COMMAND [ARGS]...
```

**Arguments**: no arguments available.

**Options**: no options available.

**Subcommands**:

#### services-to-parquet

Convert `metadata.yml` definitions to a set of Parquet files that can be imported into the DMS.

**Usage**:

```
datasync services services-to-parquet [OPTIONS]
```

**Arguments**: no arguments available.

**Options**:

| Name | Description | Required | Default |
|---|---|---|---|
| `--org` | - | No | `ninanor` |
| `--repo` | - | No | - |
| `--bucket` | - | No | - |
| `--endpoint` | - | No | - |
| `--access-key` | - | No | - |
| `--secret-key` | - | No | - |
| `--prefix` | - | No | `/dms/tables` |
| `--git-username` | - | No | - |
| `--git-token` | - | No | - |
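A hypothetical end-to-end invocation (repository name, bucket, endpoint, and all credentials are placeholders):

```shell
# Hypothetical example: read metadata.yml definitions from a Git repository
# under the ninanor organization and publish the generated Parquet files
# to S3 under the default /dms/tables prefix. All values are placeholders.
datasync services services-to-parquet \
  --org ninanor \
  --repo my-iac-repo \
  --git-username "$GIT_USERNAME" \
  --git-token "$GIT_TOKEN" \
  --bucket "my-dms-bucket" \
  --endpoint "https://s3.example.org" \
  --access-key "$S3_ACCESS_KEY" \
  --secret-key "$S3_SECRET_KEY" \
  --prefix /dms/tables
```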
#### dashboard

Produce a Homer dashboard using the Miljødata Infrastructure as Code repository as the data source.

**Usage**:

```
datasync services dashboard [OPTIONS]
```

**Arguments**: no arguments available.

**Options**:

| Name | Description | Required | Default |
|---|---|---|---|
| `--org` | - | No | `ninanor` |
| `--repo` | - | No | - |
| `--config-org` | - | No | `ninanor` |
| `--config-repo` | - | No | - |
| `--bucket` | - | No | - |
| `--endpoint` | - | No | - |
| `--access-key` | - | No | - |
| `--secret-key` | - | No | - |
| `--git-username` | - | No | - |
| `--git-token` | - | No | - |
| `--prefix` | - | No | `/dms/services` |
### gbif-backbone

Export GBIF Backbone data to a DuckDB database.

**Usage**:

```
datasync gbif-backbone [OPTIONS] COMMAND [ARGS]...
```

**Arguments**: no arguments available.

**Options**: no options available.

**Subcommands**:

#### import-all

Import GBIF Backbone data into a DuckDB database.

**Usage**:

```
datasync gbif-backbone import-all [OPTIONS]
```

**Arguments**: no arguments available.

**Options**: no options available.
### ipt

Provide commands to deal with IPT.

**Usage**:

```
datasync ipt [OPTIONS] COMMAND [ARGS]...
```

**Arguments**: no arguments available.

**Options**: no options available.

**Subcommands**:

#### run

Convert IPT resources to GeoParquet, register them in the DMS, and publish metadata and configurations.

**Usage**:

```
datasync ipt run [OPTIONS]
```

**Arguments**: no arguments available.

**Options**:

| Name | Description | Required | Default |
|---|---|---|---|
| `--skip-data / --no-skip-data` | Skip the data conversion step and process only metadata | No | `no-skip-data` |
| `--skip-dms / --no-skip-dms` | Skip publishing to the DMS | No | `no-skip-dms` |
| `--skip-csw / --no-skip-csw` | Skip publishing to the CSW | No | `no-skip-csw` |
| `--skip-geoapi / --no-skip-geoapi` | Skip publishing to pygeoapi | No | `no-skip-geoapi` |
| `--limit` | Only import a certain number of records | No | - |
| `--search` | Run only on resources that contain the given string | No | - |
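The skip flags compose, so a metadata-only refresh over a subset of resources is possible. A hypothetical sketch (the search string and limit are illustrative):

```shell
# Hypothetical example: refresh metadata only for resources matching
# "salmon", skipping the data conversion step and capping at 10 records.
datasync ipt run \
  --skip-data \
  --search salmon \
  --limit 10
```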
#### validate-iso

Validate an XML file against the ISO 19115 schema.

**Usage**:

```
datasync ipt validate-iso [OPTIONS] FILE
```

**Arguments**:

| Name | Description | Required |
|---|---|---|
| `FILE` | - | Yes |

**Options**: no options available.
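For instance, checking a locally exported metadata record before publishing (the file path below is a placeholder):

```shell
# Hypothetical example: validate a local metadata record against the
# ISO 19115 schema. The path is a placeholder.
datasync ipt validate-iso ./metadata/record.xml
```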