# Data Sync Scripts

## datasync

Provide subcommands for synchronizing different resources; see the subcommands below.

**Usage**:

```
datasync [OPTIONS] COMMAND [ARGS]...
```

**Arguments**: no arguments available.

**Options**:

| Name | Description | Required | Default |
|---|---|---|---|
| `--install-completion` | Install completion for the current shell. | No | - |
| `--show-completion` | Show completion for the current shell, to copy it or customize the installation. | No | - |

**Commands**:

| Name | Description |
|---|---|
| `nva` | Commands to handle NVA tasks |
| `ubw` | Export UBW APIs to Parquet in an S3 bucket |
| `dms` | - |
| `ninagen` | Commands to handle NINAGEN tasks |
| `pit-registering-salmon` | - |
| `grass-gis` | - |
| `services` | Miljødata Infrastructure as Code pipelines |
| `gbif-backbone` | Export GBIF Backbone data to a DuckDB database |
| `ipt` | Provide commands to deal with IPT |
## Subcommands

### nva

Commands to handle NVA tasks.

**Usage**:

```
datasync nva [OPTIONS] COMMAND [ARGS]...
```

**Arguments**: no arguments available.

**Options**: no options available.
**Subcommands**:

#### run

Sync NVA data from the REST API to the target.

**Usage**:

```
datasync nva run [OPTIONS]
```

**Arguments**: no arguments available.

**Options**:

| Name | Description | Required | Default |
|---|---|---|---|
| `--resources / --no-resources` | - | No | `no-resources` |
| `--projects / --no-projects` | - | No | `no-projects` |
| `--persons / --no-persons` | - | No | `no-persons` |
| `--categories / --no-categories` | - | No | `no-categories` |
| `--funding-sources / --no-funding-sources` | - | No | `no-funding-sources` |
| `--base-url` | - | No | `https://api.nva.unit.no/` |
| `--duckdb-name` | - | No | `nva_sync` |
| `--institution-code` | - | No | `7511.0.0.0` |
| `--endpoint-url` | - | No | - |
| `--access-key` | - | No | - |
| `--secret-key` | - | No | - |
| `--bucket` | - | No | - |
| `--prefix` | - | No | `nva` |
| `--region` | - | No | `us-east-1` |
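As an illustration, the flags above combine like this. This is a hypothetical sketch: the endpoint, credentials, and bucket name are placeholders, not real values.

```shell
# Hypothetical example: sync resources and persons from the default NVA API
# into a local DuckDB file and upload the result to an S3-compatible bucket.
# Endpoint, credentials, and bucket name below are placeholders.
datasync nva run \
  --resources \
  --persons \
  --duckdb-name nva_sync \
  --endpoint-url "https://s3.example.org" \
  --access-key "$S3_ACCESS_KEY" \
  --secret-key "$S3_SECRET_KEY" \
  --bucket "my-nva-bucket" \
  --prefix nva \
  --region us-east-1
```

The remaining options (`--base-url`, `--institution-code`) keep their defaults here.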
#### filter-data

Filter NVA data and write the resulting tables to Parquet.

**Usage**:

```
datasync nva filter-data [OPTIONS]
```

**Arguments**: no arguments available.

**Options**:

| Name | Description | Required | Default |
|---|---|---|---|
| `--data-s3-path` | - | No | - |
| `--storage-s3-path` | - | No | - |
| `--storage-access-key` | - | No | - |
| `--storage-secret-key` | - | No | - |
| `--storage-bucket` | - | No | - |
| `--storage-prefix` | - | No | `nva-filtered` |
| `--storage-url-style` | - | No | `path` |
### ubw

Export UBW APIs to Parquet in an S3 bucket.

**Usage**:

```
datasync ubw [OPTIONS] COMMAND [ARGS]...
```

**Arguments**: no arguments available.

**Options**: no options available.
**Subcommands**:

#### run

No description available.

**Usage**:

```
datasync ubw run [OPTIONS]
```

**Arguments**: no arguments available.

**Options**:

| Name | Description | Required | Default |
|---|---|---|---|
| `--access-key` | - | No | - |
| `--secret-key` | - | No | - |
| `--endpoint-url` | - | No | - |
| `--bucket` | - | No | - |
| `--prefix` | - | No | - |
| `--base-url` | - | No | - |
| `--auth` | - | No | - |
### dms

No description available.

**Usage**:

```
datasync dms [OPTIONS] COMMAND [ARGS]...
```

**Arguments**: no arguments available.

**Options**: no options available.

**Subcommands**:

#### generate-csw-metadata

No description available.

**Usage**:

```
datasync dms generate-csw-metadata [OPTIONS]
```

**Arguments**: no arguments available.

**Options**:

| Name | Description | Required | Default |
|---|---|---|---|
| `--base-url` | - | No | - |
| `--access-key` | - | No | - |
| `--secret-key` | - | No | - |
| `--endpoint` | - | No | - |
| `--bucket` | - | No | - |
| `--publish-url` | - | No | - |
| `--limit` | - | No | - |
| `--search` | Filter resources by title using a LIKE expression | No | - |
#### generate-geoapi-config

No description available.

**Usage**:

```
datasync dms generate-geoapi-config [OPTIONS]
```

**Arguments**: no arguments available.

**Options**:

| Name | Description | Required | Default |
|---|---|---|---|
| `--base-url` | - | No | - |
| `--publish-url` | - | No | - |
| `--search` | Filter resources by title using a LIKE expression | No | - |
#### generate-maps-json

Generate a `maps.json` file from map resources in the DMS Parquet files. The output follows the format used by the NINA map-editor. The URL for each map is read from the `uri` field of the resource. The file is written to S3 as a publicly accessible file.

**Usage**:

```
datasync dms generate-maps-json [OPTIONS]
```

**Arguments**: no arguments available.

**Options**:

| Name | Description | Required | Default |
|---|---|---|---|
| `--base-url` | - | No | - |
| `--access-key` | - | No | - |
| `--secret-key` | - | No | - |
| `--endpoint` | - | No | - |
| `--where` | Provide an additional SQL filter | No | `1=1` |
| `--bucket` | S3 bucket for output (e.g. `my-bucket`) | No | - |
| `--output` | S3 key path for the output JSON file | No | `/dms/maps/maps.json` |
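For example, the `--where` option narrows the export with an SQL predicate appended to the default `1=1` filter. A hypothetical invocation (base URL, endpoint, credentials, and bucket are placeholders) might look like:

```shell
# Hypothetical example: write maps.json only for map resources whose title
# starts with "Nat", to a public S3 location. All values are placeholders.
datasync dms generate-maps-json \
  --base-url "https://dms.example.org" \
  --endpoint "https://s3.example.org" \
  --access-key "$S3_ACCESS_KEY" \
  --secret-key "$S3_SECRET_KEY" \
  --where "title LIKE 'Nat%'" \
  --bucket "my-public-bucket" \
  --output /dms/maps/maps.json
```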
### ninagen

Commands to handle NINAGEN tasks.

**Usage**:

```
datasync ninagen [OPTIONS] COMMAND [ARGS]...
```

**Arguments**: no arguments available.

**Options**: no options available.

**Subcommands**:

#### snp-database-normalize

Convert an SNP Excel sheet to Parquet.

**Usage**:

```
datasync ninagen snp-database-normalize [OPTIONS] FILE [SHEET]
```

**Arguments**:

| Name | Description | Required |
|---|---|---|
| `FILE` | Path to the file | Yes |
| `SHEET` | Name of the Excel sheet to use | No |

**Options**:

| Name | Description | Required | Default |
|---|---|---|---|
| `--header-row` | XLSX row number that contains the header | No | `1` |
| `--allele-start-column` | XLSX column that contains the first allele | No | `F` |
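A hypothetical invocation, for a workbook whose header sits on row 2 and whose allele columns start at column G (the file and sheet names are made up):

```shell
# Hypothetical example: normalize the "Samples" sheet of a local workbook.
# The header is on XLSX row 2; allele columns start at column G.
datasync ninagen snp-database-normalize \
  --header-row 2 \
  --allele-start-column G \
  ./snp_database.xlsx "Samples"
```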
#### snp-analysis-to-parquet

Convert an SNP analysis CSV to a Parquet file.

**Usage**:

```
datasync ninagen snp-analysis-to-parquet [OPTIONS] FILE
```

**Arguments**:

| Name | Description | Required |
|---|---|---|
| `FILE` | Path to the CSV file | Yes |

**Options**: no options available.
### pit-registering-salmon

No description available.

**Usage**:

```
datasync pit-registering-salmon [OPTIONS] COMMAND [ARGS]...
```

**Arguments**: no arguments available.

**Options**: no options available.

**Subcommands**:

#### run

Download PIT data from BioMark's API to a `.duckdb` file.

**Usage**:

```
datasync pit-registering-salmon run [OPTIONS]
```

**Arguments**: no arguments available.

**Options**:

| Name | Description | Required | Default |
|---|---|---|---|
| `--duckdb-path` | - | No | `biomark_pit_registering_salmon_v1.duckdb` |
| `--place` | Site location (kongsfjord, sylte, vigda, agdenes, vatne) | No | - |
| `--begin-date` | Start date for the data download, in YYYY-MM-DD format | No | - |
| `--end-date` | End date for the data download, in YYYY-MM-DD format | No | - |
| `--tags / --no-tags` | Download tags data | No | `no-tags` |
| `--readers / --no-readers` | Download reader voltage data | No | `no-readers` |
| `--environment / --no-environment` | Download environment data | No | `no-environment` |
| `--all-locations / --no-all-locations` | Download data from all accessible locations | No | `no-all-locations` |
| `--base-url` | - | No | `https://data3.biomark.com/api/v1/` |
| `--yesterday / --no-yesterday` | Set the date range to yesterday only | No | `no-yesterday` |
| `--dataset-name` | - | No | `main` |
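As a sketch, a daily-style run for a single site could combine the flags like this (the choice of site and data types is illustrative):

```shell
# Hypothetical example: download yesterday's tag and reader-voltage data
# for the kongsfjord site into the default DuckDB file.
datasync pit-registering-salmon run \
  --place kongsfjord \
  --yesterday \
  --tags \
  --readers
```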
#### replicate

Upload data from the `.duckdb` file to an S3 bucket.

**Usage**:

```
datasync pit-registering-salmon replicate [OPTIONS]
```

**Arguments**: no arguments available.

**Options**:

| Name | Description | Required | Default |
|---|---|---|---|
| `--bucket` | - | No | - |
| `--endpoint-url` | - | No | - |
| `--access-key` | - | No | - |
| `--secret-key` | - | No | - |
| `--duckdb-path` | - | No | `biomark_pit_registering_salmon_v1.duckdb` |
| `--region` | - | No | `us-east-1` |
| `--dataset-name` | - | No | `main` |
| `--tags / --no-tags` | Add tags data to S3 | No | `no-tags` |
| `--readers / --no-readers` | Add reader voltage data to S3 | No | `no-readers` |
| `--environment / --no-environment` | Add environment data to S3 | No | `no-environment` |
### grass-gis

No description available.

**Usage**:

```
datasync grass-gis [OPTIONS] COMMAND [ARGS]...
```

**Arguments**: no arguments available.

**Options**: no options available.

**Subcommands**:

#### register-layers

No description available.

**Usage**:

```
datasync grass-gis register-layers [OPTIONS] PARQUET_FILE_PATH PROJECT_NUMBER GISBASE
```

**Arguments**:

| Name | Description | Required |
|---|---|---|
| `PARQUET_FILE_PATH` | - | Yes |
| `PROJECT_NUMBER` | - | Yes |
| `GISBASE` | - | Yes |

**Options**: no options available.
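Since all three arguments are positional and required, a call would pass them in order. This sketch is entirely hypothetical: the Parquet path, project number, and GISBASE path (assumed here to be the GRASS GIS installation directory) are placeholders.

```shell
# Hypothetical example: register the layers described in a Parquet file
# into a GRASS GIS project. All three values below are placeholders.
datasync grass-gis register-layers \
  ./layers.parquet \
  12345 \
  /usr/lib/grass83
```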
### services

Miljødata Infrastructure as Code pipelines.

**Usage**:

```
datasync services [OPTIONS] COMMAND [ARGS]...
```

**Arguments**: no arguments available.

**Options**: no options available.

**Subcommands**:

#### services-to-parquet

Convert `metadata.yml` definitions to a set of Parquet files that can be imported into the DMS.

**Usage**:

```
datasync services services-to-parquet [OPTIONS]
```

**Arguments**: no arguments available.

**Options**:

| Name | Description | Required | Default |
|---|---|---|---|
| `--org` | - | No | `ninanor` |
| `--repo` | - | No | - |
| `--bucket` | - | No | - |
| `--endpoint` | - | No | - |
| `--access-key` | - | No | - |
| `--secret-key` | - | No | - |
| `--prefix` | - | No | `/dms/tables` |
| `--git-username` | - | No | - |
| `--git-token` | - | No | - |
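A hypothetical end-to-end invocation (repository name, bucket, endpoint, and all credentials are placeholders):

```shell
# Hypothetical example: read metadata.yml definitions from a Git repository
# under the ninanor organization and publish the generated Parquet files
# to S3 under the default /dms/tables prefix. All values are placeholders.
datasync services services-to-parquet \
  --org ninanor \
  --repo my-iac-repo \
  --git-username "$GIT_USERNAME" \
  --git-token "$GIT_TOKEN" \
  --bucket "my-dms-bucket" \
  --endpoint "https://s3.example.org" \
  --access-key "$S3_ACCESS_KEY" \
  --secret-key "$S3_SECRET_KEY" \
  --prefix /dms/tables
```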
#### dashboard

Produce a Homer dashboard using the Miljødata Infrastructure as Code repository as the data source.

**Usage**:

```
datasync services dashboard [OPTIONS]
```

**Arguments**: no arguments available.

**Options**:

| Name | Description | Required | Default |
|---|---|---|---|
| `--org` | - | No | `ninanor` |
| `--repo` | - | No | - |
| `--config-org` | - | No | `ninanor` |
| `--config-repo` | - | No | - |
| `--bucket` | - | No | - |
| `--endpoint` | - | No | - |
| `--access-key` | - | No | - |
| `--secret-key` | - | No | - |
| `--git-username` | - | No | - |
| `--git-token` | - | No | - |
| `--prefix` | - | No | `/dms/services` |
### gbif-backbone

Export GBIF Backbone data to a DuckDB database.

**Usage**:

```
datasync gbif-backbone [OPTIONS] COMMAND [ARGS]...
```

**Arguments**: no arguments available.

**Options**: no options available.

**Subcommands**:

#### import-all

Import GBIF Backbone data into a DuckDB database.

**Usage**:

```
datasync gbif-backbone import-all [OPTIONS]
```

**Arguments**: no arguments available.

**Options**: no options available.
### ipt

Provide commands to deal with IPT.

**Usage**:

```
datasync ipt [OPTIONS] COMMAND [ARGS]...
```

**Arguments**: no arguments available.

**Options**: no options available.

**Subcommands**:

#### run

Convert IPT resources to GeoParquet, register them in the DMS, and publish metadata and configurations.

**Usage**:

```
datasync ipt run [OPTIONS]
```

**Arguments**: no arguments available.

**Options**:

| Name | Description | Required | Default |
|---|---|---|---|
| `--skip-data / --no-skip-data` | Skip the data conversion step and process only metadata | No | `no-skip-data` |
| `--skip-dms / --no-skip-dms` | Skip publishing to the DMS | No | `no-skip-dms` |
| `--skip-csw / --no-skip-csw` | Skip publishing to the CSW | No | `no-skip-csw` |
| `--skip-geoapi / --no-skip-geoapi` | Skip publishing to pygeoapi | No | `no-skip-geoapi` |
| `--limit` | Only import a certain number of records | No | - |
| `--search` | Run only on resources that contain the given string | No | - |
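The skip flags compose, so a metadata-only refresh over a subset of resources is possible. A hypothetical sketch (the search string and limit are illustrative):

```shell
# Hypothetical example: refresh metadata only for resources matching
# "salmon", skipping the data conversion step and capping at 10 records.
datasync ipt run \
  --skip-data \
  --search salmon \
  --limit 10
```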
#### validate-iso

Validate an XML file against the ISO 19115 schema.

**Usage**:

```
datasync ipt validate-iso [OPTIONS] FILE
```

**Arguments**:

| Name | Description | Required |
|---|---|---|
| `FILE` | - | Yes |

**Options**: no options available.
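For instance, checking a locally exported metadata record before publishing (the file path below is a placeholder):

```shell
# Hypothetical example: validate a local metadata record against the
# ISO 19115 schema. The path is a placeholder.
datasync ipt validate-iso ./metadata/record.xml
```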