11 Results collation for archiving

In the previous chapters, we have saved the data resulting from simulations and index calculations for each indicator and ecosystem locally.

The new Nature Index webpage (old version at https://www.naturindeks.no; to be updated soon) will present much of this data graphically. Because the webpage reads information directly from the NI database (https://naturindeks.nina.no), the results need to be fed back to the NI database.

For the previous iteration (NI2020), this upload was handled manually. In 2021-2022, routines for feeding back calculated index data were partially coded by Bård Pedersen. His code is archived on NINA’s internal drive under the path:

P:/412430_naturindeks/DataUpload_Code_from_BårdPedersen

This code is not completely documented, has not been thoroughly tested, and has not (yet) been included in NIcalc. It is also only partially applicable to NI2025 because the new NI webpage will present additional information (raw and scaled indicator data including uncertainty, index uncertainty on maps) and altered content (new approach to indicator indices, new options for toggling certain parts of the information on and off). For these reasons, we will not be working from Bård Pedersen’s code. Instead, we collate data a) for all indicators and b) for all ecosystem indices, and upload the new tables manually to the NI database.

The workflow below collates indicator and ecosystem index data into joint tables, which are then saved as .csv files and handed to the database administrator for manual archiving.

Before proceeding, we open a connection to the NI database. We need it to pull information on indicator data type.

NIdb_username <- Sys.getenv("NIdb_username")
NIdb_password <- Sys.getenv("NIdb_password")

NIcalc::getToken(username = NIdb_username,
                 password = NIdb_password)
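
If either environment variable is unset, `Sys.getenv()` returns an empty string, and the login would then be attempted with empty credentials. A minimal sketch of a guard that could run before the call to `NIcalc::getToken()`; the helper name `require_env` is hypothetical and not part of NIcalc:

```r
# Hypothetical helper (not part of NIcalc): return the value of an
# environment variable, or stop with a clear message if it is unset or empty.
require_env <- function(var) {
  value <- Sys.getenv(var, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable '", var, "' is not set. ",
         "Define it, e.g. in .Renviron, and restart R.", call. = FALSE)
  }
  value
}

# Intended use (commented out because it needs real credentials):
# NIdb_username <- require_env("NIdb_username")
# NIdb_password <- require_env("NIdb_password")
```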

11.1 Indicator (index) data

We start by retrieving the indicator list, then cycle through all indicators and collate the statistical summaries of:

Simulated raw indicator data
Simulated scaled indicator data
Calculated indicator index data

Before we do that, we have to determine which version of the scaled data we want to use, that is, whether reference value uncertainty is accounted for and whether truncation happens before or after summarising. This is controlled by the two toggles uncertainRefValue and truncAfterSummary, which should be set consistently with their use in the first chapter:

## Toggle for using/ignoring reference value uncertainty in visualizations
uncertainRefValue <- FALSE

## Toggle for visualizing scaled values truncated before or after summarising
truncAfterSummary <- TRUE

## Select correct file for scaled data
filename_scaled <- dplyr::case_when(
  !uncertainRefValue & truncAfterSummary ~ "fixedRef",
  !uncertainRefValue & !truncAfterSummary ~ "fixedRef_truncPreSum",
  uncertainRefValue & truncAfterSummary ~ "sampledRef",
  uncertainRefValue & !truncAfterSummary ~ "sampledRef_truncPreSum",
)
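
To make the mapping concrete, the sketch below rebuilds the same suffix with base R and shows the file path it leads to. The function name `scaledSuffix` and the indicator id `1` are placeholders for illustration only.

```r
# Illustrative helper (hypothetical name): same mapping as the case_when() call
scaledSuffix <- function(uncertainRefValue, truncAfterSummary) {
  paste0(
    if (uncertainRefValue) "sampledRef" else "fixedRef",
    if (truncAfterSummary) "" else "_truncPreSum"
  )
}

# All four toggle combinations
combos <- expand.grid(uncertainRefValue = c(FALSE, TRUE),
                      truncAfterSummary = c(TRUE, FALSE))
combos$suffix <- mapply(scaledSuffix,
                        combos$uncertainRefValue,
                        combos$truncAfterSummary)
print(combos)

# Path of the scaled summary file for a placeholder indicator id
examplePath <- paste0("data/IndicatorData_StatSummaries/statSummary_id_", 1,
                      "_scaled_", scaledSuffix(FALSE, TRUE), ".rds")
print(examplePath)
```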

Now we are ready to retrieve data:

## Read indicator list
indicatorList <- readRDS("data/NI2025_IndicatorList.rds")

## Set up empty data frames for collation
IndData_raw <- data.frame()
IndData_scaled <- data.frame()
IndIndexData <- data.frame()

## Read and collate data for each indicator
for(i in seq_len(nrow(indicatorList))){
  
  # Set name and id
  indName <- indicatorList$name[i]
  indId <- indicatorList$id[i]
  
  # Load and collate raw indicator data
  IndData_raw_i <- readRDS(paste0("data/IndicatorData_StatSummaries/statSummary_id_", indId, "_raw.rds")) %>%
    dplyr::mutate(id = indId,
                  name = indName,
                  .before = 1)
  IndData_raw <- IndData_raw %>%
    dplyr::bind_rows(IndData_raw_i)
  
  # Load and collate scaled indicator data
  IndData_scaled_i <- readRDS(paste0("data/IndicatorData_StatSummaries/statSummary_id_", indId, "_scaled_", filename_scaled, ".rds")) %>%
    dplyr::mutate(id = indId,
                  name = indName,
                  .before = 1)
  IndData_scaled <- IndData_scaled %>%
    dplyr::bind_rows(IndData_scaled_i)
  
  # Load and collate indicator index data
  IndIndexData_i <- readRDS(paste0("results_Indicators/indIndex_id_", indId, ".rds"))$summary %>%
    dplyr::filter(indexArea == "wholeArea", !data_singleArea) %>%
    dplyr::mutate(id = indId,
                  name = indName,
                  .before = 1)
  IndIndexData <- IndIndexData %>%
    dplyr::bind_rows(IndIndexData_i)
}
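
Growing a data frame inside a loop is simple and adequate here. An alternative pattern, sketched below on toy data, reads each file into a list element and binds everything once at the end. The temporary files, column names and indicator names are invented for the example; the real summaries have more columns.

```r
# Toy setup: write two small summary files to a temporary folder
tmpDir <- file.path(tempdir(), "statSummaries_demo")
dir.create(tmpDir, showWarnings = FALSE)
toyList <- data.frame(id = c(1, 2), name = c("Indicator A", "Indicator B"))
for (k in seq_len(nrow(toyList))) {
  saveRDS(data.frame(year = c(2019, 2024), median = c(0.4, 0.5) * k),
          file.path(tmpDir, paste0("statSummary_id_", toyList$id[k], "_raw.rds")))
}

# Read each file, prepend id and name, and collect the pieces in a list
pieces <- lapply(seq_len(nrow(toyList)), function(k) {
  dat <- readRDS(file.path(tmpDir,
                           paste0("statSummary_id_", toyList$id[k], "_raw.rds")))
  cbind(id = toyList$id[k], name = toyList$name[k], dat)
})

# Bind all pieces in one step
collated <- do.call(rbind, pieces)
print(collated)
```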

For indicator index data, we additionally want to reduce the data to the representative years only. Even though some indicators have data for years prior to the first NI year ever used (1990) and for years in between NI update years, that data is often only available for a non-representative sample of areas. As a consequence, calculated indicator indices for those years may look very different from the ones for the focal years. To avoid unnecessary confusion, we will therefore only present values for the NI focal years (including 1990, if available), as well as 1995 and 2005 where data is available, on the webpage. Below, we subset the data to the relevant years:

ind_focalYears <- c(1990, 1995, 2000, 2005, 2010, 2014, 2019, 2024)

IndIndexData <- IndIndexData %>%
  dplyr::filter(year_t %in% ind_focalYears)
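
As a small illustration with invented values, the filter keeps only rows whose year is one of the focal years and drops the in-between years:

```r
ind_focalYears <- c(1990, 1995, 2000, 2005, 2010, 2014, 2019, 2024)

# Toy index table; values are made up
toyIndex <- data.frame(year_t = c(1988, 1990, 2012, 2014, 2019, 2021, 2024),
                       median = c(0.61, 0.60, 0.55, 0.54, 0.52, 0.51, 0.50))

# Base R equivalent of dplyr::filter(year_t %in% ind_focalYears)
toyIndex_focal <- toyIndex[toyIndex$year_t %in% ind_focalYears, ]
print(toyIndex_focal$year_t)
```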

Finally, we save the collated data as CSV for incorporation in the NI database:

## Make directory
if(!dir.exists("data_forUpload")){
  dir.create("data_forUpload")
}

## Convert numbers to character to avoid scientific notation in CSV
options(scipen = 999)

IndData_raw <- IndData_raw %>%
  dplyr::mutate(dplyr::across(c(mean, median, sd, rel_sd, q025, q05, q25, q75, q95, q975),
                              ~ as.character(round(.x, digits = 5))))

IndData_scaled <- IndData_scaled %>%
  dplyr::mutate(dplyr::across(c(mean, median, sd, rel_sd, q025, q05, q25, q75, q95, q975),
                              ~ as.character(round(.x, digits = 5))))

IndIndexData <- IndIndexData %>%
  dplyr::mutate(dplyr::across(c(q025, median, q975, displacement),
                              ~ as.character(round(.x, digits = 5))))

## Write to CSV
readr::write_excel_csv(IndData_raw, file = "data_forUpload/IndicatorData_raw.csv")
readr::write_excel_csv(IndData_scaled, file = "data_forUpload/IndicatorData_scaled.csv")
readr::write_excel_csv(IndIndexData, file = "data_forUpload/IndicatorIndexData.csv")
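
Before handing the files over, it can be worth confirming that no value ended up in scientific notation. A minimal sketch of such a check follows; `hasSciNotation` is a hypothetical helper and the test values are invented.

```r
# Hypothetical helper: TRUE if any string looks like scientific notation (e.g. "1e-05")
hasSciNotation <- function(x) {
  any(grepl("^-?[0-9.]+[eE][+-]?[0-9]+$", x))
}

# Toy values formatted in fixed notation, one element at a time
toyValues <- c(0.00001, 0.25, 123456789)
toyText <- vapply(toyValues, format, character(1), scientific = FALSE)
print(toyText)

# Check the fixed-notation strings, then a vector that contains "1e-05"
print(hasSciNotation(toyText))
print(hasSciNotation(c("1e-05", "0.25")))
```

The same helper could be applied to a collated column, for example `hasSciNotation(IndData_raw$mean)`, after the conversion to character.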

11.2 Nature index data

For the calculated NI data, we list the ecosystems, collate statistical summaries and data compositions for all of them, and subsequently save the results as CSV for incorporation in the NI database. Information on indicator data source (type) is not retained in the outputs of NIcalc, so we have to pull this information from the database again.

## List ecosystems
ecosystems <- c("Forest", "Mountain", "Wetlands", "OpenLowland", 
                "Freshwater", "Coast", "Ocean")

## Set up empty data frames for collation 
indexData <- data.frame()
propDataSource <- data.frame()
propFunGroup <- data.frame()

## Read and collate data for each ecosystem
for(i in seq_along(ecosystems)){
  
  # Statistical summaries of index values
  indexData_i <- readr::read_csv(paste0("results_", ecosystems[i], "/indexStatSummary.csv")) %>%
    dplyr::mutate(Ecosystem = ecosystems[i],
                  .before = 1)
  
  indexData <- indexData %>%
    dplyr::bind_rows(indexData_i)
  
  # Data source proportions
  NIobj <- readRDS(paste0("data/NI2025_Data_", ecosystems[i], ".rds"))
  
  n_imputes <- nrow(NIobj$NAImputes$identifiers)
  
  usedData <- NIobj$importData$indicatorObservations$indicatorValues %>% 
    dplyr::filter(!is.na(expectedValue)) %>%
    dplyr::select(indId, ICunitId, yearId)
  
  indData_all <- data.frame()
  indList <- unique(usedData$indId)
  
  for(j in seq_along(indList)){
    indValues_j <- NIcalc::getIndicatorValues(indicatorID = indList[j])$indicatorValues %>%
      dplyr::filter(yearName != "Referanseverdi" & !is.na(verdi)) %>%
      dplyr::rename(indId = indicatorId, 
                    ICunitId = areaId) %>%
      dplyr::select(indId, ICunitId, yearId, datatypeName)
    indData_all <- indData_all %>%
      dplyr::bind_rows(indValues_j)
  }

  propDataSource_i <- usedData %>%
    dplyr::left_join(indData_all, by = c("indId", "ICunitId", "yearId")) %>%
    dplyr::group_by(datatypeName) %>%
    dplyr::summarise(number = dplyr::n()) %>%
    dplyr::mutate(datatypeName = ifelse(datatypeName == "Beregnet fra modeller", "Modellberegning", datatypeName)) %>%
    dplyr::bind_rows(data.frame(datatypeName = "Imputering", number = n_imputes)) %>%
    dplyr::mutate(proportion = number/sum(number),
                  ecosystem = ecosystems[i])
    
  propDataSource <- propDataSource %>%
    dplyr::bind_rows(propDataSource_i)
    
  
  # Functional group proportions
  propFunGroup_i <- NIobj$InputData$indicators %>%
    dplyr::select(functionalGroup) %>%
    dplyr::group_by(functionalGroup) %>%
    dplyr::summarise(number = dplyr::n()) %>%
    dplyr::mutate(proportion = number/nrow(NIobj$InputData$indicators),
                  ecosystem = ecosystems[i])
  
  propFunGroup <- propFunGroup %>%
    dplyr::bind_rows(propFunGroup_i)
}

## Add proportions across all ecosystems
propDataSource_all <- propDataSource %>%
  dplyr::group_by(datatypeName) %>%
  dplyr::summarise(number = sum(number)) %>%
  dplyr::mutate(proportion = number/sum(number),
                ecosystem = "Overall")
propDataSource <- propDataSource %>%
  dplyr::bind_rows(propDataSource_all)

propFunGroup_all <- propFunGroup %>%
  dplyr::group_by(functionalGroup) %>%
  dplyr::summarise(number = sum(number)) %>%
  dplyr::mutate(proportion = number/sum(number),
                ecosystem = "Overall")
propFunGroup <- propFunGroup %>%
  dplyr::bind_rows(propFunGroup_all)
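
Note that the "Overall" proportions are computed from summed counts rather than by averaging the per-ecosystem proportions, so ecosystems with more data points weigh more. A toy example with invented counts, written in base R, shows the difference:

```r
# Invented counts of data points per data type in two ecosystems
toyCounts <- data.frame(
  ecosystem = c("Forest", "Forest", "Ocean", "Ocean"),
  datatypeName = c("Modellberegning", "Imputering",
                   "Modellberegning", "Imputering"),
  number = c(10, 30, 50, 10)
)

# Per-ecosystem proportions
toyCounts$proportion <- toyCounts$number /
  ave(toyCounts$number, toyCounts$ecosystem, FUN = sum)

# Overall proportions from summed counts (the approach used in the workflow)
overall <- aggregate(number ~ datatypeName, data = toyCounts, FUN = sum)
overall$proportion <- overall$number / sum(overall$number)
print(overall)

# Averaging the per-ecosystem proportions gives a different answer
meanOfProps <- tapply(toyCounts$proportion, toyCounts$datatypeName, mean)
print(meanOfProps)
```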

## Convert numbers to character to avoid scientific notation in CSV
indexData <- indexData %>%
  dplyr::mutate(dplyr::across(c(mean, median, sd, rel_sd, q025, q05, q25, q75, q95, q975),
                              ~ as.character(round(.x, digits = 5))))

## Write to CSV
readr::write_excel_csv(indexData, file = "data_forUpload/NatureIndex_allEcosystems.csv")
readr::write_excel_csv(propDataSource, file = "data_forUpload/DataSource_Proportions.csv")
readr::write_excel_csv(propFunGroup, file = "data_forUpload/FunctionalGroup_Proportions.csv")