The function predict makes a prediction for new data based either on a bag of models or
on its formula, coefficients, and weights. The prediction can be made either for a complete new dataset
containing all the variables included in the formula, or for the response to a single variable or a
group of variables in the model. In the latter case, all the other variables are set to their median or mean
value, or to zero (as defined by the baseline parameter). Which of the two is done is controlled by the columns
included in the newdata data.frame.
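The baseline idea described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's internal code: the helper fill_baseline() and the toy data are hypothetical, and the real function may treat factors and ZOI terms differently.

```r
# Toy data standing in for the data used to fit the models (hypothetical).
set.seed(1)
dat <- data.frame(x1 = rnorm(50), x2 = runif(50), x3 = rexp(50))

# Hypothetical helper: complete a partial `newdata` by setting every
# variable that is absent from it to a baseline value taken from `data`.
fill_baseline <- function(newdata, data,
                          baseline = c("median", "mean", "zero")) {
  baseline <- match.arg(baseline)
  missing_vars <- setdiff(names(data), names(newdata))
  for (v in missing_vars) {
    # A single baseline value is recycled over all rows of `newdata`.
    newdata[[v]] <- switch(baseline,
      median = median(data[[v]]),
      mean = mean(data[[v]]),
      zero = 0
    )
  }
  # Return the columns in the same order as in the original data.
  newdata[names(data)]
}

# Only x1 varies; x2 and x3 are held at their medians.
nd <- data.frame(x1 = seq(-2, 2, length.out = 5))
full_nd <- fill_baseline(nd, dat, baseline = "median")

# Same grid, with the other variables set to zero.
zero_nd <- fill_baseline(nd, dat, baseline = "zero")
```

Here the columns present in nd define which response is explored, while the baseline argument decides where the remaining variables are held.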
Usage
predict(
x,
newdata,
type = c("linear", "exponential", "exp", "logit", "cloglog")[1],
wmean = TRUE,
wq_probs = NULL,
include = "all",
...
)
# S3 method for class 'bag'
predict(
x,
newdata,
data = NULL,
type = c("linear", "exponential", "exp", "logit", "cloglog")[1],
wmean = TRUE,
wq_probs = NULL,
include = "all",
baseline = c("median", "mean", "zero")[1],
zoi = FALSE,
zoi_shape = c("exp_decay", "gaussian_decay", "linear_decay", "threshold_decay")[1],
which_cumulative = "cumulative",
type_feature = c("point", "line", "area")[1],
n_features = 1,
resolution = 100,
line_value = 1,
...
)
# S3 method for class 'formula'
predict(
x,
newdata,
coefs,
weights = 1,
type = c("linear", "exponential", "exp", "logit", "cloglog")[1],
wmean = TRUE,
wq_probs = NULL,
include = "all",
...
)
Arguments
- x
[bag,list or formula]
A bag of models, resulting from a call to bag_models(), or a formula used to fit the models in the bag.
- newdata
New data set to be used for prediction. It can include all the variables in the formula or only those for which the user is interested in making a prediction.
- type
[character="linear"]{"linear", "exponential", "exp", "logit", "cloglog"}
Type of prediction. One of "linear" (default), "exp" or "exponential", "logit", or "cloglog".
- wmean
[logical=TRUE]
Should the weighted mean values be predicted? Default is TRUE.
- wq_probs
[vector,numeric(3)=c(0.025, 0.5, 0.975)]
A three element vector with lower, mid, and higher weighted quantiles to be computed.
- include
[character="all"]
String or vector of strings with the terms (or unique parts of terms) to be predicted for. This does not restrict which terms we are focusing on; that is done by the definition of the newdata dataset and by which columns are in there. What the include parameter does is set which other variables will be used for prediction, at their mean or median values, for instance.
- ...
Additional parameters. None implemented.
- baseline
[character="median"]{"median", "mean", "zero"}
What values to choose for the baseline, i.e., for all other variables/terms not contained in newdata. It can be one of "median", "mean", or "zero".
- zoi
[logical(1)=FALSE]
Are the columns in newdata supposed to represent zone of influence (ZOI) variables? This parameter should be set to TRUE if you provided a set of distances from a source that need to be translated into ZOI variables (cumulative or nearest ZOI from sources).
- zoi_shape
[character="exp_decay"]{"exp_decay", "gaussian_decay", "linear_decay", "threshold_decay"}
Shape of the zone of influence (ZOI), if zoi = TRUE. Default is "exp_decay". It can assume any of the possible values for the argument type in the function dist_decay().
- which_cumulative
[character="cumulative"]
String or pattern to be searched for in the column names of newdata and of the original data used to fit the models, to identify the cumulative ZOI. It is used to break up the names of the columns/terms in the formula and get the ZOI radii as numbers, in order to create all the ZOI radii included in the model or bag of models.
- type_feature
[character="point"]{"point", "line", "area"}
Type of feature we are predicting for, for zone of influence-type variables. Default is "point". If type_feature = "line", a line is simulated with the function create_line_feature_zoi() to get the values, account for the number of pixels of each single linear feature in the neighborhood, and correctly estimate the effect of each linear feature ZOI. The option "area" is not yet implemented and for now is treated as a point feature at the origin.
- n_features
[numeric(1)=1]
Number of features to be used for prediction, for ZOI variables. Default is 1.
- resolution
[numeric(1)=100]
Resolution for the raster created in create_line_feature_zoi(), when type_feature = "line".
- line_value
[numeric(1)=1]
Value set to the raster line created by create_line_feature_zoi(), when type_feature = "line". It could be changed to different values if we want to represent, e.g., road traffic as the value in the linear feature, or another value for spatio-temporally dynamic variables.
- coefs
[vector,numeric]
Either a named vector of coefficients (in case there is only one model) or a matrix of coefficients, with rownames as the term names and columns as the different models/resamples. Only relevant if x is a formula.
- weights
[vector,numeric=1]
Vector of weights for the different models/resamples, i.e. one weight for each column of the coefs object with coefficients. A single number (by default, 1) in case there is only one model (coefs is a vector). Only relevant if x is a formula.
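To make the roles of coefs, weights, wmean, and type more concrete, here is a minimal base-R sketch of a formula-based prediction. It is an illustration under assumptions, not the package's implementation: the toy formula, data, coefficient values, and the helper to_scale() are hypothetical, the transformations are the standard inverse links, and averaging on the linear scale before transforming is an assumed order of operations.

```r
# Toy formula, new data, coefficient matrix, and weights (all hypothetical).
f <- ~ x1 + x2
newdata <- data.frame(x1 = c(0, 1, 2), x2 = c(1, 0, -1))
coefs <- cbind(model1 = c(x1 = 0.5, x2 = -1.0),
               model2 = c(x1 = 0.3, x2 = -0.8))
weights <- c(0.75, 0.25)

# Design matrix, keeping only the columns that have a coefficient
# (this drops the intercept column).
X <- model.matrix(f, data = newdata)[, rownames(coefs), drop = FALSE]

# One column of linear predictions per model/resample,
# comparable to a prediction with wmean = FALSE.
eta <- X %*% coefs

# Weighted mean across models, comparable to wmean = TRUE.
eta_wmean <- as.vector(eta %*% (weights / sum(weights)))

# Hypothetical helper mapping the linear predictor to the requested scale.
to_scale <- function(eta,
                     type = c("linear", "exponential", "exp",
                              "logit", "cloglog")) {
  type <- match.arg(type)
  switch(type,
    linear = eta,
    exponential = ,            # falls through to "exp"
    exp = exp(eta),
    logit = plogis(eta),       # inverse logit
    cloglog = 1 - exp(-exp(eta))
  )
}

pred_exp <- to_scale(eta_wmean, type = "exp")
```

With a single model, coefs would be a named vector and weights a single number, so the weighted mean reduces to the prediction of that one model.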
See also
plot_response(), create_line_feature_zoi().
Examples
#---
# fit a bag to be tested
# load packages
library(glmnet)
# load data
data("reindeer_rsf")
# rename it just for convenience
dat <- reindeer_rsf
# formula initial structure
f <- use ~ private_cabins_XXX + public_cabins_high_XXX +
NORUTreclass +
# poly(norway_pca_klima_axis1, 2, raw = TRUE) +
# poly(norway_pca_klima_axis2, 2, raw = TRUE) +
norway_pca_klima_axis1 + norway_pca_klima_axis1_sq +
norway_pca_klima_axis2 + norway_pca_klima_axis2_sq +
norway_pca_klima_axis3 + norway_pca_klima_axis4
# add ZOI terms to the formula
zois <- c(100, 250, 500, 1000, 2500, 5000, 10000, 20000)
f <- add_zoi_formula(f, zoi_radius = zois, pattern = "XXX",
type = c("cumulative_exp_decay"),
separator = "_", predictor_table = TRUE)$formula
# sampling - random sampling
set.seed(1234)
samples <- create_resamples(y = dat$use,
p = c(0.2, 0.2, 0.2),
times = 10,
colH0 = NULL)
#> [1] "Starting random sampling..."
# fit multiple models
fittedl <- bag_fit_net_logit(f,
data = dat,
samples = samples,
standardize = "internal", # glmnet does the standardization of covariates
metric = "AUC",
method = "AdaptiveLasso",
parallel = "mclapply",
mc.cores = 2)
# bag models in a single object
bag_object <- bag_models(fittedl, dat, score_threshold = 0.7)
#---
# prediction using formula
# new data, looking only at PCA1
dfvar = data.frame(norway_pca_klima_axis1 = seq(min(bag_object$data_summary$norway_pca_klima_axis1),
max(bag_object$data_summary$norway_pca_klima_axis1),
length.out = 100))
dfvar$norway_pca_klima_axis1_sq = dfvar$norway_pca_klima_axis1**2
# one model only
predict(x = f,
        newdata = dfvar,
        coefs = bag_object$coef[, 1],
        include = "axis1")
# whole bag, weighted mean - here all weights = 1
predict(x = f,
        newdata = dfvar,
        coefs = bag_object$coef,
        include = names(dfvar))
# whole bag, for each model separately
predict(x = f,
        newdata = dfvar,
        coefs = bag_object$coef,
        wmean = FALSE,
        include = names(dfvar))
#---
# prediction using bag
# prediction for the very same dataset, linear scale
predict(x = bag_object,
        newdata = dat,
        data = dat)
# non ZOI variable
# new data, looking only at PCA3
dfvar = data.frame(norway_pca_klima_axis3 = seq(min(bag_object$data_summary$norway_pca_klima_axis3),
max(bag_object$data_summary$norway_pca_klima_axis3),
length.out = 100))
predict(x = bag_object,
        newdata = dfvar,
        data = dat)
# ZOI variable
# new data, looking only at private cabins
dfvar = data.frame(private_cabins = 1e3*seq(0.2, 20, length.out = 100))
# prediction for 1 feature, linear scale
predict(x = bag_object,
        newdata = dfvar,
        data = dat,
        zoi = TRUE,
        baseline = "zero")
# prediction for 30 features, exp scale, with weighted confidence intervals
predict(x = bag_object,
        newdata = dfvar,
        data = dat,
        type = "exp",
        wq_probs = c(0.025, 0.975),
        zoi = TRUE,
        n_features = 30,
        baseline = "zero")
# plot
plot(dfvar[, 1],
     predict(x = bag_object,
             newdata = dfvar,
             data = dat,
             type = "exp",
             zoi = TRUE,
             n_features = 30,
             baseline = "zero")[, 1])
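In the ZOI examples above, newdata holds distances from a source, and zoi = TRUE asks for them to be translated into ZOI variables for every radius in the model. The sketch below shows one way such a translation could look for the "exp_decay" shape. It rests on assumptions: the helper exp_decay_zoi() is hypothetical, the decay is parameterized so that the influence drops to 0.05 at the ZOI radius, and the column names are illustrative; the parameterization and naming used by dist_decay() and the package may differ.

```r
# Hypothetical helper: exponential decay that equals 1 at the source and
# falls to `zoi_limit` at a distance equal to `radius` (assumed convention).
exp_decay_zoi <- function(distance, radius, zoi_limit = 0.05) {
  lambda <- log(1 / zoi_limit) / radius
  exp(-lambda * distance)
}

# Distances from the source, as in the private cabins example above.
distances <- 1e3 * seq(0.2, 20, length.out = 100)

# ZOI radii used when building the formula in the example above.
radii <- c(100, 250, 500, 1000, 2500, 5000, 10000, 20000)

# One column per radius: the ZOI value at each distance.
zoi_vars <- sapply(radii, function(r) exp_decay_zoi(distances, r))

# Illustrative column names only; the real term names come from the formula.
colnames(zoi_vars) <- paste0("private_cabins_cumulative_exp_decay", radii)
```

Each row of zoi_vars then plays the role of one row of prediction data, with a value for every ZOI radius; how these values are scaled by n_features is left to the function itself.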