Title: | Simulate Outcomes Using Spatially Dependent Design Matrices |
---|---|
Description: | Provides tools for simulating spatially dependent predictors (continuous or binary), which are used to generate scalar outcomes in a (generalized) linear model framework. Continuous predictors are generated using traditional multivariate normal distributions or Gauss Markov random fields with several correlation function approaches (e.g., see Rue (2001) <doi:10.1111/1467-9868.00288> and Furrer and Sain (2010) <doi:10.18637/jss.v036.i10>), while binary predictors are generated using a Boolean model (see Cressie and Wikle (2011, ISBN: 978-0-471-69274-4)). Parameter vectors exhibiting spatial clustering can also be easily specified by the user. |
Authors: | Justin Leach [aut, cre, cph] |
Maintainer: | Justin Leach <[email protected]> |
License: | GPL-3 |
Version: | 0.1.1 |
Built: | 2025-02-23 04:01:54 UTC |
Source: | https://github.com/jmleach-bst/sim2dpredictr |
Specify the locations in the lattice/image that have non-zero parameters as well as the values for those parameters, and the function creates the parameter vector that matches the correct locations in the design matrix.
beta_builder( row.index, col.index, im.res, B0 = 0, B.values, index.type = "manual", decay.fn = "gaussian", phi = 0.5, max.d = Inf, h, w, bayesian = FALSE, bayesian.dist = NULL, bayesian.scale = NULL, output.indices = TRUE )
beta_builder( row.index, col.index, im.res, B0 = 0, B.values, index.type = "manual", decay.fn = "gaussian", phi = 0.5, max.d = Inf, h, w, bayesian = FALSE, bayesian.dist = NULL, bayesian.scale = NULL, output.indices = TRUE )
row.index , col.index
|
Vectors of row/columns indices for non-zero
parameters.
If |
im.res |
A vector specifying the dimension/resolution of the image. The first entry is the number of 'rows' in the lattice/image, and the second entry is the number of 'columns' in the lattice/image. |
B0 |
is the "true" intercept value; default is 0. |
B.values |
is a vector "true" parameter values for non-zero parameters.
The order of assignment is by row. If B.values argument is a single value,
then all non-zero parameters are assigned to that value, unless
|
index.type |
is one of index.type = c("manual", "rectangle", "ellipse", "decay")
|
decay.fn |
An argument to specify the decay function of non-zero
parameters decay from the peak when |
phi |
A scalar value greater than 0 that determines the decay rate of
non-zero parameters when |
max.d |
When |
w , h
|
If index.type = "ellipse" then the width and height of the ellipse, respectively. |
bayesian |
If |
bayesian.dist |
When |
bayesian.scale |
A list. When |
output.indices |
If |
A 2-element list containing (1) indices for the locations of "true" non-zero parameters, and (2) a parameter vector.
The order of the parameters is by row. That is, if the lattice/image is 4x4, then parameters 1-4 make up the first row, 5-8 then second, and so forth.
## example Bex1 <- beta_builder(row.index = c(5, 5, 6, 6), col.index = c(5, 6, 5, 6), im.res = c(10, 10), B0 = 0, B.values = rep(1, 4)) ## True non-zero parameters are locations 45, 46, 55, 56 in B ## i.e. locations (5, 5), (5, 6), (6, 5), (6, 6) ## Suppose that we index rows by i = 1, ... , I ## cols by j = 1, ... , J ## The index for a parameter is given by J * (i - 1) + j ## In this example, I = 10, J = 10; Thus: ## (5, 5) -> 10 * (5 - 1) + 5 = 45 ## (5, 6) -> 10 * (5 - 1) + 6 = 46 ## (6, 5) -> 10 * (6 - 1) + 5 = 55 ## (6, 6) -> 10 * (6 - 1) + 6 = 45 Bex1 ## length 101 (includes B0 w/ 100 variable parameter values) length(Bex1$B) ## example: index.type = "rectangle" Bex2 <- beta_builder(row.index = 12:15, col.index = 6:19, im.res = c(20, 20), B0 = 16, B.values = 1:(length(12:15) * length(6:19)), index.type = "rectangle") Bex2 matrix(Bex2$B[-1], nrow = 20, byrow = TRUE) ## example: index.type = "ellipse" Bex3 <- beta_builder(row.index = 4, col.index = 5, im.res = c(10, 10), B0 = 16, B.values = 3, index.type = "ellipse", h = 5, w = 4) Bex3 matrix(Bex3$B[-1], nrow = 10, byrow = TRUE) ## decaying parameter values Bex4 <- beta_builder(row.index = 10, col.index = 20, im.res = c(30, 30), B0 = 0, B.values = 10, index.type = "decay", max.d = 7, output.indices = FALSE) inf_2D_image(B = Bex4, im.res = c(30, 30), binarize.B = FALSE) Bex5 <- beta_builder(row.index = 4, col.index = 5, im.res = c(10, 10), B0 = 16, B.values = 5, index.type = "ellipse", h = 5, w = 4, bayesian = TRUE, bayesian.dist = "gaussian", bayesian.scale = list("binary", c(0, 1, 0.25))) inf_2D_image(B = Bex5$B, im.res = c(10, 10), binarize.B = FALSE)
## example Bex1 <- beta_builder(row.index = c(5, 5, 6, 6), col.index = c(5, 6, 5, 6), im.res = c(10, 10), B0 = 0, B.values = rep(1, 4)) ## True non-zero parameters are locations 45, 46, 55, 56 in B ## i.e. locations (5, 5), (5, 6), (6, 5), (6, 6) ## Suppose that we index rows by i = 1, ... , I ## cols by j = 1, ... , J ## The index for a parameter is given by J * (i - 1) + j ## In this example, I = 10, J = 10; Thus: ## (5, 5) -> 10 * (5 - 1) + 5 = 45 ## (5, 6) -> 10 * (5 - 1) + 6 = 46 ## (6, 5) -> 10 * (6 - 1) + 5 = 55 ## (6, 6) -> 10 * (6 - 1) + 6 = 45 Bex1 ## length 101 (includes B0 w/ 100 variable parameter values) length(Bex1$B) ## example: index.type = "rectangle" Bex2 <- beta_builder(row.index = 12:15, col.index = 6:19, im.res = c(20, 20), B0 = 16, B.values = 1:(length(12:15) * length(6:19)), index.type = "rectangle") Bex2 matrix(Bex2$B[-1], nrow = 20, byrow = TRUE) ## example: index.type = "ellipse" Bex3 <- beta_builder(row.index = 4, col.index = 5, im.res = c(10, 10), B0 = 16, B.values = 3, index.type = "ellipse", h = 5, w = 4) Bex3 matrix(Bex3$B[-1], nrow = 10, byrow = TRUE) ## decaying parameter values Bex4 <- beta_builder(row.index = 10, col.index = 20, im.res = c(30, 30), B0 = 0, B.values = 10, index.type = "decay", max.d = 7, output.indices = FALSE) inf_2D_image(B = Bex4, im.res = c(30, 30), binarize.B = FALSE) Bex5 <- beta_builder(row.index = 4, col.index = 5, im.res = c(10, 10), B0 = 16, B.values = 5, index.type = "ellipse", h = 5, w = 4, bayesian = TRUE, bayesian.dist = "gaussian", bayesian.scale = list("binary", c(0, 1, 0.25))) inf_2D_image(B = Bex5$B, im.res = c(10, 10), binarize.B = FALSE)
The function first builds a correlation matrix with correlation.builder,
converts that matrix to a covariance matrix if necessary, and then takes
the Cholesky decomposition of the matrix using either base R or the R
package spam
. Note that spam
is particularly effective when
the matrix is sparse.
chol_s2Dp( matrix.type = "cov", im.res, use.spam = FALSE, corr.structure = "ar1", rho = NULL, phi = NULL, tau = 1, alpha = 0.75, corr.min = NULL, neighborhood = "none", w = NULL, h = NULL, r = NULL, print.R = FALSE, print.S = FALSE, print.Q = FALSE, sigma = 1, triangle = "upper", print.all = FALSE, round.d = FALSE, return.cov = TRUE, return.prec = TRUE )
chol_s2Dp( matrix.type = "cov", im.res, use.spam = FALSE, corr.structure = "ar1", rho = NULL, phi = NULL, tau = 1, alpha = 0.75, corr.min = NULL, neighborhood = "none", w = NULL, h = NULL, r = NULL, print.R = FALSE, print.S = FALSE, print.Q = FALSE, sigma = 1, triangle = "upper", print.all = FALSE, round.d = FALSE, return.cov = TRUE, return.prec = TRUE )
matrix.type |
Determines whether to build a covariance matrix,
|
im.res |
A vector defining the dimension of spatial data. The first entry is the number of rows and the second entry is the number of columns. |
use.spam |
If |
corr.structure |
One of |
rho |
This is the maximum possible correlation between locations i
and j. For all i,j |
phi |
A scalar value greater than 0 that determines the decay rate of
correlation. This argument is only utilized when |
tau |
A vector containing precision parameters. If of length 1, then
all precisions are assumed equal. Otherwise the length of |
alpha |
A scalar value between 0 and 1 that defines the strength of
correlations. Note that when |
corr.min |
Scalar value to specify the minimum non-zero correlation.
Any correlations below |
neighborhood |
Defines the neighborhood within which marginal
correlations are non-zero. The default is |
w , h
|
If |
r |
If |
print.R , print.S , print.Q
|
Logical. When |
sigma |
Specify the desired standard deviations; the default is 1, in
which case the Cholesky decomposition is of a correlation matrix. If
|
triangle |
Determine whether to output an upper
( |
print.all |
If |
round.d |
If |
return.cov , return.prec
|
Logical. When |
Matrix of dimension (n.row x n.col) x (n.row x n.col). If either
return.cov
or return.prec
is TRUE
, then returns a
list where the first element is the covariance or precision matrix, and the
second element is the Cholesky factor.
Banerjee S, Carlin BP, Gelfand AE (2015). Hierarchical Modeling and Analysis for Spatial Data, Second edition. Chapman & Hall/CRC, Boca Raton, Florida.
Ripley BD (1987). Stochastic Simulation. John Wiley & Sons. doi:10.1002/9780470316726.
Rue H (2001). “Fast Sampling of Gaussian Markov Random Fields.” Journal of the Royal Statistical Society B, 63, 325-338. doi:10.1111/1467-9868.00288.
Furrer R, Sain SR (2010). “spam: A Sparse Matrix R Package with Emphasis on MCMC Methods for Gaussian Markov Random Fields.” Journal of Statistical Software, 36(10), 1-25. https://www.jstatsoft.org/v36/i10/.
## Use R package spam for Cholesky decomposition R_spam <- chol_s2Dp(im.res = c(3, 3), matrix.type = "prec", use.spam = TRUE, neighborhood = "ar1", triangle = "upper") ## Use base R for Cholesky decomposition R_base <- chol_s2Dp(corr.structure = "ar1", im.res = c(3, 3), rho = 0.15, neighborhood = "round", r = 3, use.spam = FALSE) ## Specify standard deviations instead of default of sigma = 1. R_sd <- chol_s2Dp(corr.structure = "ar1", im.res = c(3, 3), rho = 0.15, neighborhood = "round", r = 3, sigma = runif(9, 1.1, 4)) ## Not run: ## Print options ON R_pr_on <- chol_s2Dp(corr.structure = "ar1", im.res = c(3, 3), rho = 0.15, sigma = 1:9, neighborhood = "round", r = 3, print.R = TRUE, print.S = TRUE) ## End(Not run)
## Use R package spam for Cholesky decomposition R_spam <- chol_s2Dp(im.res = c(3, 3), matrix.type = "prec", use.spam = TRUE, neighborhood = "ar1", triangle = "upper") ## Use base R for Cholesky decomposition R_base <- chol_s2Dp(corr.structure = "ar1", im.res = c(3, 3), rho = 0.15, neighborhood = "round", r = 3, use.spam = FALSE) ## Specify standard deviations instead of default of sigma = 1. R_sd <- chol_s2Dp(corr.structure = "ar1", im.res = c(3, 3), rho = 0.15, neighborhood = "round", r = 3, sigma = runif(9, 1.1, 4)) ## Not run: ## Print options ON R_pr_on <- chol_s2Dp(corr.structure = "ar1", im.res = c(3, 3), rho = 0.15, sigma = 1:9, neighborhood = "round", r = 3, print.R = TRUE, print.S = TRUE) ## End(Not run)
Classify subjects based on predicted probabilities for each class. The predicted probabilities can be input by the user or calculated within function using parameter estimates and design matrix for a multinomial regression model.
classify_multiclass( predicted.probs = NULL, category.names, keep.probs = TRUE, B = NULL, X = NULL, X.incl.X0 = FALSE )
classify_multiclass( predicted.probs = NULL, category.names, keep.probs = TRUE, B = NULL, X = NULL, X.incl.X0 = FALSE )
predicted.probs |
A matrix where the number of rows is equal to the
number of subjects and the number of columns equals the number of
categories. |
category.names |
A vector containing the names of each category. The
order of names should match the order of columns in |
keep.probs |
Logical. When |
B |
A list, each element of which contains a parameter vector. The
list should have length |
X |
A matrix, each row of which contains subject covariate/predictor values. |
X.incl.X0 |
Logical. When |
Classification for each subject is determined based on the category with highest predicted probability.
Depending on the option selected for keep.probs
, returns a
data frame or vector.
## number of categories vt <- 3 ## covariate values xt <- matrix(rnorm(10 * 2), ncol = 2, nrow = 10) ## list of parameter vectors (over-parameterized model) bu <- list(b1 = c(0, 0.25, 0.25), b2 = c(0, -0.25, -0.25), b3 = c(0, 0.25, -0.25)) ## subject specific probabilities for each category ## (over-parameterized model) prp <- generate_multinom_probs(V = vt, X = xt, B = bu) classify_multiclass(predicted.probs = prp, category.names = c("A", "B", "C")) ## generate predicted probabilities within function classify_multiclass(predicted.probs = NULL, category.names = c("A", "B", "C"), X = xt, B = bu)
## number of categories vt <- 3 ## covariate values xt <- matrix(rnorm(10 * 2), ncol = 2, nrow = 10) ## list of parameter vectors (over-parameterized model) bu <- list(b1 = c(0, 0.25, 0.25), b2 = c(0, -0.25, -0.25), b3 = c(0, 0.25, -0.25)) ## subject specific probabilities for each category ## (over-parameterized model) prp <- generate_multinom_probs(V = vt, X = xt, B = bu) classify_multiclass(predicted.probs = prp, category.names = c("A", "B", "C")) ## generate predicted probabilities within function classify_multiclass(predicted.probs = NULL, category.names = c("A", "B", "C"), X = xt, B = bu)
This is primarily for use within correlation builder, and may be altered/expanded to handle more complicated correlation functions if desired.
corr_fun( corr.structure, im.res, corr.min = NULL, rho = NULL, phi = NULL, neighborhood = "none", round.d = FALSE, w = NULL, h = NULL, r = NULL, i, j, k, v )
corr_fun( corr.structure, im.res, corr.min = NULL, rho = NULL, phi = NULL, neighborhood = "none", round.d = FALSE, w = NULL, h = NULL, r = NULL, i, j, k, v )
corr.structure |
One of |
im.res |
A vector defining the dimension of spatial data. The first entry is the number of rows and the second entry is the number of columns. |
corr.min |
Scalar value to specify the minimum non-zero correlation.
Any correlations below |
rho |
This is the maximum possible correlation between locations i and
j. For all i,j |
phi |
A scalar value greater than 0 that determines the decay of
correlation. This argument is only utilized when |
neighborhood |
Defines the neighborhood within which marginal
correlations are non-zero. The default is |
round.d |
If |
w , h
|
If |
r |
If |
i , j , k , v
|
These are the coordinates for the two locations. Location 1 has coordinates (i, j) and location 2 has coordinates (k, v). |
A single element vector containing the correlation between spatial locations with indices (i, j) and (k, v).
## examples corr_fun(corr.structure = "ar1", im.res = c(3, 3), rho = 0.5, neighborhood = "round", r = 6, i = 1, j = 2, k = 2, v = 3) corr_fun(corr.structure = "ar1", im.res = c(3, 3), rho = 0.5, neighborhood = "rectangle", w = 1, h = 1, i = 1, j = 2, k = 2, v = 3)
## examples corr_fun(corr.structure = "ar1", im.res = c(3, 3), rho = 0.5, neighborhood = "round", r = 6, i = 1, j = 2, k = 2, v = 3) corr_fun(corr.structure = "ar1", im.res = c(3, 3), rho = 0.5, neighborhood = "rectangle", w = 1, h = 1, i = 1, j = 2, k = 2, v = 3)
This function "builds" a correlation matrix based on user specifications.
correlation_builder( corr.structure = "ar1", im.res, corr.min = NULL, neighborhood = "none", rho = NULL, phi = NULL, w = NULL, h = NULL, r = NULL, print.all = FALSE, round.d = FALSE )
correlation_builder( corr.structure = "ar1", im.res, corr.min = NULL, neighborhood = "none", rho = NULL, phi = NULL, w = NULL, h = NULL, r = NULL, print.all = FALSE, round.d = FALSE )
corr.structure |
One of |
im.res |
A vector defining the dimension of spatial data. The first entry is the number of rows and the second entry is the number of columns. |
corr.min |
Scalar value to specify the minimum non-zero correlation.
Any correlations below |
neighborhood |
Defines the neighborhood within which marginal
correlations are non-zero. The default is |
rho |
This is the maximum possible correlation between locations i
and j. For all i,j |
phi |
A scalar value greater than 0 that determines the decay rate of
correlation. This argument is only utilized when |
w , h
|
If |
r |
If |
print.all |
If |
round.d |
If |
Returns correlation matrix.
Caution is recommended when using corr.min
or
neighborhood
to set many correlations to 0, as not all
specifications will result in a positive definite matrix. In particular,
sharp drop-offs tend to result in non-positive definite matrices.
## examples correlation_builder(corr.structure = "ar1", im.res = c(3, 3), rho = 0.5, neighborhood = "round", r = 6, print.all = TRUE) correlation_builder(corr.structure = "exponential", im.res = c(3, 3), phi = 0.5, neighborhood = "round", r = 3, print.all = TRUE) correlation_builder(corr.structure = "CS", im.res = c(3, 3), rho = 0.5, print.all = TRUE) ## no "true" zeros, but gets close c.nr <- correlation_builder(corr.structure = "ar1", neighborhood = "none", corr.min = NULL, im.res = c(15, 15), rho = 0.5) length(c.nr[c.nr > 0]) min(c.nr) ## set corr.min gives many zero entries; sparser structure c.r <- correlation_builder(corr.structure = "ar1", neighborhood = "none", corr.min = 0.01, im.res = c(15, 15), rho = 0.5) ## raw number > 0 length(c.r[c.r > 0]) ## proportion > 0 length(c.r[c.r > 0]) / length(c.nr)
## examples correlation_builder(corr.structure = "ar1", im.res = c(3, 3), rho = 0.5, neighborhood = "round", r = 6, print.all = TRUE) correlation_builder(corr.structure = "exponential", im.res = c(3, 3), phi = 0.5, neighborhood = "round", r = 3, print.all = TRUE) correlation_builder(corr.structure = "CS", im.res = c(3, 3), rho = 0.5, print.all = TRUE) ## no "true" zeros, but gets close c.nr <- correlation_builder(corr.structure = "ar1", neighborhood = "none", corr.min = NULL, im.res = c(15, 15), rho = 0.5) length(c.nr[c.nr > 0]) min(c.nr) ## set corr.min gives many zero entries; sparser structure c.r <- correlation_builder(corr.structure = "ar1", neighborhood = "none", corr.min = 0.01, im.res = c(15, 15), rho = 0.5) ## raw number > 0 length(c.r[c.r > 0]) ## proportion > 0 length(c.r[c.r > 0]) / length(c.nr)
Input the limits of a 2D space and the desired image resolution, then the function outputs the appropriate grid/lattice coordinates.
generate_grid(im.res, xlim = c(0, 1), ylim = c(0, 1))
generate_grid(im.res, xlim = c(0, 1), ylim = c(0, 1))
im.res |
A vector specifying the dimension/resolution of the image. The first entry is the number of 'rows' in the lattice/image, and the second entry is the number of columns' in the lattice/image. |
xlim , ylim
|
These are the 2D image limits. Defaults for both are
|
A data frame whose first column is x-coordinates and whose second column is y-coordinates.
Obtain probabilities for each category of a multinomial distribution based on covariate and parameter values based on the logit models for the multinomial distribution.
generate_multinom_probs(V = NULL, B = NULL, X = NULL, X.incl.X0 = FALSE)
generate_multinom_probs(V = NULL, B = NULL, X = NULL, X.incl.X0 = FALSE)
V |
A numeric value stating the number of categories desired. |
B |
A list, each element of which contains a parameter vector. The
list should have length |
X |
A matrix, each row of which contains subject covariate/predictor values. |
X.incl.X0 |
Logical. When |
A matrix containing subject-specific probabilities for each
category of the multinomial distribution. The number of rows equals
nrow(X)
and the number of columns equals V
.
Agresti A (2007). An Introduction to Categorical Analysis, 2nd edition. John Wiley & Sons, Hoboken, New Jersey.
Friedman J, Hastie T, Tibshirani R (2010). “Regularization paths for generalized linear models via coordinate descent.” Journal of Statistical Software, 33, 1-22. doi:10.18637/jss.v033.i01.
## number of categories vt <- 3 ## covariate values xt <- matrix(rnorm(10 * 2), ncol = 2, nrow = 10) ## list of parameter vectors bt <- list(b1 = c(1, 0.25, -0.25), b2 = c(-0.5, 0.15, 0.15)) ## list of parameter vectors (over-parameterized model) bu <- list(b1 = c(1, 0.25, -0.25), b2 = c(-0.5, 0.15, 0.15), b3 = c(-1, 0.1, -0.20)) ## subject specific probabilities for each category generate_multinom_probs(V = vt, X = xt, B = bt) ## subject specific probabilities for each category ## (over-parameterized model) generate_multinom_probs(V = vt, X = xt, B = bu)
## number of categories vt <- 3 ## covariate values xt <- matrix(rnorm(10 * 2), ncol = 2, nrow = 10) ## list of parameter vectors bt <- list(b1 = c(1, 0.25, -0.25), b2 = c(-0.5, 0.15, 0.15)) ## list of parameter vectors (over-parameterized model) bu <- list(b1 = c(1, 0.25, -0.25), b2 = c(-0.5, 0.15, 0.15), b3 = c(-1, 0.1, -0.20)) ## subject specific probabilities for each category generate_multinom_probs(V = vt, X = xt, B = bt) ## subject specific probabilities for each category ## (over-parameterized model) generate_multinom_probs(V = vt, X = xt, B = bu)
Provide graphics for spatial extent of predictor parameters, rejections and/or the truth/falsity of the rejections.
inf_2D_image( rejections = NULL, B = NULL, im.res, test.statistic = NULL, reject.threshold = NULL, binarize.B = TRUE, grid.color = "grey", n.colors = length(unique(B)), B.incl.B0 = TRUE, plot.title = TRUE, manual.title = NULL, title.size = 1 )
inf_2D_image( rejections = NULL, B = NULL, im.res, test.statistic = NULL, reject.threshold = NULL, binarize.B = TRUE, grid.color = "grey", n.colors = length(unique(B)), B.incl.B0 = TRUE, plot.title = TRUE, manual.title = NULL, title.size = 1 )
rejections |
A binary vector; |
B |
A vector of "true" parameter values. For inference purposes, this can be a vector of actual parameter values, or a binary vector indicating non-zero status. |
im.res |
A vector defining the dimension of spatial data. The first entry is the number of rows and the second entry is the number of columns. |
test.statistic |
A vector of test statistics; e.g., t-statistics or p-values that are used to determine whether or not to reject the null hypothesis. |
reject.threshold |
A list whose first element is the rejection
criteria, e.g., the minimum t-statistic or maximum p-value for which to
reject the null hypothesis. The second element is one of
|
binarize.B |
Either |
grid.color |
Specify the color for the grid lines. |
n.colors |
Determines the number of colors in the printed image.
Default is |
B.incl.B0 |
If |
plot.title |
When |
manual.title |
When |
title.size |
Specifies the size of the title text. This is based on
|
An image depicting the spatial extent of some image characteristic.
If both rejections
and B
are specified then the
function provides an image with separate color each for:
No rejection and B[i] = 0
(i.e. True Negative).
No rejection and B[i] != 0
(i.e. False Negative).
Rejection and B[i] = 0
(i.e. False Positive).
Rejection and B[i] != 0
(i.e. True Positive).
## parameter vector Bex <- beta_builder(row.index = c(rep(5, 3), rep(6, 3), rep(7, 3)), col.index = rep(c(5, 6, 7), 3), im.res = c(10, 10), index.type = "manual", B0 = 0, B.values = 1:9, output.indices = FALSE) ## co-opt beta builder to get rejections rejex <- beta_builder(row.index = c(rep(4, 3), rep(5, 3), rep(6, 3)), col.index = rep(c(4, 5, 6), 3), im.res = c(10, 10), index.type = "manual", B0 = 0, B.values = rep(1, 9), output.indices = FALSE)[-1] rejex.sm2 <- beta_builder(row.index = 5:6, col.index = 5:6, im.res = c(10, 10), B0 = 0, B.values = 1, output.indices = FALSE)[-1] ## just B inf_2D_image(B = Bex, im.res = c(10, 10)) ## just rejections inf_2D_image(rejections = rejex, im.res = c(10, 10)) ## both B and rejections inf_2D_image(rejections = rejex, B = Bex, im.res = c(10, 10)) inf_2D_image(rejections = rejex.sm2, B = Bex, im.res = c(10, 10)) ## larger dimension example Bex2 <- beta_builder(row.index = 5:15, col.index = 16:20, im.res = c(50, 50), B0 = 0, B.values = 1:(length(5:15) * length(16:20)), index.type = "rectangle", output.indices = FALSE) rejex2 <- beta_builder(row.index = 13:21, col.index = 30:41, im.res = c(50, 50), B0 = 0, B.values = rep(1, (length(13:21) * length(30:41))), index.type = "rectangle", output.indices = FALSE)[-1] rejex3 <- beta_builder(row.index = 5:20, col.index = 16:30, im.res = c(50, 50), B0 = 0, B.values = rep(1, (length(5:20) * length(16:30))), index.type = "rectangle", output.indices = FALSE)[-1] rejex4 <- beta_builder(row.index = 5:10, col.index = 16:17, im.res = c(50, 50), B0 = 0, B.values = rep(1, (length(5:10) * length(16:17))), index.type = "rectangle", output.indices = FALSE)[-1] ## images inf_2D_image(B = Bex2, im.res = c(50, 50)) inf_2D_image(B = Bex2, im.res = c(50, 50), binarize.B = FALSE) inf_2D_image(rejections = rejex2, im.res = c(50, 50)) ## No TP inf_2D_image(rejections = rejex2, B = Bex2, im.res = c(50, 50)) ## ALL TP inf_2D_image(rejections = Bex2[-1], B = Bex2, im.res = c(50, 50)) ## No FN inf_2D_image(rejections = rejex3, B = Bex2, im.res = c(50, 50)) ## No FP, but FN inf_2D_image(rejections = rejex4, im.res = c(50, 50)) inf_2D_image(B = Bex2, im.res = c(50, 50)) inf_2D_image(rejections = rejex4, B = Bex2, im.res = c(50, 50))
## parameter vector Bex <- beta_builder(row.index = c(rep(5, 3), rep(6, 3), rep(7, 3)), col.index = rep(c(5, 6, 7), 3), im.res = c(10, 10), index.type = "manual", B0 = 0, B.values = 1:9, output.indices = FALSE) ## co-opt beta builder to get rejections rejex <- beta_builder(row.index = c(rep(4, 3), rep(5, 3), rep(6, 3)), col.index = rep(c(4, 5, 6), 3), im.res = c(10, 10), index.type = "manual", B0 = 0, B.values = rep(1, 9), output.indices = FALSE)[-1] rejex.sm2 <- beta_builder(row.index = 5:6, col.index = 5:6, im.res = c(10, 10), B0 = 0, B.values = 1, output.indices = FALSE)[-1] ## just B inf_2D_image(B = Bex, im.res = c(10, 10)) ## just rejections inf_2D_image(rejections = rejex, im.res = c(10, 10)) ## both B and rejections inf_2D_image(rejections = rejex, B = Bex, im.res = c(10, 10)) inf_2D_image(rejections = rejex.sm2, B = Bex, im.res = c(10, 10)) ## larger dimension example Bex2 <- beta_builder(row.index = 5:15, col.index = 16:20, im.res = c(50, 50), B0 = 0, B.values = 1:(length(5:15) * length(16:20)), index.type = "rectangle", output.indices = FALSE) rejex2 <- beta_builder(row.index = 13:21, col.index = 30:41, im.res = c(50, 50), B0 = 0, B.values = rep(1, (length(13:21) * length(30:41))), index.type = "rectangle", output.indices = FALSE)[-1] rejex3 <- beta_builder(row.index = 5:20, col.index = 16:30, im.res = c(50, 50), B0 = 0, B.values = rep(1, (length(5:20) * length(16:30))), index.type = "rectangle", output.indices = FALSE)[-1] rejex4 <- beta_builder(row.index = 5:10, col.index = 16:17, im.res = c(50, 50), B0 = 0, B.values = rep(1, (length(5:10) * length(16:17))), index.type = "rectangle", output.indices = FALSE)[-1] ## images inf_2D_image(B = Bex2, im.res = c(50, 50)) inf_2D_image(B = Bex2, im.res = c(50, 50), binarize.B = FALSE) inf_2D_image(rejections = rejex2, im.res = c(50, 50)) ## No TP inf_2D_image(rejections = rejex2, B = Bex2, im.res = c(50, 50)) ## ALL TP inf_2D_image(rejections = Bex2[-1], B = Bex2, im.res = c(50, 50)) ## No FN inf_2D_image(rejections = rejex3, B = Bex2, im.res = c(50, 50)) ## No FP, but FN inf_2D_image(rejections = rejex4, im.res = c(50, 50)) inf_2D_image(B = Bex2, im.res = c(50, 50)) inf_2D_image(rejections = rejex4, B = Bex2, im.res = c(50, 50))
Determine rejections
make_rejection(B, reject.threshold, test.statistic)
make_rejection(B, reject.threshold, test.statistic)
B |
A vector of "true" parameter values. For inference purposes, this can be a vector of actual parameter values, or a binary vector indicating non-zero status. |
reject.threshold |
A list whose first element is the rejection
criteria, e.g., the minimum t-statistic or maximum p-value for which to
reject the null hypothesis. The second element is one of
|
test.statistic |
A vector of test statistics; e.g., t-statistics or p-values that are used to determine whether or not to reject the null hypothesis. |
A vector of hypothesis testing rejection indicators, where 1 indicates a rejection and 0 otherwise.
Determine and store neighbors by Euclidean Distance Constraints
neighbors_by_dist(x, y, coords, im.res, r, print.ring = FALSE)
neighbors_by_dist(x, y, coords, im.res, r, print.ring = FALSE)
x , y
|
are the row and column coordinates, respectively. |
coords |
A dataframe containing indices and coordinates for the image. |
im.res |
A vector containing the number of rows and columns, respectively. |
r |
A scalar value determining the radius within which other locations are neighbors to the current location (x, y). |
print.ring |
When |
A tibble whose first column contains x indices, second column contains y indices, and third column denotes the current ring about a location.
This function avoids testing all points for being with a certain distance in order to determine neighbor status of a given point by progressively widening a box around the point. Each iteration widens the box by an extra ring, and we only test points in the new ring. If at the end of testing a ring there are no new neighbors then we stop expanding the box and return the neighbors' coordinates. For computational efficiency, this function assumes that all arguments except the current point's coordinates have been specified.
## Necessary pre-specified arguments required for the function to work. ## image resoluation + number of spatial predictors im.res <- c(5, 5) J <- prod(im.res) ## create predictor indices w/ coordinates row.id <-rep(1, im.res[2]) for (i in 2:im.res[1]) { row.id <- c(row.id, rep(i, im.res[2])) } coords <- data.frame(index = 1:J, row.id = row.id, col.id = rep(c(1:im.res[2]), im.res[1]) ) neighbors_by_dist(x = 2, y = 2, im.res = im.res, coords = coords, r = 2)
## Necessary pre-specified arguments required for the function to work. ## image resoluation + number of spatial predictors im.res <- c(5, 5) J <- prod(im.res) ## create predictor indices w/ coordinates row.id <-rep(1, im.res[2]) for (i in 2:im.res[1]) { row.id <- c(row.id, rep(i, im.res[2])) } coords <- data.frame(index = 1:J, row.id = row.id, col.id = rep(c(1:im.res[2]), im.res[1]) ) neighbors_by_dist(x = 2, y = 2, im.res = im.res, coords = coords, r = 2)
This function constructs the precision matrix for a Conditional Autoregression (CAR).
precision_builder( im.res, tau = 1, alpha = 0.75, neighborhood = "ar1", weight = "binary", phi = 1, r = NULL, w = NULL, h = NULL, digits.Q = 10 )
precision_builder( im.res, tau = 1, alpha = 0.75, neighborhood = "ar1", weight = "binary", phi = 1, r = NULL, w = NULL, h = NULL, digits.Q = 10 )
im.res |
A vector defining the dimension of spatial data. The first entry is the number of rows and the second entry is the number of columns. |
tau |
A vector containing precision parameters. If of length 1, then
all precisions are assumed equal. Otherwise the length of |
alpha |
A scalar value between 0 and 1 that defines the strength of
correlations. Note that when |
neighborhood |
Defines the neighborhood within which conditional
correlations are non-zero. This differs from use in
|
weight |
Determines how weights are assigned. |
phi |
When |
r |
If |
w , h
|
If |
digits.Q |
Determines the number of digits to round entries in the precision matrix. Default is 10. |
This formulation of the CAR model is based on a formulation found
in (Banerjee et al. 2015) where the joint distribution
of the of the conditionally specified random variables are assumed to be
and all neighbors are
weighted 1. When weights other than 1 are desired, the joint distribution
is
, e.g. as in
(Jin et al. 2005).
A (precision) matrix.
Banerjee S, Carlin BP, Gelfand AE (2015). Hierarchical Modeling and Analysis for Spatial Data, Second edition. Chapman & Hall/CRC, Boca Raton, Florida.
Jin X, Carlin BP, Banerjee S (2005). “Generalized Hierarchical Multivariate CAR Models for Areal Data.” Biometrics, 61(4), 950-961. doi:10.1111/j.1541-0420.2005.00359.x.
## Not run: precision_builder(im.res = c(3, 3), tau = 1, alpha = 0.75, neighborhood = "ar1") ## binary weights precision_builder(im.res = c(3, 3), tau = 1, alpha = 0.75, neighborhood = "round", r = 3) ## weights based on distance precision_builder(im.res = c(3, 3), tau = 1, alpha = 0.75, weight = "distance", phi = 1, neighborhood = "round", r = 3) precision_builder(im.res = c(3, 3), tau = 1, alpha = 0.75, neighborhood = "rectangle", w = 2, h = 2) ## End(Not run)
## Not run: precision_builder(im.res = c(3, 3), tau = 1, alpha = 0.75, neighborhood = "ar1") ## binary weights precision_builder(im.res = c(3, 3), tau = 1, alpha = 0.75, neighborhood = "round", r = 3) ## weights based on distance precision_builder(im.res = c(3, 3), tau = 1, alpha = 0.75, weight = "distance", phi = 1, neighborhood = "round", r = 3) precision_builder(im.res = c(3, 3), tau = 1, alpha = 0.75, neighborhood = "rectangle", w = 2, h = 2) ## End(Not run)
Generates a proximity matrix where non-zero entries are the weights associated with neighbors, and zero entries are not neighbors.
proximity_builder( im.res, neighborhood = "ar1", type = c("sparse", "full"), weight = "binary", phi = 1, r = NULL, h = NULL, w = NULL, include.coords = FALSE, print.im = FALSE )
proximity_builder( im.res, neighborhood = "ar1", type = c("sparse", "full"), weight = "binary", phi = 1, r = NULL, h = NULL, w = NULL, include.coords = FALSE, print.im = FALSE )
im.res |
A vector defining the dimension of spatial data. The first entry is the number of rows and the second entry is the number of columns. |
neighborhood |
Determines how to assign neighbor status to locations;
i.e. 1 for neighbors, 0 otherwise. |
type |
Specifies either sparse ( |
weight |
Determines how weights are assigned. |
phi |
When |
r , h , w
|
When |
include.coords |
If |
print.im |
Allows user to print the 2D "image" matrix with index labels to visually verify that the proximity matrix is as expected. |
A (proximity) matrix.
## Not run: ## adjacency matrix with sparse structure (i.e., 2 columns) ## and ar1 neighborhood sp.ar1 <- proximity_builder(im.res = c(3, 3), weight = "binary", neighborhood = "ar1", type = "sparse") ## adjacency matrix with full structure ## (i.e., prod(im.dim) rows & columns) and ar1 neighborhood full.ar1 <- proximity_builder(im.res = c(3, 3), weight = "binary", neighborhood = "ar1", type = "full") ## proximity matrix weighted by distance (sparse) sp.rnd <- proximity_builder(im.res = c(3, 3), weight = "distance", neighborhood = "round", r = 2, type = "sparse", include.coords = TRUE) ## proximity matrix weighted by distance (full) full.rnd <- proximity_builder(im.res = c(3, 3), weight = "distance", neighborhood = "round", r = 2, type = "full") ## End(Not run)
## Not run: ## adjacency matrix with sparse structure (i.e., 2 columns) ## and ar1 neighborhood sp.ar1 <- proximity_builder(im.res = c(3, 3), weight = "binary", neighborhood = "ar1", type = "sparse") ## adjacency matrix with full structure ## (i.e., prod(im.dim) rows & columns) and ar1 neighborhood full.ar1 <- proximity_builder(im.res = c(3, 3), weight = "binary", neighborhood = "ar1", type = "full") ## proximity matrix weighted by distance (sparse) sp.rnd <- proximity_builder(im.res = c(3, 3), weight = "distance", neighborhood = "round", r = 2, type = "sparse", include.coords = TRUE) ## proximity matrix weighted by distance (full) full.rnd <- proximity_builder(im.res = c(3, 3), weight = "distance", neighborhood = "round", r = 2, type = "full") ## End(Not run)
This function calculates sample FDR, FWER, and Power for large numbers of predictors, given a vector of "true" parameter values and a vector of associated rejections. In the case that more than 1 predictor has a "true" non-zero parameter, then Power is defined as the proportion/percentage of those "true" parameters identified.
sample_FP_Power( rejections = NULL, FP = NULL, TP = NULL, test.statistic = NULL, reject.threshold = NULL, B = NULL, B.incl.B0 = TRUE, full.summary = FALSE )
sample_FP_Power( rejections = NULL, FP = NULL, TP = NULL, test.statistic = NULL, reject.threshold = NULL, B = NULL, B.incl.B0 = TRUE, full.summary = FALSE )
rejections |
A binary vector; |
FP , TP
|
Binary vectors of false positive and true positive indicators,
respectively. |
test.statistic |
A vector of test statistics; e.g., t-statistics or p-values that are used to determine whether or not to reject the null hypothesis. |
reject.threshold |
A list whose first element is the rejection
criteria, e.g., the minimum t-statistic or maximum p-value for which to
reject the null hypothesis. The second element is one of
|
B |
A vector of "true" parameter values. For inference purposes, this can be a vector of actual parameter values, or a binary vector indicating non-zero status. |
B.incl.B0 |
If |
full.summary |
If |
A data frame with columns for sample FDR, FWER, and Power.
The default operating approach is that the null hypothesis is
B[i] = 0
for each parameter. If other hypotheses are being tested
then B
should be converted to a binary vector indicating whether
the null hypothesis should have been rejected.
## example 1 ## rejection vector rej.ex <- c(0, 1, 1, 0, 0, 1, 0) ## false positive vector fp.ex <- c(0, 0, 1, 0, 0, 0, 0) ## true positive vector tp.ex <- c(0, 1, 0, 0, 0, 1, 0) ## parameter vector par.ex <- c(0, 4, 0, 0, 3, 9, 0) sample_FP_Power(rej.ex, fp.ex, tp.ex, par.ex, B.incl.B0 = FALSE) ## Function can calculate TP and FP vectors sample_FP_Power(rejections = rej.ex, FP = NULL, TP = NULL, B = par.ex, B.incl.B0 = FALSE) ## example 2: sum(FP, TP) must equal sum(rejections) or ## function stops execution rej.ex2 <- c(0, 1, 0, 0, 0, 1, 0) fp.ex2 <- c(0, 0, 1, 0, 0, 0, 0) tp.ex2 <- c(0, 1, 0, 0, 0, 1, 0) par.ex2 <- c(0, 4, 0, 0, 3, 9, 0) ## Not run: sample_FP_Power(rej.ex2, fp.ex2, tp.ex2, par.ex2, B.incl.B0 = FALSE) ## End(Not run) ## example 3: calculate rejections from vector of test statistics zstat <- c(-0.5, 1.98, 2.01, 1.45, -1.99) # 2-tailed sample_FP_Power(test.statistic = zstat, reject.threshold = list(1.96, "2-tailed"), B = c(0, 0, 4, 1, -2), B.incl.B0 = FALSE) # 1-tailed (upper) sample_FP_Power(test.statistic = zstat, reject.threshold = list(1.96, "greater"), B = c(0, 0, 4, 1, -2), B.incl.B0 = FALSE) ## p-value sample_FP_Power(test.statistic = c(0.44, 0.04, 0.01, 0.06, 0.02 ), reject.threshold = list(0.05, "less"), B = c(0, 0, 4, 1, -2), B.incl.B0 = FALSE)
## example 1 ## rejection vector rej.ex <- c(0, 1, 1, 0, 0, 1, 0) ## false positive vector fp.ex <- c(0, 0, 1, 0, 0, 0, 0) ## true positive vector tp.ex <- c(0, 1, 0, 0, 0, 1, 0) ## parameter vector par.ex <- c(0, 4, 0, 0, 3, 9, 0) sample_FP_Power(rej.ex, fp.ex, tp.ex, par.ex, B.incl.B0 = FALSE) ## Function can calculate TP and FP vectors sample_FP_Power(rejections = rej.ex, FP = NULL, TP = NULL, B = par.ex, B.incl.B0 = FALSE) ## example 2: sum(FP, TP) must equal sum(rejections) or ## function stops execution rej.ex2 <- c(0, 1, 0, 0, 0, 1, 0) fp.ex2 <- c(0, 0, 1, 0, 0, 0, 0) tp.ex2 <- c(0, 1, 0, 0, 0, 1, 0) par.ex2 <- c(0, 4, 0, 0, 3, 9, 0) ## Not run: sample_FP_Power(rej.ex2, fp.ex2, tp.ex2, par.ex2, B.incl.B0 = FALSE) ## End(Not run) ## example 3: calculate rejections from vector of test statistics zstat <- c(-0.5, 1.98, 2.01, 1.45, -1.99) # 2-tailed sample_FP_Power(test.statistic = zstat, reject.threshold = list(1.96, "2-tailed"), B = c(0, 0, 4, 1, -2), B.incl.B0 = FALSE) # 1-tailed (upper) sample_FP_Power(test.statistic = zstat, reject.threshold = list(1.96, "greater"), B = c(0, 0, 4, 1, -2), B.incl.B0 = FALSE) ## p-value sample_FP_Power(test.statistic = c(0.44, 0.04, 0.01, 0.06, 0.02 ), reject.threshold = list(0.05, "less"), B = c(0, 0, 4, 1, -2), B.incl.B0 = FALSE)
Takes N draws from a Multivariate Normal (MVN) distribution using either
base R or the R package spam
. This function requires the Cholesky
decomposition of the desired covariance matrix.
sim_MVN_X( N, mu = 0, L = NULL, R = NULL, S = NULL, Q = NULL, use.spam = FALSE, use.MASS = FALSE, X.categorical = FALSE, X.num.categories = 2, X.category.type = "percentile", X.percentiles = NULL, X.manual.thresh = NULL, X.cat.names = NULL )
sim_MVN_X( N, mu = 0, L = NULL, R = NULL, S = NULL, Q = NULL, use.spam = FALSE, use.MASS = FALSE, X.categorical = FALSE, X.num.categories = 2, X.category.type = "percentile", X.percentiles = NULL, X.manual.thresh = NULL, X.cat.names = NULL )
N |
The number of draws to take from MVN; i.e., the number of subjects. |
mu |
One of the following:
|
L , R
|
|
S , Q
|
A covariance or precision matrix respectively. These are for
use with |
use.spam |
Logical. If |
use.MASS |
Logical. When |
X.categorical |
Default is |
X.num.categories |
A scalar value denoting the number of categories in which to divide the data. |
X.category.type |
Tells R how to categorize the data. Options are
|
X.percentiles |
A vector of percentiles to be used in thresholding
when |
X.manual.thresh |
A vector containing the thresholds for categorizing
the values; e.g. if |
X.cat.names |
A vector of category names. If |
Matrix of dimension N
x (nrow(L))
(or equivalently
N
x (nrow(R))
) where each row is draw from MVN, and each
column represents a different "variable"; e.g. location in an image.
This function requires the Cholesky decomposition of the desired
covariance matrix for the MVN; this allows for using this function in
simulating multiple datasets of N
MVN draws while only taking the
Cholesky decomposition of the covariance matrix once.
Furrer R, Sain SR (2010). “spam: A Sparse Matrix R Package with Emphasis on MCMC Methods for Gaussian Markov Random Fields.” Journal of Statistical Software, 36(10), 1-25. https://www.jstatsoft.org/v36/i10/.
Ripley BD (1987). Stochastic Simulation. John Wiley & Sons. doi:10.1002/9780470316726.
Rue H (2001). “Fast Sampling of Gaussian Markov Random Fields.” Journal of the Royal Statistical Society B, 63, 325-338. doi:10.1111/1467-9868.00288.
## verify MVN with base R set.seed(732) Lex <- chol_s2Dp(corr.structure = "ar1", im.res = c(3, 3), rho = 0.25, sigma = 1, use.spam = FALSE, corr.min = 0.02, triangle = "lower", return.cov = TRUE) XbR = sim_MVN_X(N = 1000, mu = 0, L = Lex$L) apply(XbR, 2, mean) cov(XbR) Lex$S ## verify MVN with \code{spam} set.seed(472) Rex <- chol_s2Dp(im.res = c(3, 3), matrix.type = "prec", use.spam = TRUE, neighborhood = "ar1", triangle = "upper", return.prec = TRUE) Xspam = sim_MVN_X(N = 1000, mu = 0, R = Rex$R, Q = Rex$Q) apply(Xspam, 2, mean) solve(cov(Xspam)) as.matrix(Rex$Q) ## Categories set.seed(832) Xtest <- sim_MVN_X(N = 30, mu = 0, L = Lex$L, X.categorical = TRUE, X.num.categories = 3, X.category.type = "percentile", X.cat.names = c("A", "B", "C")) Xtest
## verify MVN with base R set.seed(732) Lex <- chol_s2Dp(corr.structure = "ar1", im.res = c(3, 3), rho = 0.25, sigma = 1, use.spam = FALSE, corr.min = 0.02, triangle = "lower", return.cov = TRUE) XbR = sim_MVN_X(N = 1000, mu = 0, L = Lex$L) apply(XbR, 2, mean) cov(XbR) Lex$S ## verify MVN with \code{spam} set.seed(472) Rex <- chol_s2Dp(im.res = c(3, 3), matrix.type = "prec", use.spam = TRUE, neighborhood = "ar1", triangle = "upper", return.prec = TRUE) Xspam = sim_MVN_X(N = 1000, mu = 0, R = Rex$R, Q = Rex$Q) apply(Xspam, 2, mean) solve(cov(Xspam)) as.matrix(Rex$Q) ## Categories set.seed(832) Xtest <- sim_MVN_X(N = 30, mu = 0, L = Lex$L, X.categorical = TRUE, X.num.categories = 3, X.category.type = "percentile", X.cat.names = c("A", "B", "C")) Xtest
N spatially dependent binary design vectors are simulated using
sim2D_binarymap
. These design vectors are used to then simulate
scalar outcomes that have one of Gaussian, Binomial, or Poisson
distributions.
sim_Y_Binary_X( N, B, rand.err = 1, dist, incl.subjectID = TRUE, binomial.method = "traditional", count.method = "traditional", Y.thresh = NULL, print.out = FALSE, xlim = c(0, 1), ylim = c(0, 1), im.res, radius.bounds = c(0.02, 0.1), lambda = 50, random.lambda = FALSE, lambda.sd = 10, lambda.bound = NULL, prior = "gamma", sub.area = FALSE, min.sa = c(0.1, 0.1), max.sa = c(0.3, 0.3), radius.bounds.min.sa = c(0.02, 0.05), radius.bounds.max.sa = c(0.08, 0.15), print.subj.sa = FALSE, print.lambda = FALSE, print.iter = FALSE )
sim_Y_Binary_X( N, B, rand.err = 1, dist, incl.subjectID = TRUE, binomial.method = "traditional", count.method = "traditional", Y.thresh = NULL, print.out = FALSE, xlim = c(0, 1), ylim = c(0, 1), im.res, radius.bounds = c(0.02, 0.1), lambda = 50, random.lambda = FALSE, lambda.sd = 10, lambda.bound = NULL, prior = "gamma", sub.area = FALSE, min.sa = c(0.1, 0.1), max.sa = c(0.3, 0.3), radius.bounds.min.sa = c(0.02, 0.05), radius.bounds.max.sa = c(0.08, 0.15), print.subj.sa = FALSE, print.lambda = FALSE, print.iter = FALSE )
N |
A scalar value determining the number of images to create. |
B |
A vector parameter values; i.e. "betas". Note that
|
rand.err |
A scalar for the random error variance when
|
dist |
The distribution of the scalar outcome.
|
incl.subjectID |
When |
binomial.method |
One of |
count.method |
One of |
Y.thresh |
When |
print.out |
If
This is useful to see the effect of image parameter selection and beta parameter selection on distributional parameters for the outcome of interest. |
xlim , ylim
|
These are the 2D image limits. Defaults for both are
|
im.res |
A vector specifying the dimension/resolution of the image. The first entry is the number of 'rows' in the lattice/image, and the second entry is the number of columns' in the lattice/image. |
radius.bounds |
A 2-element vector whose first and second entries
determine the minimum and maximum radius sizes, respectively; these will
be the bounds of the uniform distribution used to draw the radii. If
|
lambda |
A scalar value specifying the mean/intensity value of the
Poisson process. If |
random.lambda |
|
lambda.sd |
Only utilized when |
lambda.bound |
Only utilized when |
prior |
Only utilized when |
sub.area |
When |
min.sa , max.sa
|
Only utilized when |
radius.bounds.min.sa , radius.bounds.max.sa
|
Only utilized when
|
print.subj.sa , print.lambda , print.iter
|
These arguments are either
|
A data frame where each row consists of a single subject's data. Col 1 is the outcome, Y, and each successive column contains the subject predictor values.
Careful parameter selection, i.e. B
, is necessary to ensure
that simulated outcomes are reasonable; in particular, counts arising from
the Poisson distribution can be unnaturally large.
Cressie N, Wikle CK (2011). Statistics for Spatio-Temporal Data, Wiley Series in Probability and Statistics. John Wiley & Sons, Hoboken, NJ.
Ripley BD (1987). Stochastic Simulation. John Wiley & Sons. doi:10.1002/9780470316726.
## Define non-zero beta values Bex <- beta_builder(row.index = c(3, 3, 4), col.index = c(3, 4, 3), im.res = c(5, 5), B0 = 0, B.values = rep(1/3, 3), output.indices = FALSE) ## Simulate Datasets ## parameter values Nex = 10 set.seed(28743) Gauss.ex <- sim_Y_Binary_X(N = Nex, B = Bex, dist = "gaussian", im.res = c(5, 5)) hist(Gauss.ex$Y) ## direct draws from binomial Bin.ex <- sim_Y_Binary_X(N = Nex, B = Bex, im.res = c(5, 5), dist = "binomial", print.out = TRUE) table(Bin.ex$Y)
## Define non-zero beta values Bex <- beta_builder(row.index = c(3, 3, 4), col.index = c(3, 4, 3), im.res = c(5, 5), B0 = 0, B.values = rep(1/3, 3), output.indices = FALSE) ## Simulate Datasets ## parameter values Nex = 10 set.seed(28743) Gauss.ex <- sim_Y_Binary_X(N = Nex, B = Bex, dist = "gaussian", im.res = c(5, 5)) hist(Gauss.ex$Y) ## direct draws from binomial Bin.ex <- sim_Y_Binary_X(N = Nex, B = Bex, im.res = c(5, 5), dist = "binomial", print.out = TRUE) table(Bin.ex$Y)
N spatially correlated design vectors are simulated from an MVN. These design vectors are used to then simulate scalar outcomes that have one of Gaussian, Binomial, Multinomial or Poisson distributions.
sim_Y_MVN_X( N, B, L = NULL, R = NULL, S = NULL, Q = NULL, use.spam = TRUE, mu = 0, rand.err = 1, dist = "gaussian", V = NULL, incl.subjectID = TRUE, threshold.method = "none", Y.thresh = NULL, X.categorical = FALSE, X.num.categories = 2, X.category.type = "percentile", X.manual.thresh = NULL, X.cat.names = NULL, print.out = FALSE )
sim_Y_MVN_X( N, B, L = NULL, R = NULL, S = NULL, Q = NULL, use.spam = TRUE, mu = 0, rand.err = 1, dist = "gaussian", V = NULL, incl.subjectID = TRUE, threshold.method = "none", Y.thresh = NULL, X.categorical = FALSE, X.num.categories = 2, X.category.type = "percentile", X.manual.thresh = NULL, X.cat.names = NULL, print.out = FALSE )
N |
The number of draws to take from MVN; i.e., the number of subjects. |
B |
A vector parameter values; i.e. "betas". Note that
|
L , R
|
|
S , Q
|
A covariance or precision matrix respectively. These are for
use with |
use.spam |
Logical. If |
mu |
One of the following:
|
rand.err |
A vector for the random error standard deviation when
|
dist |
The distribution of the scalar outcome.
|
V |
A numeric value stating the number of categories desired when
|
incl.subjectID |
When |
threshold.method |
One of |
Y.thresh |
A manual value used to threshold when
|
X.categorical |
Default is |
X.num.categories |
A scalar value denoting the number of categories in which to divide the data. |
X.category.type |
Tells R how to categorize the data. Options are
|
X.manual.thresh |
A vector containing the thresholds for categorizing
the values; e.g. if |
X.cat.names |
A vector of category names. If |
print.out |
If
This is useful to see the effect of image parameter selection and beta parameter selection on distributional parameters for the outcome of interest. |
A data frame where each row consists of a single subject's data. Col 1 is the outcome, Y, and each successive column contains the subject predictor values.
Careful parameter selection, i.e. B
, is necessary to ensure
that simulated outcomes are reasonable; in particular, counts arising from
the Poisson distribution can be unnaturally large.
Furrer R, Sain SR (2010). “spam: A Sparse Matrix R Package with Emphasis on MCMC Methods for Gaussian Markov Random Fields.” Journal of Statistical Software, 36(10), 1-25. https://www.jstatsoft.org/v36/i10/.
Ripley BD (1987). Stochastic Simulation. John Wiley & Sons. doi:10.1002/9780470316726.
Rue H (2001). “Fast Sampling of Gaussian Markov Random Fields.” Journal of the Royal Statistical Society B, 63, 325-338. doi:10.1111/1467-9868.00288.
Agresti A (2007). An Introduction to Categorical Analysis, 2nd edition. John Wiley & Sons, Hoboken, New Jersey.
Friedman J, Hastie T, Tibshirani R (2010). “Regularization paths for generalized linear models via coordinate descent.” Journal of Statistical Software, 33, 1-22. doi:10.18637/jss.v033.i01.
## generate precision matrix and take Cholesky decomposition Rpre <- chol_s2Dp(im.res = c(3, 3), matrix.type = "prec", use.spam = TRUE, neighborhood = "ar1", triangle = "upper", return.prec = TRUE) ## Generate correlation matrix & take Cholesky decomposition Rcov <- chol_s2Dp(corr.structure = "ar1", im.res = c(3, 3), rho = 0.5, triangle = "upper", use.spam = FALSE, neighborhood = "none") ## Define non-zero beta values Bex <- beta_builder(row.index = c(2, 3), col.index = c(3, 3), im.res = c(3, 3), B0 = 0, B.values = rep(1, 2), output.indices = FALSE) ## Simulate Datasets ## parameter values Nex = 100 set.seed(28743) ## with precision matrix Gauss.exp <- sim_Y_MVN_X(N = Nex, B = Bex, R = Rpre$R, Q = Rpre$Q, dist = "gaussian") hist(Gauss.exp$Y) ## with covariance matrix Gauss.exc <- sim_Y_MVN_X(N = Nex, B = Bex, R = Rcov$R, S = Rcov$S, dist = "gaussian") hist(Gauss.exc$Y) ## direct draws from binomial Bin.ex <- sim_Y_MVN_X(N = Nex, B = Bex, R = Rcov$R, S = Rcov$S, dist = "binomial", print.out = TRUE) table(Bin.ex$Y) ## manual cutoff Bin.ex2 <- sim_Y_MVN_X(N = Nex, B = Bex, R = Rcov$R, S = Rcov$S, dist = "binomial", threshold.method = "manual", Y.thresh = 1.25) table(Bin.ex2$Y) ## percentile cutoff Bin.ex3 <- sim_Y_MVN_X(N = Nex, B = Bex, R = Rcov$R, S = Rcov$S, dist = "binomial", threshold.method = "percentile", Y.thresh = 0.75) table(Bin.ex3$Y) ## Poisson Example - note the large counts Pois.ex <- sim_Y_MVN_X(N = Nex, B = Bex, R = Rcov$R, S = Rcov$S, dist = "poisson", print.out = TRUE) mean(Pois.ex$Y) quantile(Pois.ex$Y, probs = c(0, 0.1, 0.25, 0.45, 0.5, 0.75, 0.9, 0.95, 0.99, 1)) hist(Pois.ex$Y)
## generate precision matrix and take Cholesky decomposition Rpre <- chol_s2Dp(im.res = c(3, 3), matrix.type = "prec", use.spam = TRUE, neighborhood = "ar1", triangle = "upper", return.prec = TRUE) ## Generate correlation matrix & take Cholesky decomposition Rcov <- chol_s2Dp(corr.structure = "ar1", im.res = c(3, 3), rho = 0.5, triangle = "upper", use.spam = FALSE, neighborhood = "none") ## Define non-zero beta values Bex <- beta_builder(row.index = c(2, 3), col.index = c(3, 3), im.res = c(3, 3), B0 = 0, B.values = rep(1, 2), output.indices = FALSE) ## Simulate Datasets ## parameter values Nex = 100 set.seed(28743) ## with precision matrix Gauss.exp <- sim_Y_MVN_X(N = Nex, B = Bex, R = Rpre$R, Q = Rpre$Q, dist = "gaussian") hist(Gauss.exp$Y) ## with covariance matrix Gauss.exc <- sim_Y_MVN_X(N = Nex, B = Bex, R = Rcov$R, S = Rcov$S, dist = "gaussian") hist(Gauss.exc$Y) ## direct draws from binomial Bin.ex <- sim_Y_MVN_X(N = Nex, B = Bex, R = Rcov$R, S = Rcov$S, dist = "binomial", print.out = TRUE) table(Bin.ex$Y) ## manual cutoff Bin.ex2 <- sim_Y_MVN_X(N = Nex, B = Bex, R = Rcov$R, S = Rcov$S, dist = "binomial", threshold.method = "manual", Y.thresh = 1.25) table(Bin.ex2$Y) ## percentile cutoff Bin.ex3 <- sim_Y_MVN_X(N = Nex, B = Bex, R = Rcov$R, S = Rcov$S, dist = "binomial", threshold.method = "percentile", Y.thresh = 0.75) table(Bin.ex3$Y) ## Poisson Example - note the large counts Pois.ex <- sim_Y_MVN_X(N = Nex, B = Bex, R = Rcov$R, S = Rcov$S, dist = "poisson", print.out = TRUE) mean(Pois.ex$Y) quantile(Pois.ex$Y, probs = c(0, 0.1, 0.25, 0.45, 0.5, 0.75, 0.9, 0.95, 0.99, 1)) hist(Pois.ex$Y)
Use a Homogenous Poisson Process to generate random "events", a uniform distribution to generate circles of random radii about the events, and take the union to obtain a random set. This is mapped onto a lattice to obtain a binary map.
sim2D_binarymap( N, xlim = c(0, 1), ylim = c(0, 1), im.res, radius.bounds = c(0.02, 0.1), lambda = 50, random.lambda = FALSE, lambda.sd = 10, lambda.bound = NULL, prior = "gamma", sub.area = FALSE, min.sa = c(0.1, 0.1), max.sa = c(0.3, 0.3), radius.bounds.min.sa = c(0.02, 0.05), radius.bounds.max.sa = c(0.08, 0.15), print.subj.sa = FALSE, print.lambda = FALSE, print.iter = FALSE, store.type = "list", output.randset = FALSE )
sim2D_binarymap( N, xlim = c(0, 1), ylim = c(0, 1), im.res, radius.bounds = c(0.02, 0.1), lambda = 50, random.lambda = FALSE, lambda.sd = 10, lambda.bound = NULL, prior = "gamma", sub.area = FALSE, min.sa = c(0.1, 0.1), max.sa = c(0.3, 0.3), radius.bounds.min.sa = c(0.02, 0.05), radius.bounds.max.sa = c(0.08, 0.15), print.subj.sa = FALSE, print.lambda = FALSE, print.iter = FALSE, store.type = "list", output.randset = FALSE )
N |
A scalar value determining the number of images to create. |
xlim , ylim
|
These are the 2D image limits. Defaults for both are
|
im.res |
A vector specifying the dimension/resolution of the image. The first entry is the number of 'rows' in the lattice/image, and the second entry is the number of columns' in the lattice/image. |
radius.bounds |
A 2-element vector whose first and second entries
determine the minimum and maximum radius sizes, respectively; these will
be the bounds of the uniform distribution used to draw the radii. If
|
lambda |
A scalar value specifying the mean/intensity value of the
Poisson process. If |
random.lambda |
|
lambda.sd |
Only utilized when |
lambda.bound |
Only utilized when |
prior |
Only utilized when |
sub.area |
When |
min.sa , max.sa
|
Only utilized when |
radius.bounds.min.sa , radius.bounds.max.sa
|
Only utilized when
|
print.subj.sa , print.lambda , print.iter
|
These arguments are either
|
store.type |
One of |
output.randset |
Logical. When |
A list; each element is a matrix of zeroes and ones.
Cressie N, Wikle CK (2011). Statistics for Spatio-Temporal Data, Wiley Series in Probability and Statistics. John Wiley & Sons, Hoboken, NJ.
bin_ims <- sim2D_binarymap(N = 5, im.res = c(10, 10), store.type = "list", lambda = 50, sub.area = TRUE, min.sa = c(0.10, 0.10), max.sa = c(0.5, 0.5), radius.bounds.min.sa = c(0.015, 0.04), radius.bounds.max.sa = c(0.041, 0.06)) rotate = function(x){ t(apply(x, 2, rev)) } for (i in 1:length(bin_ims)) { image(rotate(bin_ims[[i]]), col = c("white", "darkgreen"), axes = FALSE) box() grid(nx = 10, ny = 10, col = "black", lty = 1) }
bin_ims <- sim2D_binarymap(N = 5, im.res = c(10, 10), store.type = "list", lambda = 50, sub.area = TRUE, min.sa = c(0.10, 0.10), max.sa = c(0.5, 0.5), radius.bounds.min.sa = c(0.015, 0.04), radius.bounds.max.sa = c(0.041, 0.06)) rotate = function(x){ t(apply(x, 2, rev)) } for (i in 1:length(bin_ims)) { image(rotate(bin_ims[[i]]), col = c("white", "darkgreen"), axes = FALSE) box() grid(nx = 10, ny = 10, col = "black", lty = 1) }
A random set is generated by using a Poisson process in 2D space to choose 'event' locations, about which a circle of random radius is 'drawn'. The union of the circles defines ultimately defines the set.
sim2D_RandSet_HPPP( N, xlim = c(0, 1), ylim = c(0, 1), radius.bounds = c(0.02, 0.1), lambda = 50, lambda.sd = 10, lambda.bound = NULL, prior = "gamma", random.lambda = FALSE, sub.area = FALSE, min.sa = c(0.1, 0.1), max.sa = c(0.3, 0.3), radius.bounds.min.sa = c(0.02, 0.05), radius.bounds.max.sa = c(0.08, 0.15), print.subj.sa = FALSE, print.lambda = FALSE, print.iter = FALSE )
sim2D_RandSet_HPPP( N, xlim = c(0, 1), ylim = c(0, 1), radius.bounds = c(0.02, 0.1), lambda = 50, lambda.sd = 10, lambda.bound = NULL, prior = "gamma", random.lambda = FALSE, sub.area = FALSE, min.sa = c(0.1, 0.1), max.sa = c(0.3, 0.3), radius.bounds.min.sa = c(0.02, 0.05), radius.bounds.max.sa = c(0.08, 0.15), print.subj.sa = FALSE, print.lambda = FALSE, print.iter = FALSE )
N |
A scalar value determining the number of images to create. |
xlim , ylim
|
These are the 2D image limits. Defaults for both are
|
radius.bounds |
A 2-element vector whose first and second entries
determine the minimum and maximum radius sizes, respectively; these will
be the bounds of the uniform distribution used to draw the radii. If
|
lambda |
A scalar value specifying the mean/intensity value of the
Poisson process. If |
lambda.sd |
Only utilized when |
lambda.bound |
Only utilized when |
prior |
Only utilized when |
random.lambda |
|
sub.area |
When |
min.sa , max.sa
|
Only utilized when |
radius.bounds.min.sa , radius.bounds.max.sa
|
Only utilized when
|
print.subj.sa , print.lambda , print.iter
|
These arguments are either
|
A dataframe with columns for subject ID, x-coordinates, y-coordinates, and associated radii.
Cressie N, Wikle CK (2011). Statistics for Spatio-Temporal Data, Wiley Series in Probability and Statistics. John Wiley & Sons, Hoboken, NJ.
Determine whether locations in the image/lattice (from generate.grid
)
are within or without the union of a random set generated by
sim2D_HPPP_coords()
. If the Euclidean distance between a lattice
location and any 'event' is less than the radius about the 'event', then
the location is said to be within the random set. Otherwise, it is without
the random set.
within_area(grid.centers, radii, event.xcoord, event.ycoord)
within_area(grid.centers, radii, event.xcoord, event.ycoord)
grid.centers |
Output from |
radii |
A vector of radii values. |
event.xcoord , event.ycoord
|
Paired vectors specifying the x- and y- coordinates, respectively, of each 'event' from the Poisson process. |
A data frame with lattice x- and y- coordinates, and a binary vector where 1 indicates the location is within the random set, and 0 indicates the location is without the random set.