Title: Bayesian Quantile Elastic Net for Genetic Study
Description: As heavy-tailed error distributions and outliers in the response variable are widely encountered, models that are robust to data contamination are in high demand. Here, we develop a novel robust Bayesian variable selection method with an elastic net penalty for quantile regression in genetic analysis. In particular, spike-and-slab priors have been incorporated to impose sparsity. An efficient Gibbs sampler has been developed to facilitate computation. The core modules of the package have been developed in 'C++' and R.
Authors: Xi Lu [aut, cre], Cen Wu [aut]
Maintainer: Xi Lu <[email protected]>
License: GPL-2
Version: 0.2
Built: 2024-11-09 04:07:11 UTC
Source: https://github.com/xilustat/bayenet
In this package, we provide a set of robust Bayesian quantile variable selection methods for genetic analysis. A Bayesian formulation of quantile regression has been adopted to accommodate data contamination and heavy-tailed distributions in the response. The proposed method conducts robust quantile variable selection by accounting for structural sparsity. In particular, spike-and-slab priors are imposed to identify important genetic effects. In addition to the default method, users can also choose different structures (robust or non-robust) and penalties (lasso or elastic net), with or without spike-and-slab priors.
The user-friendly, integrated interface Bayenet() allows users to flexibly choose the fitting method they prefer. Three arguments in Bayenet() control the fitting method: robust, whether to use robust methods; sparse, whether to use spike-and-slab priors to create sparsity; and penalty, whether to use the lasso or the elastic net penalty. The function Bayenet() returns a Bayenet object that contains the posterior estimates of each coefficient. predict.Bayenet() and print.Bayenet() are implemented for Bayenet objects; predict.Bayenet() takes a Bayenet object and returns the predicted values for new observations.
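A minimal sketch of how these choices can be combined on the simulated data shipped with the package (the argument combinations and the max.steps value below are illustrative, not recommendations):

library(Bayenet)
data(dat)
## default: robust fit with spike-and-slab priors and the elastic net penalty
fit.default <- Bayenet(X, Y, clin, max.steps = 5000,
                       robust = TRUE, sparse = TRUE, penalty = "elastic net")
## non-robust lasso fit without spike-and-slab priors
fit.lasso <- Bayenet(X, Y, clin, max.steps = 5000,
                     robust = FALSE, sparse = FALSE, penalty = "lasso")
print(fit.default)
pred <- predict(fit.default, X.new = X, clin.new = clin, Y.new = Y)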
Lu, X. and Wu, C. (2023). Bayesian quantile elastic net with spike-and-slab priors.
Lu, X., Fan, K., Ren, J., and Wu, C. (2021). Identifying Gene–Environment Interactions With Robust Marginal Bayesian Variable Selection. Frontiers in Genetics, 12:667074 doi:10.3389/fgene.2021.667074
Zhou, F., Ren, J., Lu, X., Ma, S. and Wu, C. (2020). Gene–Environment Interaction: a Variable Selection Perspective. Epistasis. Methods in Molecular Biology. Humana Press (Accepted) https://arxiv.org/abs/2003.02930
Wu, C., Cui, Y., and Ma, S. (2014). Integrative analysis of gene–environment interactions under a multi–response partially linear varying coefficient model. Statistics in Medicine, 33(28), 4988–4998 doi:10.1002/sim.6287
Li, Q. and Lin, N. (2010). The Bayesian elastic net. Bayesian Anal, 5(1): 151-170 doi:10.1214/10-BA506
Li, Q., Xi, R. and Lin, N. (2010). The Bayesian regularized quantile regression. Bayesian Analysis, 5(3): 533-556 doi:10.1214/10-BA521
Fit a robust Bayesian elastic net variable selection model for genetic study.
Bayenet( X, Y, clin, max.steps = 10000, robust = TRUE, sparse = TRUE, penalty = c("lasso", "elastic net"), debugging = FALSE )
Argument | Description
---|---
X | the matrix of predictors (genetic factors). Each row should be an observation vector.
Y | the continuous response variable.
clin | a matrix of clinical variables. Clinical variables are not subject to penalization. Clinical variables will be centered, and a column of 1s will be added to the clinical matrix as the intercept.
max.steps | the number of MCMC iterations.
robust | logical flag. If TRUE, robust methods will be used.
sparse | logical flag. If TRUE, spike-and-slab priors will be used to shrink coefficients of irrelevant covariates to exactly zero.
penalty | two choices are available: "lasso" for the lasso penalty and "elastic net" for the elastic net penalty.
debugging | logical flag. If TRUE, progress will be output to the console and extra information will be returned.
Consider the data model described in "dat":

$$Y_{i} = \alpha_{0} + \sum_{k=1}^{q}\gamma_{k}\,\mathrm{clin}_{ik} + \sum_{j=1}^{p}\beta_{j}X_{ij} + \epsilon_{i},$$

where $\alpha_{0}$ is the intercept, and the $\gamma_{k}$'s and $\beta_{j}$'s are the regression coefficients corresponding to the effects of clinical factors and genetic variants, respectively.
When penalty="elastic net" (default), the elastic net penalty is adopted. If penalty="lasso", the lasso penalty is used.
When sparse=TRUE (default), spike-and-slab priors are imposed to identify important main and interaction effects. If sparse=FALSE, Laplacian shrinkage will be used.
When robust=TRUE (default), the distribution of $\epsilon_{i}$ is defined as a Laplace distribution with density $f(\epsilon_{i}\mid\nu) = \frac{\nu}{2}\exp\{-\nu|\epsilon_{i}|\}$ ($i=1,\dots,n$), which leads to a Bayesian formulation of LAD regression. If robust=FALSE, $\epsilon_{i}$ follows a normal distribution.
Both $X$ and $\mathrm{clin}$ will be standardized before the generation of interaction terms to avoid multicollinearity between main effects and interaction terms.
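To make this data model concrete, below is a small sketch of generating data of this form with a heavy-tailed error. It is purely illustrative: the object names (X.sim, clin.sim, Y.sim), dimensions, and coefficient values are made up and are not how the shipped dataset dat was created.

set.seed(1)
n <- 100; p <- 50; q <- 3
X.sim    <- matrix(rnorm(n * p), n, p)     ## genetic factors
clin.sim <- matrix(rnorm(n * q), n, q)     ## clinical factors
beta  <- c(1, -1.2, 0.8, rep(0, p - 3))    ## sparse genetic effects
gamma <- c(0.5, -0.5, 1)                   ## clinical effects
eps <- rt(n, df = 2)                       ## heavy-tailed error
Y.sim <- as.vector(1 + clin.sim %*% gamma + X.sim %*% beta + eps)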
Please check the references for more details about the prior distributions.
an object of class ‘Bayenet’ is returned, which is a list with components:

Component | Description
---|---
posterior | the posterior samples of coefficients from the MCMC.
coefficient | the estimated values of the coefficients.
burn.in | the total number of burn-ins.
iterations | the total number of iterations.
design | the design matrix of all effects.
Lu, X. and Wu, C. (2023). Bayesian quantile elastic net with spike-and-slab priors.
data(dat)
max.steps=5000
fit= Bayenet(X, Y, clin, max.steps, penalty="lasso")
## coefficients of parameters
fit$coefficient
## Estimated values of main G effects
fit$coefficient$G
## Estimated values of clinical effects
fit$coefficient$clin
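Beyond the coefficient estimates, the other documented components of the returned list can be inspected directly. A brief sketch (the internal layout of the posterior samples is not documented here, so str() is used to explore it):

fit$iterations      ## total number of MCMC iterations
fit$burn.in         ## number of burn-in samples
str(fit$posterior)  ## posterior samples of the coefficients
dim(fit$design)     ## design matrix of all effects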
Simulated gene expression data for demonstrating the features of Bayenet.
data("dat")
data("dat")
dat consists of four components: X, Y, clin, coef.
The data model for generating Y:

Use the subscript $i$ to denote the $i$th subject. Let $(X_{i}, Y_{i}, \mathrm{clin}_{i})$ ($i=1,\dots,n$) be independent and identically distributed random vectors. $Y_{i}$ is a continuous response variable representing the cancer outcome or disease phenotype. $X_{i}$ is the $p$-dimensional vector of genetic factors. The clinical factors are denoted by the $q$-dimensional vector $\mathrm{clin}_{i}$. The error $\epsilon_{i}$ follows some heavy-tailed distribution. Consider the following model:

$$Y_{i} = \alpha_{0} + \sum_{k=1}^{q}\gamma_{k}\,\mathrm{clin}_{ik} + \sum_{j=1}^{p}\beta_{j}X_{ij} + \epsilon_{i},$$

where $\alpha_{0}$ is the intercept, and the $\gamma_{k}$'s and $\beta_{j}$'s are the regression coefficients corresponding to the effects of clinical factors and genetic variants, respectively. Denote $\gamma=(\gamma_{1},\dots,\gamma_{q})^{T}$ and $\beta=(\beta_{1},\dots,\beta_{p})^{T}$. The model can then be written as

$$Y_{i} = \alpha_{0} + \mathrm{clin}_{i}^{T}\gamma + X_{i}^{T}\beta + \epsilon_{i}.$$
data(dat)
dim(X)
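The remaining components can be examined in the same way (a brief sketch, assuming data(dat) places X, Y, clin, and coef directly in the workspace, as the dim(X) call above suggests, and that coef holds the coefficients used in the simulation):

length(Y)   ## number of subjects
dim(clin)   ## clinical factors
str(coef)   ## coefficients underlying the simulated data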
Make predictions from a Bayenet object.
## S3 method for class 'Bayenet' predict(object, X.new, clin.new, Y.new, ...)
Argument | Description
---|---
object | Bayenet object.
X.new | a matrix of new values for X at which predictions are to be made.
clin.new | a vector or matrix of new values for clin at which predictions are to be made.
Y.new | a vector of the response of new observations. If provided, the prediction error will be computed based on Y.new.
... | other predict arguments.
X.new must have the same number of columns as the X used for fitting the model. If clin was provided when fitting the model, clin.new must not be NULL, and vice versa. The predictions are made based on the posterior estimates of coefficients in the Bayenet object. Note that the effects of clinical factors are not subject to selection.
If Y.new is provided, the prediction error will be computed. For robust methods, the prediction mean absolute deviations (PMAD) will be computed. For non-robust methods, the prediction mean squared error (PMSE) will be computed.
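In other words, the reported error is a mean absolute deviation for robust fits and a mean squared error for non-robust fits. A minimal sketch of these definitions (the helper functions pmad() and pmse() below are hypothetical, not part of the package):

pmad <- function(y, y.pred) mean(abs(y - y.pred))   ## prediction mean absolute deviation (robust fits)
pmse <- function(y, y.pred) mean((y - y.pred)^2)    ## prediction mean squared error (non-robust fits)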
an object of class ‘Bayenet.pred’ is returned, which is a list with components:
Component | Description
---|---
error | prediction error. error is NULL if Y.new=NULL.
y.pred | predicted values of the new observations.
data(dat)
test=sample((1:nrow(X)), floor(nrow(X)/5))
fit=Bayenet(X[-test,], Y[-test], clin[-test,], max.steps=500, penalty="lasso")
predict(fit, X[test,], clin[test,], Y[test,])
Print a summary of a Bayenet object
## S3 method for class 'Bayenet' print(x, digits = max(3, getOption("digits") - 3), ...)
Argument | Description
---|---
x | Bayenet object.
digits | significant digits in printout.
... | other print arguments.
No return value, called for side effects.
Print a summary of a Bayenet.pred object
## S3 method for class 'Bayenet.pred' print(x, digits = max(3, getOption("digits") - 3), ...)
Argument | Description
---|---
x | Bayenet.pred object.
digits | significant digits in printout.
... | other print arguments.
No return value, called for side effects.
Print a summary of a Selection object
## S3 method for class 'Selection' print(x, digits = max(3, getOption("digits") - 3), ...)
Argument | Description
---|---
x | Selection object.
digits | significant digits in printout.
... | other print arguments.
No return value, called for side effects.
Variable selection for a Bayenet object
Selection(obj, sparse)
Argument | Description
---|---
obj | Bayenet object.
sparse | logical flag. If TRUE, spike-and-slab priors will be used to shrink coefficients of irrelevant covariates to exactly zero.
For class ‘Sparse’, the inclusion probability is used to indicate the importance of predictors.
Here we use a binary indicator $\phi_{j}$ to denote the membership of the non-spike distribution. Take the main effect of the $j$th genetic factor, $\beta_{j}$, as an example. Suppose we have collected $H$ posterior samples from the MCMC after burn-ins. The $j$th G factor is included in the final model at the $h$th MCMC iteration if the corresponding indicator is 1, i.e., $\phi_{j}^{(h)}=1$. Subsequently, the posterior probability of retaining the $j$th genetic main effect in the final model is defined as the average of all the indicators for the $j$th G factor among the $H$ posterior samples. That is,

$$p_{j} = \widehat{\pi}(\phi_{j}=1\mid Y) = \frac{1}{H}\sum_{h=1}^{H}\phi_{j}^{(h)}, \qquad j=1,\dots,p.$$

A larger posterior inclusion probability $p_{j}$ indicates stronger empirical evidence that the $j$th genetic main effect has a non-zero coefficient, i.e., a stronger association with the phenotypic trait. Here, we use 0.5 as the cut-off: if $p_{j} > 0.5$, the $j$th genetic main effect is included in the final model; otherwise, it is excluded from the final model.
For class ‘NonSparse’, variable selection is based on the 95% credible interval.
Please check the references for more details about the variable selection.
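As an illustration of the sparse rule, the inclusion probabilities are simply column means of the binary indicator draws. A minimal sketch with a made-up indicator matrix (phi, H, and p here are hypothetical; the package computes these quantities internally):

H <- 1000; p <- 20
phi <- matrix(rbinom(H * p, 1, 0.3), H, p)  ## hypothetical indicator samples: H draws for p genetic factors
incl.prob <- colMeans(phi)                  ## posterior inclusion probability for each factor
which(incl.prob > 0.5)                      ## effects retained under the 0.5 cut-off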
an object of class ‘Selection’ is returned, which is a list with components:
Component | Description
---|---
method | method used for identifying important effects.
effects | a list of indicators of selected effects.
Lu, X. and Wu, C. (2023). Bayesian quantile elastic net with spike-and-slab priors.
data(dat)
max.steps=5000
fit= Bayenet(X, Y, clin, max.steps, penalty="lasso")
selected=Selection(fit, sparse=TRUE)
selected$Main.G