| Title: | Box-Cox Symmetric Regression for Non-Negative Data |
|---|---|
| Description: | A collection of tools for regression analysis of non-negative data, including strictly positive and zero-inflated observations, based on the class of the Box-Cox symmetric (BCS) distributions and its zero-adjusted extension. The BCS distributions are a class of flexible probability models capable of describing different levels of skewness and tail-heaviness. The package offers a comprehensive regression modeling framework, including estimation and tools for evaluating goodness-of-fit. |
| Authors: | Felipe Queiroz [aut] (ORCID: <https://orcid.org/0000-0001-8368-0707>), Rodrigo Medeiros [aut, cre, cph] (ORCID: <https://orcid.org/0000-0001-8101-3038>) |
| Maintainer: | Rodrigo Medeiros <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 1.1.0 |
| Built: | 2026-05-14 08:18:42 UTC |
| Source: | https://github.com/ffqueiroz/bcsreg |
Density function, distribution function, quantile function, and random generation for the class of the Box-Cox symmetric (BCS) distributions.
dBCS(x, mu, sigma, lambda, zeta, family = "NO", log = FALSE) pBCS( q, mu, sigma, lambda, zeta, family = "NO", lower.tail = TRUE, log.p = FALSE ) qBCS( p, mu, sigma, lambda, zeta, family = "NO", lower.tail = TRUE, log.p = FALSE ) rBCS(n, mu, sigma, lambda, zeta, family = "NO")dBCS(x, mu, sigma, lambda, zeta, family = "NO", log = FALSE) pBCS( q, mu, sigma, lambda, zeta, family = "NO", lower.tail = TRUE, log.p = FALSE ) qBCS( p, mu, sigma, lambda, zeta, family = "NO", lower.tail = TRUE, log.p = FALSE ) rBCS(n, mu, sigma, lambda, zeta, family = "NO")
x, q
|
vector of positive quantiles. |
mu |
vector of strictly positive scale parameters. |
sigma |
vector of strictly positive relative dispersion parameters. |
lambda |
vector of real-valued skewness parameters. If |
zeta |
strictly positive extra parameter. It must be specified with only one value in cases where the BCS distribution has an extra parameter. See “Details” below. |
family |
a character that specifies the symmetric generating family of the BCS distribution.
Available options are: |
log, log.p
|
logical; if |
lower.tail |
logical; if |
p |
vector of probabilities. |
n |
number of observations. If 'n' is a vector, its length is used as the number of required observations. |
The class of the BCS distributions was introduced by Ferrari and Fumes (2017). It consists of a broad class of probability models for positive continuous data, which includes flexible distributions with different levels of skewness and tail-heaviness.
The BCS class includes, as special cases, the Box-Cox t (Rigby and Stasinopoulos, 2006), Box-Cox normal (or Box-Cox Cole-Green; Cole and Green, 1992), Box-Cox power exponential (Rigby and Stasinopoulos, 2004) distributions, as well as the log-symmetric distributions (Vanegas and Paula, 2016).
Let be a positive continuous random variable with a BCS distribution
with parameters , , and
and density generating function . The probability density function of
is given by
where
satisfies , and
.
The function is called density generating function, and it specifies the
generating symmetric family of within the class of the BCS probability
models. This function can also depend on extra parameters, such as the Box-Cox t and
Box-Cox power exponential distributions. We call these extra parameters
zeta. The currently available BCS distributions in the BCSreg package
are listed below:
| Distribution | Family abbreviation | N. of extra parameters |
| Box-Cox Hyperbolic | "HP" |
1 |
| Box-Cox Type I Logistic | "LOI" |
0 |
| Box-Cox Type II Logistic | "LOII" |
0 |
| Box-Cox Normal | "NO" |
0 |
| Box-Cox Power Exponential | "PE" |
1 |
| Box-Cox Sinh-Normal | "SN" |
1 |
| Box-Cox Slash | "SL" |
1 |
| Box-Cox t | "ST" |
1 |
dBCS returns the density function, pBCS gives the cumulative distribution function,
qBCS provides the quantile function, and rBCS generates random variables.
Francisco F. de Queiroz <[email protected]>
Rodrigo M. R. de Medeiros <[email protected]>
Cole, T., and Green, P.J. (1992). Smoothing reference centile curves: the LMS method and penalized likelihood. Statistics in medicine, 11, 1305-1319.
Ferrari, S. L., and Fumes, G. (2017). Box-Cox symmetric distributions and applications to nutritional data. AStA Advances in Statistical Analysis, 101, 321-344.
Medeiros, R. M. R., and Queiroz, F. F. (2025). Flexible modeling of nonnegative continuous data: Box-Cox symmetric regression and its zero-adjusted extension.
Rigby, R. A., and Stasinopoulos, D. M. (2004). Smooth centile curves for skew and kurtotic data modelled using the Box-Cox power exponential distribution. Statistics in medicine, 23, 3053-3076.
Rigby, R. A., and Stasinopoulos, D. M. (2006). Using the Box-Cox t distribution in GAMLSS to model skewness and kurtosis. Statistical Modelling, 6, 209-229.
Vanegas, L. H., and Paula, G. A. (2016). Log-symmetric distributions: statistical properties and parameter estimation. Brazilian Journal of Probability and Statistics, 30, 196-220.
ZABCS to access the density function, distribution
function, quantile function, and a random number generator for the
zero-adjusted BCS distributions. BCSreg for estimating the
parameters of a BCS regression model.
# Probability density function ## Right-skewed distributions curve(dBCS(x, 3, 0.3, -1.5, family = "NO"), xlim = c(0, 7), ylim = c(0, 0.7), ylab = "Density") curve(dBCS(x, 3, 0.3, -1.5, 4, family = "ST"), add = TRUE, col = 2) curve(dBCS(x, 3, 0.3, -1.5, 5, family = "PE"), add = TRUE, col = 4) legend("topright", legend = c("BCNO", "BCT", "BCPE"), lty = 1, col = c(1, 2, 4)) ## Truncated symmetric distributions (with support on (0, Inf)) curve(dBCS(x, 3, 0.3, 1, family = "NO"), xlim = c(0, 7), ylim = c(0, 0.7), ylab = "Density") curve(dBCS(x, 3, 0.3, 1, 4, family = "ST"), add = TRUE, col = 2) curve(dBCS(x, 3, 0.3, 1, 5, family = "PE"), add = TRUE, col = 4) legend("topright", legend = c("BCNO", "BCT", "BCPE"), lty = 1, col = c(1, 2, 4)) ## Left-skewed distributions curve(dBCS(x, 3, 0.3, 3, family = "NO"), xlim = c(0, 7), ylim = c(0, 0.7), ylab = "Density") curve(dBCS(x, 3, 0.3, 3, 4, family = "ST"), add = TRUE, col = 2) curve(dBCS(x, 3, 0.3, 3, 5, family = "PE"), add = TRUE, col = 4) legend("topright", legend = c("BCNO", "BCT", "BCPE"), lty = 1, col = c(1, 2, 4)) # Random generation ## Parameter setting mu <- 5 # scale parameter sigma <- 0.2 # relative dispersion parameter lambda <- -0.2 # skewness parameter ## Generating family family <- "NO" ## Visualization x <- rBCS(10000, mu, sigma, lambda, family = family) hist(x, prob = TRUE, col = "white", main = "") curve(dBCS(x, mu, sigma, lambda, family = family), col = "blue", add = TRUE) plot(ecdf(x), main = "") curve(pBCS(x, mu, sigma, lambda, family = family), col = "blue", add = TRUE)# Probability density function ## Right-skewed distributions curve(dBCS(x, 3, 0.3, -1.5, family = "NO"), xlim = c(0, 7), ylim = c(0, 0.7), ylab = "Density") curve(dBCS(x, 3, 0.3, -1.5, 4, family = "ST"), add = TRUE, col = 2) curve(dBCS(x, 3, 0.3, -1.5, 5, family = "PE"), add = TRUE, col = 4) legend("topright", legend = c("BCNO", "BCT", "BCPE"), lty = 1, col = c(1, 2, 4)) ## Truncated symmetric distributions (with support on (0, Inf)) curve(dBCS(x, 3, 0.3, 1, family = "NO"), xlim = c(0, 7), ylim = c(0, 0.7), ylab = "Density") curve(dBCS(x, 3, 0.3, 1, 4, family = "ST"), add = TRUE, col = 2) curve(dBCS(x, 3, 0.3, 1, 5, family = "PE"), add = TRUE, col = 4) legend("topright", legend = c("BCNO", "BCT", "BCPE"), lty = 1, col = c(1, 2, 4)) ## Left-skewed distributions curve(dBCS(x, 3, 0.3, 3, family = "NO"), xlim = c(0, 7), ylim = c(0, 0.7), ylab = "Density") curve(dBCS(x, 3, 0.3, 3, 4, family = "ST"), add = TRUE, col = 2) curve(dBCS(x, 3, 0.3, 3, 5, family = "PE"), add = TRUE, col = 4) legend("topright", legend = c("BCNO", "BCT", "BCPE"), lty = 1, col = c(1, 2, 4)) # Random generation ## Parameter setting mu <- 5 # scale parameter sigma <- 0.2 # relative dispersion parameter lambda <- -0.2 # skewness parameter ## Generating family family <- "NO" ## Visualization x <- rBCS(10000, mu, sigma, lambda, family = family) hist(x, prob = TRUE, col = "white", main = "") curve(dBCS(x, mu, sigma, lambda, family = family), col = "blue", add = TRUE) plot(ecdf(x), main = "") curve(pBCS(x, mu, sigma, lambda, family = family), col = "blue", add = TRUE)
Fit a Box-Cox symmetric (BCS) or a zero-adjusted BCS regression model using maximum likelihood estimation.
BCSreg( formula, data, subset, na.action, family = "NO", zeta, link = "log", sigma.link = "log", alpha.link, control = BCSreg.control(...), model = FALSE, y = FALSE, x = FALSE, ... ) ## S3 method for class 'BCSreg' print(x, digits = max(3, getOption("digits") - 3), ...)BCSreg( formula, data, subset, na.action, family = "NO", zeta, link = "log", sigma.link = "log", alpha.link, control = BCSreg.control(...), model = FALSE, y = FALSE, x = FALSE, ... ) ## S3 method for class 'BCSreg' print(x, digits = max(3, getOption("digits") - 3), ...)
formula |
a symbolic description of the model, allowing the specification of
different regression structures for model parameters using the
See "Details" for further explanation. |
data, subset, na.action
|
arguments controlling formula processing via
|
family |
a character that specifies the symmetric generating family of the BCS distribution.
Available options are: |
zeta |
Strictly positive extra parameter. It must be specified with a single value in cases where the BCS distribution has an extra parameter. |
link, sigma.link
|
character specification of the link functions for
the scale and relative dispersion regression structures, respectively.
Currently, |
alpha.link |
character specification of the link function for the
zero-adjustment regression structure (only applicable for zero-inflated positive data). Currently, |
control |
a list of control parameters passed as arguments for
the |
model, y, x
|
logicals. If |
... |
arguments passed to |
digits |
a non-null value for digits specifies the minimum number of significant digits to be printed in values. |
The BCSreg function implements maximum likelihood estimation in the
class of the BCS and zero-adjusted BCS regression models for the analysis of positive
or zero-inflated data (Medeiros and Queiroz, 2025). The BCS distributions
(Ferrari and Fumes, 2017) are a broad class of flexible distributions that achieve different
levels of skewness and tail-heaviness. See details in BCS.
The distributions currently implemented in the BCSreg package, along
with their abbreviations used in the family argument, are listed below:
| Distribution | Family abbreviation | N. of extra parameters |
| Box-Cox Hyperbolic | "HP" |
1 |
| Box-Cox Type I Logistic | "LOI" |
0 |
| Box-Cox Type II Logistic | "LOII" |
0 |
| Box-Cox Normal | "NO" |
0 |
| Box-Cox Power Exponential | "PE" |
1 |
| Box-Cox Sinh-Normal | "SN" |
1 |
| Box-Cox Slash | "SL" |
1 |
| Box-Cox t | "ST" |
1 |
The BCS distributions have at least three parameters: scale (mu),
relative dispersion (sigma), and skewness (lambda) parameters.
Some distributions may also depend on an additional parameter (zeta),
such as the Box-Cox t and Box-Cox power exponential distributions. The
BCS distributions reduce to the log-symmetric distributions
(Vanegas and Paula, 2016) when lambda is fixed at zero. The log-symmetric
distributions are an important class of probability models for positive data,
which includes well-known distributions such as the log-normal and log-t
distributions. The BCSreg function allows fitting a log-symmetric
regression using lambda = 0 as an argument
(see BCSreg.control).
The BCSreg function also implements the class of zero-adjusted BCS regression models,
for analyzing mixed data that is positive but can assume zero. In this case, the
zero-adjusted BCS model corresponding to the family is fitted to the data. In
addition to the parameters mu, sigma, lambda, and possibly zeta,
the zero-adjusted BCS distributions have a zero-adjustment parameter (alpha) that
corresponds to the probability of observing a zero response. This parameter can also be
modeled with a regression structure.
The formula argument defines the regression structures for different model
parameters using the Formula package (Zeileis and Croissant, 2010).
It can have up to three parts, separated by the "|" operator:
First part: specifies the model for the scale parameter.
Second part (optional): defines a regression structure for the relative dispersion parameter.
Third part (only applicable for zero-inflated positive data): models the zero-adjustment parameter.
If only the first part is provided, the model applies only to the scale parameter. When a second part is included, a regression structure is defined for the relative dispersion parameter. If the data contain zero inflation, a third part can be specified to model the zero-adjustment parameter. If the third part is provided but the response variable contains no zeros, this part is ignored.
For instance, consider a dataset where y is the zero-inflated
dependent variable, and x, s, and z are explanatory
variables associated with the scale, relative dispersion, and zero-adjustment
parameters, respectively. The following formulas illustrate different model structures:
y ~ x # Scale parameter only
y ~ x | s # Scale and relative dispersion parameters
y ~ x | s | z # Scale, relative dispersion, and zero-adjustment parameters
y ~ x | 1 | z # Scale and zero-adjustment parameters
y ~ 1 | s | z # Relative dispersion and zero-adjustment parameters
The BCSreg function returns an object of class "BCSreg",
which consists of a list with the following components:
a list containing the elements "mu" and
"sigma" that consist of the estimates of the coefficients
associated with the scale and relative dispersion regression structures,
respectively. If the model is zero-adjusted, the element "alpha"
will also be returned with the estimated coefficients for the regression
structure of the zero-adjustment parameter.
a vector with the fitted median responses. Not to be
confused with the fitted values for mu.
a vector with the fitted scale parameters (or conditional scale parameters for a zero-adjusted fit).
a vector with the fitted relative dispersion parameters (or conditional relative dispersion parameters for a zero-adjusted fit).
the maximum likelihood estimate of the skewness parameter (lambda), or its fixed value
specified in the BCSreg function.
the specified value for the extra parameter of the corresponding BCS distribution, if applicable.
the generating family of the fitted BCS distribution.
a list with elements "mu" and "sigma" with the
specified link functions for the mu and sigma regression
structures, respectively. If the model is zero-adjusted, the element
"alpha" will also be returned with the link function for
the regression structure of the zero-adjustment parameter.
log-likelihood of the fitted model.
asymptotic covariance matrix of the estimators. By default, the asymptotic
covariance matrix is based on a analytical expression of the observed information matrix.
It can be obtained numerically based on the Hessian matrix via optim
if the argument hessian = TRUE is used in the BCSreg function.
number of observations.
residual degrees of freedom in the null model (a model without any regression structure).
residual degrees of freedom in the fitted model, that is, the sample size minus the number of model parameters.
the control arguments passed to the optim call.
a vector with the starting values used in the iterative process.
a list with the output from optim.
logical indicating successful convergence of the iterative process.
the original function call.
the formula used.
a list with elements "mu", "sigma", and "full" containing
the term objects for the respective models. If the model is zero-adjusted,
the element "alpha" will also be returned with the term object for the
zero-adjustment model.
a list with elements "mu", "sigma", and "full" containing
the levels of the categorical regressors. If the model is zero-adjusted, the element
"alpha" will also be returned.
a list with elements "mu" and "sigma"
containing the contrasts corresponding to levels from the respective models. If the model is zero-adjusted, the element
"alpha" will also be returned.
the full model frame (if y = TRUE).
the response variable (if y = TRUE).
a list with elements "mu" and "sigma" with the model matrices from
the mu and sigma submodels (if x = TRUE). If the model is zero-adjusted, the element
"alpha" will also be returned with the model matrix for the
zero-adjustment submodel.
a vector with the fitted zero-adjustment parameters when a zero-adjusted
model is considered; and NULL, otherwise.
Francisco F. de Queiroz <[email protected]>
Rodrigo M. R. de Medeiros <[email protected]>
Cribari-Neto F, Zeileis A (2010). Beta Regression in R. Journal of Statistical Software, 34, 1—24
Ferrari, S. L. P., and Fumes, G. (2017). Box-Cox symmetric distributions and applications to nutritional data. AStA Advances in Statistical Analysis, 101, 321—344.
Medeiros, R. M. R., and Queiroz, F. F. (2025). Flexible modeling of nonnegative continuous data: Box-Cox symmetric regression and its zero-adjusted extension.
Vanegas, L. H., and Paula, G. A. (2016). Log-symmetric distributions: statistical properties and parameter estimation. Brazilian Journal of Probability and Statistics, 30,196—220
Zeileis A, Croissant Y (2010). Extended Model Formulas in R: Multiple Parts and Multiple Responses. Journal of Statistical Software, 34, 1—13.
summary.BCSreg for more detailed summaries,
residuals.BCSreg to extract residuals from the fitted model,
plot.BCSreg for diagnostic plots, and
extra.parameter for selection of the extra parameter zeta via Upsilon goodness-of-fit statistic and profile likelihood.
Information on additional methods for "BCSreg" objects can be found at BCSreg-methods.
# BCS regression for strictly positive response variables ## Data set: raycatch (for description, run ?raycatch) hist(raycatch$cpue, xlab = "Catch per unit effort") plot(cpue ~ tide_phase, raycatch, pch = 16, xlab = "Tide phase", ylab = "Catch per unit effort") plot(cpue ~ location, raycatch, pch = 16, xlab = "Location", ylab = "Catch per unit effort") plot(cpue ~ max_temp, raycatch, pch = 16, xlab = "Maximum temperature", ylab = "Catch per unit effort") ## Fit examples ### Fit a single Box-Cox normal regression model: fit_bcno1 <- BCSreg(cpue ~ location + tide_phase + max_temp, raycatch) fit_bcno1 ### Fit a double Box-Cox normal regression model: fit_bcno2 <- BCSreg(cpue ~ location + tide_phase | location + tide_phase + max_temp, raycatch) fit_bcno2 ### Fit a double Box-Cox power exponential regression model (family = "PE"): fit_bcpe <- BCSreg(cpue ~ location + tide_phase + max_temp | location + tide_phase + max_temp, raycatch, family = "PE", zeta = 4) fit_bcpe ### Fit a double log-power exponential regression model (lambda = 0): fit_lpe <- BCSreg(cpue ~ location + tide_phase + max_temp | location + tide_phase + max_temp, raycatch, family = "PE", zeta = 4, lambda = 0) fit_lpe # Zero-adjusted BCS (ZABCS) regression for nonnegative response variables ## Data set: renewables2015 (for description, run ?renewables2015) plot(ecdf(renewables2015$renew_elec_output), cex = 0.3, main = "Empirical CDF") abline(h = mean(renewables2015$renew_elec_output == 0), col = "grey", lty = 3) text(1250, 0.155, paste0("prop. of zeros: ~0.12"), col = "blue") plot(renew_elec_output ~ adj_sav_edu, renewables2015, pch = 16, xlab = "Education expenditure (percent of GNI)", ylab = "Renewable electricity output (in TWh)") plot(renew_elec_output ~ agri_land, renewables2015, pch = 16, xlab = "Matural logarithm of total agricultural land area", ylab = "Renewable electricity output (in TWh)") ## Fit examples ### Fit a single zero-adjusted Box-Cox normal regression model: fit_zabcno1 <- BCSreg(renew_elec_output ~ adj_sav_edu + agri_land, renewables2015) fit_zabcno1 ### Fit a double zero-adjusted Box-Cox normal regression model: fit_zabcno2 <- BCSreg(renew_elec_output ~ adj_sav_edu + agri_land | adj_sav_edu + agri_land, renewables2015) fit_zabcno2 ### Fit a triple zero-adjusted Box-Cox normal regression model: fit_zabcno3 <- BCSreg(renew_elec_output ~ adj_sav_edu + agri_land | adj_sav_edu + agri_land | adj_sav_edu + agri_land, renewables2015) fit_zabcno3 ### Fit a triple zero-adjusted Box-Cox power exponential regression model (family = "PE"): fit_zabcpe <- BCSreg(renew_elec_output ~ adj_sav_edu + agri_land | adj_sav_edu + agri_land | adj_sav_edu + agri_land, renewables2015, family = "PE", zeta = 4) fit_zabcpe ### Fit a triple zero-adjusted log-power exponential regression model (lambda = 0): fit_zalpe <- BCSreg(renew_elec_output ~ adj_sav_edu + agri_land | adj_sav_edu + agri_land | adj_sav_edu + agri_land, renewables2015, family = "PE", zeta = 4, lambda = 0) fit_zalpe# BCS regression for strictly positive response variables ## Data set: raycatch (for description, run ?raycatch) hist(raycatch$cpue, xlab = "Catch per unit effort") plot(cpue ~ tide_phase, raycatch, pch = 16, xlab = "Tide phase", ylab = "Catch per unit effort") plot(cpue ~ location, raycatch, pch = 16, xlab = "Location", ylab = "Catch per unit effort") plot(cpue ~ max_temp, raycatch, pch = 16, xlab = "Maximum temperature", ylab = "Catch per unit effort") ## Fit examples ### Fit a single Box-Cox normal regression model: fit_bcno1 <- BCSreg(cpue ~ location + tide_phase + max_temp, raycatch) fit_bcno1 ### Fit a double Box-Cox normal regression model: fit_bcno2 <- BCSreg(cpue ~ location + tide_phase | location + tide_phase + max_temp, raycatch) fit_bcno2 ### Fit a double Box-Cox power exponential regression model (family = "PE"): fit_bcpe <- BCSreg(cpue ~ location + tide_phase + max_temp | location + tide_phase + max_temp, raycatch, family = "PE", zeta = 4) fit_bcpe ### Fit a double log-power exponential regression model (lambda = 0): fit_lpe <- BCSreg(cpue ~ location + tide_phase + max_temp | location + tide_phase + max_temp, raycatch, family = "PE", zeta = 4, lambda = 0) fit_lpe # Zero-adjusted BCS (ZABCS) regression for nonnegative response variables ## Data set: renewables2015 (for description, run ?renewables2015) plot(ecdf(renewables2015$renew_elec_output), cex = 0.3, main = "Empirical CDF") abline(h = mean(renewables2015$renew_elec_output == 0), col = "grey", lty = 3) text(1250, 0.155, paste0("prop. of zeros: ~0.12"), col = "blue") plot(renew_elec_output ~ adj_sav_edu, renewables2015, pch = 16, xlab = "Education expenditure (percent of GNI)", ylab = "Renewable electricity output (in TWh)") plot(renew_elec_output ~ agri_land, renewables2015, pch = 16, xlab = "Matural logarithm of total agricultural land area", ylab = "Renewable electricity output (in TWh)") ## Fit examples ### Fit a single zero-adjusted Box-Cox normal regression model: fit_zabcno1 <- BCSreg(renew_elec_output ~ adj_sav_edu + agri_land, renewables2015) fit_zabcno1 ### Fit a double zero-adjusted Box-Cox normal regression model: fit_zabcno2 <- BCSreg(renew_elec_output ~ adj_sav_edu + agri_land | adj_sav_edu + agri_land, renewables2015) fit_zabcno2 ### Fit a triple zero-adjusted Box-Cox normal regression model: fit_zabcno3 <- BCSreg(renew_elec_output ~ adj_sav_edu + agri_land | adj_sav_edu + agri_land | adj_sav_edu + agri_land, renewables2015) fit_zabcno3 ### Fit a triple zero-adjusted Box-Cox power exponential regression model (family = "PE"): fit_zabcpe <- BCSreg(renew_elec_output ~ adj_sav_edu + agri_land | adj_sav_edu + agri_land | adj_sav_edu + agri_land, renewables2015, family = "PE", zeta = 4) fit_zabcpe ### Fit a triple zero-adjusted log-power exponential regression model (lambda = 0): fit_zalpe <- BCSreg(renew_elec_output ~ adj_sav_edu + agri_land | adj_sav_edu + agri_land | adj_sav_edu + agri_land, renewables2015, family = "PE", zeta = 4, lambda = 0) fit_zalpe
Methods for "BCSreg" objects.
## S3 method for class 'BCSreg' model.frame(formula, ...) ## S3 method for class 'BCSreg' model.matrix(object, model = c("mu", "sigma", "alpha"), ...) ## S3 method for class 'BCSreg' coef(object, model = c("mu", "sigma", "alpha", "full"), ...) ## S3 method for class 'BCSreg' vcov(object, model = c("mu", "sigma", "alpha", "full"), ...) ## S3 method for class 'BCSreg' logLik(object, ...) ## S3 method for class 'ugrpl' AIC(object, ..., k = 2)## S3 method for class 'BCSreg' model.frame(formula, ...) ## S3 method for class 'BCSreg' model.matrix(object, model = c("mu", "sigma", "alpha"), ...) ## S3 method for class 'BCSreg' coef(object, model = c("mu", "sigma", "alpha", "full"), ...) ## S3 method for class 'BCSreg' vcov(object, model = c("mu", "sigma", "alpha", "full"), ...) ## S3 method for class 'BCSreg' logLik(object, ...) ## S3 method for class 'ugrpl' AIC(object, ..., k = 2)
formula |
a model formula or terms object or an R object. |
... |
further arguments passed to or from other methods. |
object |
an object of class |
model |
a character indicating which regression structure should be used.
It can be |
k |
numeric, the penalty per parameter to be used; the default
|
model.frame returns a data.frame containing the variables required
by formula and any additional arguments provided via ....
model.matrix returns the design matrix used in the regression structure,
as specified by the model argument.
coef returns a numeric vector of estimated regression coefficients, based
on the model argument. If parm = "full", it returns a list with the
components "mu" and "sigma", each containing the corresponding
coefficient estimates. If the model is zero-adjusted, it will also have a
"alpha" component with the estimates of the associated regression coefficients.
vcov returns the asymptotic covariance matrix of the regression coefficients,
based on the model argument.
logLik returns the log-likelihood value of the fitted model.
AIC returns a numeric value representing the Akaike Information Criterion
(AIC), Bayesian Information Criterion, or another criterion, depending on k.
Francisco F. de Queiroz <[email protected]>
Rodrigo M. R. de Medeiros <[email protected]>
Ferrari, S. L. P., and Fumes, G. (2017). Box-Cox symmetric distributions and applications to nutritional data. AStA Advances in Statistical Analysis, 101, 321—344.
Medeiros, R. M. R., and Queiroz, F. F. (2025). Flexible modeling of nonnegative continuous data: Box-Cox symmetric regression and its zero-adjusted extension.
## Data set: renewables2015 (for description, run ?renewables2015) plot(ecdf(renewables2015$renew_elec_output), cex = 0.3, main = "Empirical CDF") abline(h = mean(renewables2015$renew_elec_output == 0), col = "grey", lty = 3) text(1250, 0.155, paste0("prop. of zeros: ~0.12"), col = "blue") plot(renew_elec_output ~ adj_sav_edu, renewables2015, pch = 16, xlab = "Education expenditure (percent of GNI)", ylab = "Renewable electricity output (in TWh)") plot(renew_elec_output ~ agri_land, renewables2015, pch = 16, xlab = "Matural logarithm of total agricultural land area", ylab = "Renewable electricity output (in TWh)") ## Fit a zero-adjusted Box-Cox normal regression fit <- BCSreg(renew_elec_output ~ adj_sav_edu + agri_land | adj_sav_edu + agri_land | adj_sav_edu + agri_land, renewables2015) ## coef coef(fit) # regression coefficients of the scale submodel coef(fit, model = "sigma") # regression coefficients of the relative dispersion submodel coef(fit, model = "alpha") # regression coefficients of the zero-adjustment submodel coef(fit, model = "full") # all regression coefficients ## vcov vcov(fit) # covariance matrix for the scale submodel coefficients vcov(fit, model = "sigma") # covariance matrix for the relative dispersion submodel coefficients vcov(fit, model = "alpha") # covariance matrix for the zero-adjustment submodel coefficients vcov(fit, model = "full") # full covariance matrix of the model (including the skewness parameter) ## Log-likelihood value logLik(fit) ## AIC and BIC AIC(fit) AIC(fit, k = log(fit$nobs)) ## Model matrices model.matrix(fit) # design matrix for the scale submodel model.matrix(fit, model = "sigma") # design matrix for the relative dispersion submodel model.matrix(fit, model = "alpha") # design matrix for the zero-adjustment submodel## Data set: renewables2015 (for description, run ?renewables2015) plot(ecdf(renewables2015$renew_elec_output), cex = 0.3, main = "Empirical CDF") abline(h = mean(renewables2015$renew_elec_output == 0), col = "grey", lty = 3) text(1250, 0.155, paste0("prop. of zeros: ~0.12"), col = "blue") plot(renew_elec_output ~ adj_sav_edu, renewables2015, pch = 16, xlab = "Education expenditure (percent of GNI)", ylab = "Renewable electricity output (in TWh)") plot(renew_elec_output ~ agri_land, renewables2015, pch = 16, xlab = "Matural logarithm of total agricultural land area", ylab = "Renewable electricity output (in TWh)") ## Fit a zero-adjusted Box-Cox normal regression fit <- BCSreg(renew_elec_output ~ adj_sav_edu + agri_land | adj_sav_edu + agri_land | adj_sav_edu + agri_land, renewables2015) ## coef coef(fit) # regression coefficients of the scale submodel coef(fit, model = "sigma") # regression coefficients of the relative dispersion submodel coef(fit, model = "alpha") # regression coefficients of the zero-adjustment submodel coef(fit, model = "full") # all regression coefficients ## vcov vcov(fit) # covariance matrix for the scale submodel coefficients vcov(fit, model = "sigma") # covariance matrix for the relative dispersion submodel coefficients vcov(fit, model = "alpha") # covariance matrix for the zero-adjustment submodel coefficients vcov(fit, model = "full") # full covariance matrix of the model (including the skewness parameter) ## Log-likelihood value logLik(fit) ## AIC and BIC AIC(fit) AIC(fit, k = log(fit$nobs)) ## Model matrices model.matrix(fit) # design matrix for the scale submodel model.matrix(fit, model = "sigma") # design matrix for the relative dispersion submodel model.matrix(fit, model = "alpha") # design matrix for the zero-adjustment submodel
Optimization parameters that control the fitting of Box-Cox symmetric or
zero-adjusted Box-Cox symmetric regression
models using the BCSreg function.
BCSreg.control( lambda = NULL, method = "BFGS", maxit = 2000, hessian = FALSE, trace = FALSE, start = NULL, ... )BCSreg.control( lambda = NULL, method = "BFGS", maxit = 2000, hessian = FALSE, trace = FALSE, start = NULL, ... )
lambda |
numeric indicating the value of lambda (if |
method |
character specifying the |
maxit, trace, ...
|
arguments passed to |
hessian |
logical. Should the numerical Hessian matrix from the |
start |
an optional vector with starting values for the regression coefficients
associated with the |
The BCSreg.control controls the fitting process of Box-Cox symmetric models.
Almost all the arguments are passed directly to optim, which
is used to estimate the parameters. Starting values for the regression coefficients
associated with the mu and sigma submodels can be provided via start
argument. The starting value for the skewness parameter lambda is zero; that is,
the fitting process starts from the corresponding log-symmetric regression model. If the
estimation process is to be performed with a fixed lambda, a value must be specified
for the lambda argument. A natural value is lambda = 0, where a log-symmetric
regression model will be estimated.
When a regression fit with a zero-adjusted BCS distribution is performed, the regression
coefficients associated with the zero-adjustment parameter are estimated
using the glm function.
A list with components named as the arguments.
Francisco F. de Queiroz <[email protected]>
Rodrigo M. R. de Medeiros <[email protected]>
## Data set: raycatch (for description, run ?raycatch) hist(raycatch$cpue, xlab = "Catch per unit effort") plot(cpue ~ tide_phase, raycatch, xlab = "Tide phase", ylab = "Catch per unit effort") ### Fit a Box-Cox normal regression model: fit_bcno <- BCSreg(cpue ~ tide_phase, raycatch) fit_bcno ### Fit a log-normal regression model (setting lambda = 0): fit_lno <- BCSreg(cpue ~ tide_phase, raycatch, lambda = 0) fit_lno ### Standard errors obtained from the numerical Hessian matrix ### provided by optim instead of the analytical solution: fit_bcno2 <- BCSreg(cpue ~ tide_phase, raycatch, hessian = TRUE) fit_lno2 <- BCSreg(cpue ~ tide_phase, raycatch, lambda = 0, hessian = TRUE) # In this case, there are differences only in the eighth decimal place: vcov(fit_bcno) vcov(fit_bcno2) # In this case, there are no differences: vcov(fit_lno) vcov(fit_lno2)## Data set: raycatch (for description, run ?raycatch) hist(raycatch$cpue, xlab = "Catch per unit effort") plot(cpue ~ tide_phase, raycatch, xlab = "Tide phase", ylab = "Catch per unit effort") ### Fit a Box-Cox normal regression model: fit_bcno <- BCSreg(cpue ~ tide_phase, raycatch) fit_bcno ### Fit a log-normal regression model (setting lambda = 0): fit_lno <- BCSreg(cpue ~ tide_phase, raycatch, lambda = 0) fit_lno ### Standard errors obtained from the numerical Hessian matrix ### provided by optim instead of the analytical solution: fit_bcno2 <- BCSreg(cpue ~ tide_phase, raycatch, hessian = TRUE) fit_lno2 <- BCSreg(cpue ~ tide_phase, raycatch, lambda = 0, hessian = TRUE) # In this case, there are differences only in the eighth decimal place: vcov(fit_bcno) vcov(fit_bcno2) # In this case, there are no differences: vcov(fit_lno) vcov(fit_lno2)
A dataset containing household-level information on expenditures on basic education and associated socioeconomic characteristics, derived from the 2017–2018 Brazilian Consumer Expenditure Survey (Pesquisa de Orçamentos Familiares, POF), conducted by the Instituto Brasileiro de Geografia e Estatística (IBGE).
This dataset is used to illustrate regression modeling for zero-adjusted data with a substantial proportion of zero observations. The response variable corresponds to total household expenditure on basic education, measured over the 12 months preceding the interview and assigned to the household reference person. Expenditures include childcare, preschool, regular primary and secondary education, youth and adult education, and supplementary (equivalency) programs at the primary and secondary levels.
Households reporting no expenditure on basic education are assigned a value of zero. The final sample consists of 4,232 households residing in the state of São Paulo, Brazil. Approximately 93% of the observations correspond to zero expenditure, indicating a highly zero-inflated distribution.
data(education)data(education)
A data frame with 4,232 observations and 11 variables:
Total household expenditure on basic education (in BRL) over the 12 months preceding the interview. Values equal to zero indicate no reported expenditure.
Type of household residence, categorized as
"Urban" or "Rural".
Age of the household reference person, in years.
Sex of the household reference person, coded as
"Male" or "Female".
Self-reported race or ethnicity of the reference person, according to IBGE classification.
Indicator of whether the reference person is covered by
a private health plan ("Yes" or "No").
Literacy status of the reference person
("Yes" or "No").
Number of completed years of formal education of the reference person.
Highest educational level attained by the reference person.
Per capita disposable household income (in BRL), calculated as total disposable household income divided by the number of residents. Disposable income includes monetary and non-monetary earnings, net of direct taxes, social contributions, and other mandatory deductions.
Number of children living in the household, including children of the reference person and/or the spouse.
In each household, the POF survey designates a reference person, typically responsible for financial and administrative decisions. All individual-level covariates refer to this reference person. Expenditure values are aggregated at the household level but attributed to the reference person for modeling purposes.
Summary statistics indicate that the median education expenditure is zero, while the mean expenditure is BRL 108.8, reflecting the presence of a small number of households with substantially higher spending levels. The maximum observed expenditure is BRL 48,000.
Instituto Brasileiro de Geografia e Estatística (IBGE). Pesquisa de Orçamentos Familiares (POF) 2017–2018. Available (in Portuguese) at: https://www.ibge.gov.br/estatisticas/sociais/populacao/24786-pesquisa-de-orcamentos-familiares-2.html
data(education) summary(education)data(education) summary(education)
Produce the normal probability plot with simulated envelope of the quantile residuals obtained from a Box-Cox symmetric regression fit.
envelope(object, rep = 60, conf = 0.95, envcol, ...)envelope(object, rep = 60, conf = 0.95, envcol, ...)
object |
a fitted model object of class " |
rep |
a positive integer representing the number of iterations to calculate
the simulated envelopes. Default is |
conf |
a numeric value in the interval (0,1) that represents the confidence
level of the simulated envelope. Default is |
envcol |
character specifying the color of the envelope. |
... |
additional graphical parameters (see par). |
The envelope uses the idea of Atkinson (1985) to create normal
probability plots with simulated envelope. Under the correct model,
approximately 100 * conf of the residuals are expected to be inside
the envelope.
Currently, the envelope() function, when used in a zero-adjusted Box-Cox
symmetric regression fit, returns only the quantile plot for the quantile
residuals obtained under a combined approach (see residuals.BCSreg).
envelope returns normal probability plot with simulated envelopes
for the quantile residuals.
Francisco F. de Queiroz <[email protected]>
Rodrigo M. R. de Medeiros <[email protected]>
Atkinson, A. C. (1985). Plots, Transformations and Regression: An Introduction to Graphical Methods of Diagnostic Regression Analysis. Oxford Science Publications, Oxford.
Medeiros, R. M. R., and Queiroz, F. F. (2025). Flexible modeling of nonnegative continuous data: Box-Cox symmetric regression and its zero-adjusted extension.
## Data set: raycatch (for description, run ?raycatch) hist(raycatch$cpue, xlab = "Catch per unit effort") plot(cpue ~ tide_phase, raycatch, pch = 16, xlab = "Tide phase", ylab = "Catch per unit effort") plot(cpue ~ location, raycatch, pch = 16, xlab = "Location", ylab = "Catch per unit effort") plot(cpue ~ max_temp, raycatch, pch = 16, xlab = "Maximum temperature", ylab = "Catch per unit effort") ## Fit a double Box-Cox normal regression model: fit <- BCSreg(cpue ~ location + tide_phase | location + tide_phase + max_temp, raycatch) envelope(fit)## Data set: raycatch (for description, run ?raycatch) hist(raycatch$cpue, xlab = "Catch per unit effort") plot(cpue ~ tide_phase, raycatch, pch = 16, xlab = "Tide phase", ylab = "Catch per unit effort") plot(cpue ~ location, raycatch, pch = 16, xlab = "Location", ylab = "Catch per unit effort") plot(cpue ~ max_temp, raycatch, pch = 16, xlab = "Maximum temperature", ylab = "Catch per unit effort") ## Fit a double Box-Cox normal regression model: fit <- BCSreg(cpue ~ location + tide_phase | location + tide_phase + max_temp, raycatch) envelope(fit)
Estimation of the extra parameter in a Box-Cox symmetric or zero-adjusted Box-Cox symmetric regression model based on the Upsilon goodness-of-fit statistic and the profile log-likelihood.
extra.parameter( object, family, grid = seq(1, 30, 2), trace = TRUE, plot = TRUE, control = BCSreg.control(...), ... ) ## S3 method for class 'extra.parameter' print(x, ...) ## S3 method for class 'extra.parameter' plot(x, which = 1, ...)extra.parameter( object, family, grid = seq(1, 30, 2), trace = TRUE, plot = TRUE, control = BCSreg.control(...), ... ) ## S3 method for class 'extra.parameter' print(x, ...) ## S3 method for class 'extra.parameter' plot(x, which = 1, ...)
object |
an object of class |
family |
a character string specifying the symmetric generating family of the
BCS distribution. The available options are: |
grid |
a numeric vector of positive values at which the Upsilon statistic and the profile log-likelihood function will be evaluated. |
trace |
logical; if |
plot |
logical; if |
control |
a list of control arguments specified via |
... |
further arguments passed to |
x |
an object of class |
which |
numeric; if |
An object of class "extra.parameter", which is a list containing the fits of the
BCS regression model for each value of the extra parameter zeta specified in grid.
The object also includes:
logLik: a vector with the log-likelihood values for each fit;
Upsilon: a vector with the Upsilon statistic values for each fit;
grid: the specified grid of values.
The value of the extra parameter (zeta) can be selected using two alternative
approaches:
the value that minimizes the Upsilon goodness-of-fit statistic;
the value that maximizes the log-likelihood.
The print method summarizes the fits by displaying, for each value in grid,
the corresponding log-likelihood value and Upsilon statistic.
The plot method returns a graph of the Upsilon statistics, highlighting its minimum.
Francisco F. de Queiroz <[email protected]>
Rodrigo M. R. de Medeiros <[email protected]>
Medeiros, R. M. R., and Queiroz, F. F. (2025). Flexible modeling of nonnegative continuous data: Box-Cox symmetric regression and its zero-adjusted extension.
## Data set: raycatch (for description, run ?raycatch) hist(raycatch$cpue, xlab = "Catch per unit effort") plot(cpue ~ tide_phase, raycatch, pch = 16, xlab = "Tide phase", ylab = "Catch per unit effort") plot(cpue ~ location, raycatch, pch = 16, xlab = "Location", ylab = "Catch per unit effort") plot(cpue ~ max_temp, raycatch, pch = 16, xlab = "Maximum temperature", ylab = "Catch per unit effort") ## Fit the Box-Cox normal regression as a reference model fit_bcno <- BCSreg(cpue ~ location + tide_phase | location + tide_phase + max_temp, raycatch) ## Use the specifications of the reference model to change the distribution ## to, for example, Box-Cox t, and select the value of the extra parameter: select_bct <- extra.parameter(fit_bcno, family = "ST", grid = 1:20) ## Class class(select_bct) ## It is possible to recover the plots: plot(select_bct) plot(select_bct, which = 2) ## and the trace: select_bct ## Selected fit based on the Upsilon statistic fit_bct <- select_bct[["zeta = 19"]] summary(fit_bct)## Data set: raycatch (for description, run ?raycatch) hist(raycatch$cpue, xlab = "Catch per unit effort") plot(cpue ~ tide_phase, raycatch, pch = 16, xlab = "Tide phase", ylab = "Catch per unit effort") plot(cpue ~ location, raycatch, pch = 16, xlab = "Location", ylab = "Catch per unit effort") plot(cpue ~ max_temp, raycatch, pch = 16, xlab = "Maximum temperature", ylab = "Catch per unit effort") ## Fit the Box-Cox normal regression as a reference model fit_bcno <- BCSreg(cpue ~ location + tide_phase | location + tide_phase + max_temp, raycatch) ## Use the specifications of the reference model to change the distribution ## to, for example, Box-Cox t, and select the value of the extra parameter: select_bct <- extra.parameter(fit_bcno, family = "ST", grid = 1:20) ## Class class(select_bct) ## It is possible to recover the plots: plot(select_bct) plot(select_bct, which = 2) ## and the trace: select_bct ## Selected fit based on the Upsilon statistic fit_bct <- select_bct[["zeta = 19"]] summary(fit_bct)
The influence function provides two influence measures for a Box-Cox symmetric or a
zero-adusted Box-Cox symmetric regression fit.
influence(object, plot = TRUE, ask = grDevices::dev.interactive(), ...)influence(object, plot = TRUE, ask = grDevices::dev.interactive(), ...)
object |
an object of class |
plot |
logical. If |
ask |
logical; if |
... |
currently not used. |
influence returns a list with two objects:
case.weights |
The values of |
totalLI |
The total local influence (see Lesaffre and Verbeke (1998)). |
Francisco F. de Queiroz <[email protected]>
Rodrigo M. R. de Medeiros <[email protected]>
Lesaffre, E., and Verbeke, G. (1998). Local influence in linear mixed models. Biometrics, 570–582.
Medeiros, R. M. R., and Queiroz, F. F. (2025). Flexible modeling of nonnegative continuous data: Box-Cox symmetric regression and its zero-adjusted extension.
BCSreg for parameter estimation in the class of the Box-Cox
symmetric or zero-adjusted Box-Cox symmetric regression models,
residuals.BCSreg for extracting residuals for a fitted model, and
plot.BCSreg for diagnostic plots.
## Data set: raycatch (for description, run ?raycatch) hist(raycatch$cpue, xlab = "Catch per unit effort") plot(cpue ~ tide_phase, raycatch, pch = 16, xlab = "Tide phase", ylab = "Catch per unit effort") plot(cpue ~ location, raycatch, pch = 16, xlab = "Location", ylab = "Catch per unit effort") plot(cpue ~ max_temp, raycatch, pch = 16, xlab = "Maximum temperature", ylab = "Catch per unit effort") ## Fit a double Box-Cox normal regression model: fit <- BCSreg(cpue ~ location + tide_phase | location + tide_phase + max_temp, raycatch) ## Influence measures under case-weights perturbation scheme: cw <- influence(fit) ## two index plots are shown str(cw)## Data set: raycatch (for description, run ?raycatch) hist(raycatch$cpue, xlab = "Catch per unit effort") plot(cpue ~ tide_phase, raycatch, pch = 16, xlab = "Tide phase", ylab = "Catch per unit effort") plot(cpue ~ location, raycatch, pch = 16, xlab = "Location", ylab = "Catch per unit effort") plot(cpue ~ max_temp, raycatch, pch = 16, xlab = "Maximum temperature", ylab = "Catch per unit effort") ## Fit a double Box-Cox normal regression model: fit <- BCSreg(cpue ~ location + tide_phase | location + tide_phase + max_temp, raycatch) ## Influence measures under case-weights perturbation scheme: cw <- influence(fit) ## two index plots are shown str(cw)
This function provides plots for diagnostic analysis of a Box-Cox symmetric or a zero-adjusted regression fit.
## S3 method for class 'BCSreg' plot( x, which = 1:4, ask = prod(graphics::par("mfcol")) < length(which) && grDevices::dev.interactive(), pch = "+", las = 1, cex = 0.8, lwd = 2, ... )## S3 method for class 'BCSreg' plot( x, which = 1:4, ask = prod(graphics::par("mfcol")) < length(which) && grDevices::dev.interactive(), pch = "+", las = 1, cex = 0.8, lwd = 2, ... )
x |
an object of class |
which |
numeric; if a subset of the plots is required, specify a subset
of the numbers |
ask |
logical; if |
pch, las, cex, lwd, ...
|
graphical parameters (see |
The plot method for BCSreg objects provides seven types
of diagnostic plots in the following order:
a plot of the residuals versus the fitted medians.
an index plot of the residuals versus the observation indices.
a graph that compares the empirical density of the residuals with the density of the standard normal distribution.
a normal probability plot of the residuals with a
confidence region constructed according to Fox (2016) using the
qqPlot function.
An index plot of local influence based on the case-weight perturbation scheme.
a dispersion diagram of the fitted values versus the observed values.
a dispersion diagram of the function
versus the residuals. For some BCS models, the function
may be interpreted as weights in the estimation process. If family = "NO",
the function is constant.
The which argument can be used to select a subset of the implemented plots.
Default is which = 1:4. See residuals.BCSreg for details on
the residuals.
plot method for "BCSreg" objects returns seven types
of diagnostic plots.
Francisco F. de Queiroz <[email protected]>
Rodrigo M. R. de Medeiros <[email protected]>
Medeiros, R. M. R., and Queiroz, F. F. (2025). Flexible modeling of nonnegative continuous data: Box-Cox symmetric regression and its zero-adjusted extension.
## Data set: raycatch (for description, run ?raycatch) hist(raycatch$cpue, xlab = "Catch per unit effort") plot(cpue ~ tide_phase, raycatch, pch = 16, xlab = "Tide phase", ylab = "Catch per unit effort") plot(cpue ~ location, raycatch, pch = 16, xlab = "Location", ylab = "Catch per unit effort") plot(cpue ~ max_temp, raycatch, pch = 16, xlab = "Maximum temperature", ylab = "Catch per unit effort") ## Fit a double Box-Cox normal regression model: fit <- BCSreg(cpue ~ location + tide_phase | location + tide_phase + max_temp, raycatch) ## Available plots: ### Residuals vs fitted values (fitted medians) plot(fit, which = 1) ### Residuals vs observation indices plot(fit, which = 2) ### Density plot plot(fit, which = 3) ### Normal probability plot plot(fit, which = 4) ### Local influence plot(fit, which = 5) ### Fitted medians vs response plot(fit, which = 6) ### v(z) function plot(fit, which = 7)## Data set: raycatch (for description, run ?raycatch) hist(raycatch$cpue, xlab = "Catch per unit effort") plot(cpue ~ tide_phase, raycatch, pch = 16, xlab = "Tide phase", ylab = "Catch per unit effort") plot(cpue ~ location, raycatch, pch = 16, xlab = "Location", ylab = "Catch per unit effort") plot(cpue ~ max_temp, raycatch, pch = 16, xlab = "Maximum temperature", ylab = "Catch per unit effort") ## Fit a double Box-Cox normal regression model: fit <- BCSreg(cpue ~ location + tide_phase | location + tide_phase + max_temp, raycatch) ## Available plots: ### Residuals vs fitted values (fitted medians) plot(fit, which = 1) ### Residuals vs observation indices plot(fit, which = 2) ### Density plot plot(fit, which = 3) ### Normal probability plot plot(fit, which = 4) ### Local influence plot(fit, which = 5) ### Fitted medians vs response plot(fit, which = 6) ### v(z) function plot(fit, which = 7)
This dataset contains information on white ray landings using the traditional "grozeira" fishing method in Baía de Todos os Santos, Brazil. Data were collected from January 2012 to January 2013 by Marion (2015). The study aimed to analyze the relationship between catch per unit effort (cpue) and various environmental factors.
data(raycatch)data(raycatch)
A data frame with 186 rows and 8 variables:
fishing period, categorized as "Dry" or "Rainy".
fishing location, categorized as "Area 1", "Area 2", "Area 3", or "Area 4".
wind speed (m/s).
tide phase, categorized as "Quadrature" or "Spring tide".
maximum temperature (°C).
minimum temperature (°C).
duration of sunshine (hours).
represents the catch per unit effort, calculated as the productivity (in grams) divided by the product of the number of hooks and the soak time (in hours).
In Brazil, marine fish are harvested through various fishing activities, with artisanal fishing
playing a particularly prominent role, especially in the state of Bahia, in the northeastern
region of the country. In this region, particularly within the Baía de Todos os Santos, most of
the marine production comes from small-scale operations. Among the species landed and sold,
rays constitute a significant portion of the catch. The data analyzed in this application were
collected by Marion (2015) over a 13-month period (January 2012 to January 2013), based on 231
fishing trips that employed the grozeira (a type of bottom longline) as gear. After eliminating
missing data, the data contains information on 186 fish landings. The objective is to identify
key factors influencing ray catch levels. The response variable is the catch per unit effort
(cpue), defined as
Data collected from a field study on traditional ray fishing in Baía de Todos os Santos, Brazil.
Marion, C. (2015). Função da Baía de Todos os Santos, Bahia, no ciclo de vida de Dasyatis guttata (Chondrichthyes: Dasyatidae). Doctoral dissertation, University of São Paulo, Institute of Oceanography. Retrieved from https://teses.usp.br/teses/disponiveis/21/21134/tde-22072015-154346/pt-br.php
Paula, G. A., and Kumagaia, G. H. Relatório de análise estatística sobre o projeto: "Variabilidade espaço-temporal da captura da raia branca, Dasyattis guttata, na pesca artesanal da Baía de Todos os Santos, Bahia". São Paulo, IME-USP, 2014.
data(raycatch) summary(raycatch) plot(cpue ~ tide_phase, raycatch, xlab = "Tide phase", ylab = "CPUE (kg)", main = "Effect of tide phase on CPUE")data(raycatch) summary(raycatch) plot(cpue ~ tide_phase, raycatch, xlab = "Tide phase", ylab = "CPUE (kg)", main = "Effect of tide phase on CPUE")
This dataset provides information on renewable electricity output and a set of socioeconomic, environmental, and institutional variables for 186 countries in the year 2015. The data were obtained from the World Bank (https://data.worldbank.org/).
data(renewables2015)data(renewables2015)
A data frame with 186 rows and 9 variables:
country_namecountry name.
adj_sav_eduadjusted savings: education expenditure (percent of GNI).
agri_landnatural logarithm of total agricultural land area (in sq. km).
elec_fossilelectricity production from fossil fuels (percent of total). Indicates the share of electricity generated from oil, gas, and coal, representing reliance on non-renewable energy.
forest_areaforest area (percent of land area). Reflects natural resource availability.
gov_effecgovernment effectiveness index (range: approximately -2.5 to 2.5). Measures public service quality, policy implementation, and government credibility.
pop_densitynatural logarithm of population density (people per sq. km).
access_elecaccess to electricity (percent of population).
renew_elec_outputrenewable electricity output (in TWh).
The response variable is the renewable electricity output, measured in terawatt-hours (TWh), representing the total electricity generated from renewable sources such as wind, solar photovoltaic, solar thermal, hydro, marine, geothermal, solid biofuels, renewable municipal waste, liquid biofuels, and biogas. Hydro pumped storage is excluded.
data(renewables2015) summary(renewables2015) pairs(renewables2015[, -1], pch = 16, cex = 0.8, col = rgb(0,0,0, 0.5))data(renewables2015) summary(renewables2015) pairs(renewables2015[, -1], pch = 16, cex = 0.8, col = rgb(0,0,0, 0.5))
Residuals resulting from fitting a Box-Cox symmetric or a zero-adjusted Box-Cox symmetric regression.
## S3 method for class 'BCSreg' residuals(object, approach = c("combined", "separated"), ...)## S3 method for class 'BCSreg' residuals(object, approach = c("combined", "separated"), ...)
object |
an object of class |
approach |
a character string indicating the approach for calculating residuals
when a zero-adjusted regression is fitted. Should be either |
... |
further arguments passed to or from other methods. |
For a Box-Cox symmetric regression fit, the residuals are the quantile residuals
(Dunn and Smyth, 1996), defined by ,
where is the fitted cumulative distribution function and
is cumulative distribution function of the standard normal distribution.
For zero-adjusted Box-Cox symmetric regressions, two approaches are available:
Combined approach: Returns a single vector of residuals defined as
where is a random variable uniformly distributed in
and is the fitted cumulative distribution function of the mixed response.
Separated approach: Returns a list containing:
Quantile residuals for the positive (continuous) component.
Standardized Pearson residuals for the discrete component, defined by
where is the th diagonal element of the
"hat matrix" resulting from a fit of a generalized linear model with a
binary response given by , being the indicator
function.
See more details in Medeiros and Queiroz (2025).
If a Box-Cox symmetric regression is fitted to the data, it returns a numeric vector containing the quantile residuals (Dunn and Smyth, 1996).
If the model is a zero-adjusted Box-Cox symmetric regression:
For approach = "combined", it returns a numeric vector with "combined" quantile residuals. See details
For approach = "separated", it returns a list with two components:
continuous (quantile residuals for strictly positive responses) and
discrete (standardized Pearson residuals for the discrete component).
Francisco F. de Queiroz <[email protected]>
Rodrigo M. R. de Medeiros <[email protected]>
Dunn, P. K. and Smyth, G. K. (1996). Randomized quantile residuals. Journal of Computational and Graphical Statistics, 5, 236—244.
Medeiros, R. M. R., and Queiroz, F. F. (2025). Flexible modeling of nonnegative continuous data: Box-Cox symmetric regression and its zero-adjusted extension.
# BCS regression for strictly positive response variables ## Data set: raycatch (for description, run ?raycatch) hist(raycatch$cpue, xlab = "Catch per unit effort") plot(cpue ~ tide_phase, raycatch, pch = 16, xlab = "Tide phase", ylab = "Catch per unit effort") plot(cpue ~ location, raycatch, pch = 16, xlab = "Location", ylab = "Catch per unit effort") plot(cpue ~ max_temp, raycatch, pch = 16, xlab = "Maximum temperature", ylab = "Catch per unit effort") ## BCS fit fit <- BCSreg(cpue ~ location + tide_phase + max_temp | location + tide_phase + max_temp, raycatch) ## Quantile residuals rq <- residuals(fit) rq ## Normal probability plot qqnorm(rq, pch = "+", cex = 0.8) qqline(rq, col = "dodgerblue", lwd = 2) # Zero-adjusted BCS (ZABCS) regression for nonnegative response variables ## Data set: renewables2015 (for description, run ?renewables2015) plot(ecdf(renewables2015$renew_elec_output), cex = 0.3, main = "Empirical CDF") abline(h = mean(renewables2015$renew_elec_output == 0), col = "grey", lty = 3) text(1250, 0.155, paste0("prop. of zeros: ~0.12"), col = "blue") plot(renew_elec_output ~ adj_sav_edu, renewables2015, pch = 16, xlab = "Education expenditure (percent of GNI)", ylab = "Renewable electricity output (in TWh)") plot(renew_elec_output ~ agri_land, renewables2015, pch = 16, xlab = "Matural logarithm of total agricultural land area", ylab = "Renewable electricity output (in TWh)") ## Zero-adjusted BCS fit fit0 <- BCSreg(renew_elec_output ~ adj_sav_edu + agri_land | adj_sav_edu + agri_land | adj_sav_edu + agri_land, renewables2015) ## Combined approach (default) rq <- residuals(fit0) rq ### Normal probability plot qqnorm(rq, pch = "+", cex = 0.8) qqline(rq, col = "dodgerblue", lwd = 2) ## Separated approach res <- residuals(fit0, approach = "separated") str(res) ### Normal probability plots # Continuous part qqnorm(res$continuous, pch = "+", cex = 0.8) qqline(res$continuous, col = "dodgerblue", lwd = 2) # Discrete part (Pearson's standardized residuals do not have a normal distribution.) qqnorm(res$discrete, pch = "+", cex = 0.8) qqline(res$discrete, col = "dodgerblue", lwd = 2)# BCS regression for strictly positive response variables ## Data set: raycatch (for description, run ?raycatch) hist(raycatch$cpue, xlab = "Catch per unit effort") plot(cpue ~ tide_phase, raycatch, pch = 16, xlab = "Tide phase", ylab = "Catch per unit effort") plot(cpue ~ location, raycatch, pch = 16, xlab = "Location", ylab = "Catch per unit effort") plot(cpue ~ max_temp, raycatch, pch = 16, xlab = "Maximum temperature", ylab = "Catch per unit effort") ## BCS fit fit <- BCSreg(cpue ~ location + tide_phase + max_temp | location + tide_phase + max_temp, raycatch) ## Quantile residuals rq <- residuals(fit) rq ## Normal probability plot qqnorm(rq, pch = "+", cex = 0.8) qqline(rq, col = "dodgerblue", lwd = 2) # Zero-adjusted BCS (ZABCS) regression for nonnegative response variables ## Data set: renewables2015 (for description, run ?renewables2015) plot(ecdf(renewables2015$renew_elec_output), cex = 0.3, main = "Empirical CDF") abline(h = mean(renewables2015$renew_elec_output == 0), col = "grey", lty = 3) text(1250, 0.155, paste0("prop. of zeros: ~0.12"), col = "blue") plot(renew_elec_output ~ adj_sav_edu, renewables2015, pch = 16, xlab = "Education expenditure (percent of GNI)", ylab = "Renewable electricity output (in TWh)") plot(renew_elec_output ~ agri_land, renewables2015, pch = 16, xlab = "Matural logarithm of total agricultural land area", ylab = "Renewable electricity output (in TWh)") ## Zero-adjusted BCS fit fit0 <- BCSreg(renew_elec_output ~ adj_sav_edu + agri_land | adj_sav_edu + agri_land | adj_sav_edu + agri_land, renewables2015) ## Combined approach (default) rq <- residuals(fit0) rq ### Normal probability plot qqnorm(rq, pch = "+", cex = 0.8) qqline(rq, col = "dodgerblue", lwd = 2) ## Separated approach res <- residuals(fit0, approach = "separated") str(res) ### Normal probability plots # Continuous part qqnorm(res$continuous, pch = "+", cex = 0.8) qqline(res$continuous, col = "dodgerblue", lwd = 2) # Discrete part (Pearson's standardized residuals do not have a normal distribution.) qqnorm(res$discrete, pch = "+", cex = 0.8) qqline(res$discrete, col = "dodgerblue", lwd = 2)
summary method for class "BCSreg".
## S3 method for class 'BCSreg' summary(object, ...) ## S3 method for class 'summary.BCSreg' print(x, digits = getOption("digits"), ...)## S3 method for class 'BCSreg' summary(object, ...) ## S3 method for class 'summary.BCSreg' print(x, digits = getOption("digits"), ...)
object |
an object of class |
... |
further arguments passed to or from other methods. |
x |
an object of class |
digits |
a non-null value for digits specifies the minimum number of significant digits to be printed in values. |
An object of class "summary.BCSreg" provides additional information from a Box-Cox
symmetric or zero-adjusted Box-Cox symmetric regression fit. In addition to summary
tables with estimates, standard errors, and individual significance tests of the
regression coefficients, it also provides the quantile residuals (Dunn and Smyth, 1996)
and goodness-of-fit measures.
The goodness-of-fit measures include:
A pseudo- defined based on Ferrari and Cribari-Neto (2004) as the
square of the sample correlation coefficient between and
, where is the response variable,
is the adjusted median, and is the link
function (link) used in the regression structure for the scale parameter.
By definition, , and perfect agreement between the fitted
values and the response yields .
A general goodness-of-fit measure based on Vanegas and Paula (2016), defined as
where is the th order statistic of the response,
is the expected value of the th order statistic from a standard normal sample
of size , denotes the cumulative distribution function
of the standard normal distribution, and denotes the fitted cumulative
distribution function of the assumed distribution for the response.
A weighting function that plays a role in the parameter estimation
process, providing information about the individual contribution of each
observation. It depends on the generating density function and is constant for
distributions generated by the normal family (family = "NO").
The function summary.BCSreg returns an object of class "summary.BCSreg",
which consists of a list with the following components:
the original function call, given in object.
summary statistics for the scale submodel.
summary statistics for the relative dispersion submodel.
summary statistics for the skewness parameter. If this parameter is not statistically different from zero, the fitted Box-Cox symmetric (BCS) distribution can be reduced to a more parsimonious log-symmetric distribution.
summary statistics for the zero-adjustment submodel when a zero-adjusted
model is considered; and NULL, otherwise.
the specified value for the extra parameter of the fitted BCS distribution, if applicable.
the generating family of the fitted BCS distribution.
a list with elements "mu" and "sigma" with the
specified link functions for the mu and sigma regression
structures, respectively. If the model is zero-adjusted, the element
"alpha" will also be returned with the link function for
the regression structure of the zero-adjustment parameter.
logical indicating successful convergence of the iterative process.
number of iterations reached until the optimization algorithm converges.
log-likelihood of the fitted model.
number of model parameters
a vector of quantile residuals.
pseudo R-squared value.
an overall goodness-of-fit measure.
a vector with the values for all the observations.
Akaike and Bayesian information criteria.
Francisco F. de Queiroz <[email protected]>
Rodrigo M. R. de Medeiros <[email protected]>
Dunn, P. K. and Smyth, G. K. (1996). Randomized quantile residuals. Journal of Computational and Graphical Statistics, 5, 236—244.
Ferrari, S., and Cribari-Neto, F. (2004). Beta regression for modelling rates and proportions. Journal of Applied Statistics, 31, 799—815.
Medeiros, R. M. R., and Queiroz, F. F. (2025). Flexible modeling of nonnegative continuous data: Box-Cox symmetric regression and its zero-adjusted extension.
Vanegas, L. H., and Paula, G. A. (2016). Log-symmetric distributions: statistical properties and parameter estimation. Brazilian Journal of Probability and Statistics, 30,196—220
# BCS regression for strictly positive response variables ## Data set: raycatch (for description, run ?raycatch) hist(raycatch$cpue, xlab = "Catch per unit effort") plot(cpue ~ tide_phase, raycatch, pch = 16, xlab = "Tide phase", ylab = "Catch per unit effort") plot(cpue ~ location, raycatch, pch = 16, xlab = "Location", ylab = "Catch per unit effort") plot(cpue ~ max_temp, raycatch, pch = 16, xlab = "Maximum temperature", ylab = "Catch per unit effort") ## Fit examples ### Fit a single Box-Cox normal regression model: fit_bcno1 <- BCSreg(cpue ~ location + tide_phase + max_temp, raycatch) summary(fit_bcno1) # Other quantities can be obtained from a summary.BCSreg object: aux <- summary(fit_bcno1) class(aux) str(aux) ### Fit a double Box-Cox normal regression model: fit_bcno2 <- BCSreg(cpue ~ location + tide_phase | location + tide_phase + max_temp, raycatch) summary(fit_bcno2) ### Fit a double Box-Cox power exponential regression model (family = "PE"): fit_bcpe <- BCSreg(cpue ~ location + tide_phase + max_temp | location + tide_phase + max_temp, raycatch, family = "PE", zeta = 4) summary(fit_bcpe) ### Fit a double log-power exponential regression model (lambda = 0): fit_lpe <- BCSreg(cpue ~ location + tide_phase + max_temp | location + tide_phase + max_temp, raycatch, family = "PE", zeta = 4, lambda = 0) summary(fit_lpe) # Zero-adjusted BCS (ZABCS) regression for nonnegative response variables ## Data set: renewables2015 (for description, run ?renewables2015) plot(ecdf(renewables2015$renew_elec_output), cex = 0.3, main = "Empirical CDF") abline(h = mean(renewables2015$renew_elec_output == 0), col = "grey", lty = 3) text(1250, 0.155, paste0("prop. of zeros: ~0.12"), col = "blue") plot(renew_elec_output ~ adj_sav_edu, renewables2015, pch = 16, xlab = "Education expenditure (percent of GNI)", ylab = "Renewable electricity output (in TWh)") plot(renew_elec_output ~ agri_land, renewables2015, pch = 16, xlab = "Matural logarithm of total agricultural land area", ylab = "Renewable electricity output (in TWh)") ## Fit examples ### Fit a single zero-adjusted Box-Cox normal regression model: fit_zabcno1 <- BCSreg(renew_elec_output ~ adj_sav_edu + agri_land, renewables2015) summary(fit_zabcno1) # Other quantities obtained from a zero-adjusted fit: aux <- summary(fit_zabcno1) str(aux) ### Fit a double zero-adjusted Box-Cox normal regression model: fit_zabcno2 <- BCSreg(renew_elec_output ~ adj_sav_edu + agri_land | adj_sav_edu + agri_land, renewables2015) summary(fit_zabcno2) ### Fit a triple zero-adjusted Box-Cox normal regression model: fit_zabcno3 <- BCSreg(renew_elec_output ~ adj_sav_edu + agri_land | adj_sav_edu + agri_land | adj_sav_edu + agri_land, renewables2015) summary(fit_zabcno3) ### Fit a triple zero-adjusted Box-Cox power exponential regression model (family = "PE"): fit_zabcpe <- BCSreg(renew_elec_output ~ adj_sav_edu + agri_land | adj_sav_edu + agri_land | adj_sav_edu + agri_land, renewables2015, family = "PE", zeta = 4) summary(fit_zabcpe) ### Fit a triple zero-adjusted log-power exponential regression model (lambda = 0): fit_zalpe <- BCSreg(renew_elec_output ~ adj_sav_edu + agri_land | adj_sav_edu + agri_land | adj_sav_edu + agri_land, renewables2015, family = "PE", zeta = 4, lambda = 0) summary(fit_zalpe) summary(fit_lpe)# BCS regression for strictly positive response variables ## Data set: raycatch (for description, run ?raycatch) hist(raycatch$cpue, xlab = "Catch per unit effort") plot(cpue ~ tide_phase, raycatch, pch = 16, xlab = "Tide phase", ylab = "Catch per unit effort") plot(cpue ~ location, raycatch, pch = 16, xlab = "Location", ylab = "Catch per unit effort") plot(cpue ~ max_temp, raycatch, pch = 16, xlab = "Maximum temperature", ylab = "Catch per unit effort") ## Fit examples ### Fit a single Box-Cox normal regression model: fit_bcno1 <- BCSreg(cpue ~ location + tide_phase + max_temp, raycatch) summary(fit_bcno1) # Other quantities can be obtained from a summary.BCSreg object: aux <- summary(fit_bcno1) class(aux) str(aux) ### Fit a double Box-Cox normal regression model: fit_bcno2 <- BCSreg(cpue ~ location + tide_phase | location + tide_phase + max_temp, raycatch) summary(fit_bcno2) ### Fit a double Box-Cox power exponential regression model (family = "PE"): fit_bcpe <- BCSreg(cpue ~ location + tide_phase + max_temp | location + tide_phase + max_temp, raycatch, family = "PE", zeta = 4) summary(fit_bcpe) ### Fit a double log-power exponential regression model (lambda = 0): fit_lpe <- BCSreg(cpue ~ location + tide_phase + max_temp | location + tide_phase + max_temp, raycatch, family = "PE", zeta = 4, lambda = 0) summary(fit_lpe) # Zero-adjusted BCS (ZABCS) regression for nonnegative response variables ## Data set: renewables2015 (for description, run ?renewables2015) plot(ecdf(renewables2015$renew_elec_output), cex = 0.3, main = "Empirical CDF") abline(h = mean(renewables2015$renew_elec_output == 0), col = "grey", lty = 3) text(1250, 0.155, paste0("prop. of zeros: ~0.12"), col = "blue") plot(renew_elec_output ~ adj_sav_edu, renewables2015, pch = 16, xlab = "Education expenditure (percent of GNI)", ylab = "Renewable electricity output (in TWh)") plot(renew_elec_output ~ agri_land, renewables2015, pch = 16, xlab = "Matural logarithm of total agricultural land area", ylab = "Renewable electricity output (in TWh)") ## Fit examples ### Fit a single zero-adjusted Box-Cox normal regression model: fit_zabcno1 <- BCSreg(renew_elec_output ~ adj_sav_edu + agri_land, renewables2015) summary(fit_zabcno1) # Other quantities obtained from a zero-adjusted fit: aux <- summary(fit_zabcno1) str(aux) ### Fit a double zero-adjusted Box-Cox normal regression model: fit_zabcno2 <- BCSreg(renew_elec_output ~ adj_sav_edu + agri_land | adj_sav_edu + agri_land, renewables2015) summary(fit_zabcno2) ### Fit a triple zero-adjusted Box-Cox normal regression model: fit_zabcno3 <- BCSreg(renew_elec_output ~ adj_sav_edu + agri_land | adj_sav_edu + agri_land | adj_sav_edu + agri_land, renewables2015) summary(fit_zabcno3) ### Fit a triple zero-adjusted Box-Cox power exponential regression model (family = "PE"): fit_zabcpe <- BCSreg(renew_elec_output ~ adj_sav_edu + agri_land | adj_sav_edu + agri_land | adj_sav_edu + agri_land, renewables2015, family = "PE", zeta = 4) summary(fit_zabcpe) ### Fit a triple zero-adjusted log-power exponential regression model (lambda = 0): fit_zalpe <- BCSreg(renew_elec_output ~ adj_sav_edu + agri_land | adj_sav_edu + agri_land | adj_sav_edu + agri_land, renewables2015, family = "PE", zeta = 4, lambda = 0) summary(fit_zalpe) summary(fit_lpe)
Density function, distribution function, quantile function, and random generation for the class of the zero-adjusted Box-Cox symmetric (ZABCS) distributions.
dZABCS(x, alpha, mu, sigma, lambda, zeta, family = "NO", log = FALSE) pZABCS( q, alpha, mu, sigma, lambda, zeta, family = "NO", lower.tail = TRUE, log.p = FALSE ) qZABCS( p, alpha, mu, sigma, lambda, zeta, family = "NO", lower.tail = TRUE, log.p = FALSE ) rZABCS(n, alpha, mu, sigma, lambda, zeta, family = "NO")dZABCS(x, alpha, mu, sigma, lambda, zeta, family = "NO", log = FALSE) pZABCS( q, alpha, mu, sigma, lambda, zeta, family = "NO", lower.tail = TRUE, log.p = FALSE ) qZABCS( p, alpha, mu, sigma, lambda, zeta, family = "NO", lower.tail = TRUE, log.p = FALSE ) rZABCS(n, alpha, mu, sigma, lambda, zeta, family = "NO")
x, q
|
vector of positive quantiles. |
alpha |
vector of zero-adjusted parameters, with values on (0, 1). |
mu |
vector of strictly positive scale parameters. |
sigma |
vector of strictly positive relative dispersion parameters. |
lambda |
vector of real-valued skewness parameters. If |
zeta |
strictly positive extra parameter. It must be specified with only one value in cases where the BCS distribution has an extra parameter. See “Details” below. |
family |
a character that specifies the symmetric generating family of the BCS distribution.
Available options are: |
log, log.p
|
logical; if |
lower.tail |
logical; if |
p |
vector of probabilities. |
n |
number of observations. If 'n' is a vector, its length is used as the number of required observations. |
The class of the ZABCS distributions was introduced by Medeiros and Queiroz (2025) as an extension of the Box-Cox symmetric (BCS) distributions (Ferrari and Fumes, 2017). The models consists of a broad class of probability distributions for positive continuous data which may include zeros.
Let be a positive continuous random variable with a ZABCS distribution
with parameters , , ,
and and density generating function .
The probability density function of
is given by
where
with
satisfies , and
.
The function is called density generating function, and it specifies the
generating symmetric family of within the class of the ZABCS probability
models. This function can also depend on extra parameters, such as the zero-adjusted
Box-Cox t and zero-adjusted Box-Cox power exponential distributions. We call
these extra parameters zeta. The currently available ZABCS distributions in the
BCSreg package are listed below:
| Distribution | Family abbreviation | N. of extra parameters |
| Zero-adjusted Box-Cox Hyperbolic | "HP" |
1 |
| Zero-adjusted Box-Cox Type I Logistic | "LOI" |
0 |
| Zero-adjusted Box-Cox Type II Logistic | "LOII" |
0 |
| Zero-adjusted Box-Cox Normal | "NO" |
0 |
| Zero-adjusted Box-Cox Power Exponential | "PE" |
1 |
| Zero-adjusted Box-Cox Sinh-Normal | "SN" |
1 |
| Zero-adjusted Box-Cox Slash | "SL" |
1 |
| Zero-adjusted Box-Cox t | "ST" |
1 |
dZABCS returns the density function, pZABCS gives the cumulative distribution function,
qZABCS provides the quantile function, and rZABCS generates random variables.
Francisco F. de Queiroz <[email protected]>
Rodrigo M. R. de Medeiros <[email protected]>
Ferrari, S. L. P., and Fumes, G. (2017). Box-Cox symmetric distributions and applications to nutritional data. AStA Advances in Statistical Analysis, 101, 321-344.
Medeiros, R. M. R., and Queiroz, F. F. (2025). Flexible modeling of nonnegative continuous data: Box-Cox symmetric regression and its zero-adjusted extension.
BCS to access the density function, distribution
function, quantile function, and a random number generator for the BCS
distributions. BCSreg for estimating the parameters of a
ZABCS regression model.
# Probability density function ## Right-skewed distributions curve(dZABCS(x, 0.4, 3, 0.3, -1.5, family = "NO"), from = 0.001, to = 7, xlim = c(0, 7), ylim = c(0, 0.5), ylab = "Density") curve(dZABCS(x, 0.4, 3, 0.3, -1.5, 4, family = "ST"), add = TRUE, col = 2, from = 0.001) curve(dZABCS(x, 0.4, 3, 0.3, -1.5, 5, family = "PE"), add = TRUE, col = 4, from = 0.001) points(0, 0.4, type = "h", lty = 2) points(0, 0.4, pch = 16, lty = 2) legend("topright", legend = c("BCNO", "BCT", "BCPE"), lty = 1, col = c(1, 2, 4)) ## Truncated symmetric distributions (with support on (0, Inf)) curve(dZABCS(x, 0.4, 3, 0.3, 1, family = "NO"), from = 0.001, to = 7, xlim = c(0, 7), ylim = c(0, 0.5), ylab = "Density") curve(dZABCS(x, 0.4, 3, 0.3, 1, 4, family = "ST"), add = TRUE, col = 2, from = 0.001) curve(dZABCS(x, 0.4, 3, 0.3, 1, 5, family = "PE"), add = TRUE, col = 4, from = 0.001) points(0, 0.4, type = "h", lty = 2) points(0, 0.4, pch = 16, lty = 2) legend("topright", legend = c("BCNO", "BCT", "BCPE"), lty = 1, col = c(1, 2, 4)) ## Left-skewed distributions curve(dZABCS(x, 0.4, 3, 0.3, 3, family = "NO"), from = 0.001, to = 7, xlim = c(0, 7), ylim = c(0, 0.5), ylab = "Density") curve(dZABCS(x, 0.4, 3, 0.3, 3, 4, family = "ST"), add = TRUE, col = 2, from = 0.001) curve(dZABCS(x, 0.4, 3, 0.3, 3, 5, family = "PE"), add = TRUE, col = 4, from = 0.001) points(0, 0.4, type = "h", lty = 2) points(0, 0.4, pch = 16, lty = 2) legend("topright", legend = c("BCNO", "BCT", "BCPE"), lty = 1, col = c(1, 2, 4)) # Random generation ## Parameter setting alpha <- 0.2 # zero-adjustment parameter mu <- 5 # scale parameter sigma <- 0.2 # relative dispersion parameter lambda <- -0.2 # skewness parameter ## Generating family family <- "NO" ## Visualization x <- rZABCS(10000, alpha, mu, sigma, lambda, family = family) hist(x, prob = TRUE, col = "white", main = "") points(0, mean(x == 0), type = "h", lty = 2) points(0, mean(x == 0), pch = 16, lty = 2) curve(dZABCS(x, alpha, mu, sigma, lambda, zeta, family = family), col = "blue", add = TRUE) plot(ecdf(x), main = "") curve(pZABCS(x, alpha, mu, sigma, lambda, zeta, family = family), col = "blue", add = TRUE)# Probability density function ## Right-skewed distributions curve(dZABCS(x, 0.4, 3, 0.3, -1.5, family = "NO"), from = 0.001, to = 7, xlim = c(0, 7), ylim = c(0, 0.5), ylab = "Density") curve(dZABCS(x, 0.4, 3, 0.3, -1.5, 4, family = "ST"), add = TRUE, col = 2, from = 0.001) curve(dZABCS(x, 0.4, 3, 0.3, -1.5, 5, family = "PE"), add = TRUE, col = 4, from = 0.001) points(0, 0.4, type = "h", lty = 2) points(0, 0.4, pch = 16, lty = 2) legend("topright", legend = c("BCNO", "BCT", "BCPE"), lty = 1, col = c(1, 2, 4)) ## Truncated symmetric distributions (with support on (0, Inf)) curve(dZABCS(x, 0.4, 3, 0.3, 1, family = "NO"), from = 0.001, to = 7, xlim = c(0, 7), ylim = c(0, 0.5), ylab = "Density") curve(dZABCS(x, 0.4, 3, 0.3, 1, 4, family = "ST"), add = TRUE, col = 2, from = 0.001) curve(dZABCS(x, 0.4, 3, 0.3, 1, 5, family = "PE"), add = TRUE, col = 4, from = 0.001) points(0, 0.4, type = "h", lty = 2) points(0, 0.4, pch = 16, lty = 2) legend("topright", legend = c("BCNO", "BCT", "BCPE"), lty = 1, col = c(1, 2, 4)) ## Left-skewed distributions curve(dZABCS(x, 0.4, 3, 0.3, 3, family = "NO"), from = 0.001, to = 7, xlim = c(0, 7), ylim = c(0, 0.5), ylab = "Density") curve(dZABCS(x, 0.4, 3, 0.3, 3, 4, family = "ST"), add = TRUE, col = 2, from = 0.001) curve(dZABCS(x, 0.4, 3, 0.3, 3, 5, family = "PE"), add = TRUE, col = 4, from = 0.001) points(0, 0.4, type = "h", lty = 2) points(0, 0.4, pch = 16, lty = 2) legend("topright", legend = c("BCNO", "BCT", "BCPE"), lty = 1, col = c(1, 2, 4)) # Random generation ## Parameter setting alpha <- 0.2 # zero-adjustment parameter mu <- 5 # scale parameter sigma <- 0.2 # relative dispersion parameter lambda <- -0.2 # skewness parameter ## Generating family family <- "NO" ## Visualization x <- rZABCS(10000, alpha, mu, sigma, lambda, family = family) hist(x, prob = TRUE, col = "white", main = "") points(0, mean(x == 0), type = "h", lty = 2) points(0, mean(x == 0), pch = 16, lty = 2) curve(dZABCS(x, alpha, mu, sigma, lambda, zeta, family = family), col = "blue", add = TRUE) plot(ecdf(x), main = "") curve(pZABCS(x, alpha, mu, sigma, lambda, zeta, family = family), col = "blue", add = TRUE)