Title: | Power Logit Regression for Modeling Bounded Data |
---|---|
Description: | Power logit regression models for bounded continuous data, in which the density generator may be normal, Student-t, power exponential, slash, hyperbolic, sinh-normal, or type II logistic. Diagnostic tools associated with the fitted model, such as the residuals, local influence measures, leverage measures, and goodness-of-fit statistics, are implemented. The estimation process follows the maximum likelihood approach and, currently, the package supports two types of estimators: the usual maximum likelihood estimator and the penalized maximum likelihood estimator. More details about power logit regression models are described in Queiroz and Ferrari (2022) <arXiv:2202.01697>. |
Authors: | Felipe Queiroz [aut, cre], Silvia Ferrari [aut] |
Maintainer: | Felipe Queiroz <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.4.0 |
Built: | 2025-02-08 03:06:47 UTC |
Source: | https://github.com/ffqueiroz/plreg |
A dataset containing the percent body fat of little brown bats sampled in Aeolus Cave, located in East Dorset, Vermont, in the North Eastern United States.
data(bodyfat_Aeolus)
data(bodyfat_Aeolus)
A data frame with 159 rows and 4 variables:
sex of the sampled bat, M
for masculine and F
for
feminine.
percent body fat.
year the bat was sampled.
hibernation time, defined as days since the fall equinox.
The complete dataset was collected by Cheng et al. (2019) in five
different regions of the United States. bodyfat_Aeolus
is the subset
of the data collected in Aeolus Cave, located in East Dorset, Vermont,
in the North Eastern United States.
https://datadryad.org/stash/dataset/doi:10.5061/dryad.sh487nh
Cheng TL, Gerson A, Moore MS, et al. (2019) Higher fat stores contribute to persistence of little brown bat populations with white-nose syndrome. J Anim Ecol. 88:591-600. https://doi.org/10.1111/1365-2656.12954
data("bodyfat_Aeolus") # Model with zeta = 2 fit <- PLreg(percentfat ~ days + sex + year, data = bodyfat_Aeolus, family = "PE", zeta = 2) summary(fit)
data("bodyfat_Aeolus") # Model with zeta = 2 fit <- PLreg(percentfat ~ days + sex + year, data = bodyfat_Aeolus, family = "PE", zeta = 2) summary(fit)
The CI.lambda
function provides a plot of the profile (penalized)
likelihood ratio statistics for lambda, useful to obtain confidence
intervals for the skewness parameter (see Queiroz and Ferrari (2022)).
CI.lambda(object, conf.coef = 0.95, lower = NULL, upper = NULL)
CI.lambda(object, conf.coef = 0.95, lower = NULL, upper = NULL)
object |
fitted model object of class " |
conf.coef |
confidence level of the confidence interval. Default is 0.95. |
lower |
a numeric value representing the lower limit of the interval for
the skewness parameter. If |
upper |
a numeric value representing the upper limit of the interval for
the skewness parameter. If |
The function returns a plot of the profile penalized likelihood ratio statistics for lambda with a horizontal dashed line, indicating the confidence interval for lambda. It also shows the confidence interval obtained.
data("PeruVotes") fitPL <- PLreg(votes ~ HDI | HDI, data = PeruVotes, family = "TF", zeta = 5) CI.lambda(fitPL)
data("PeruVotes") fitPL <- PLreg(votes ~ HDI | HDI, data = PeruVotes, family = "TF", zeta = 5) CI.lambda(fitPL)
envelope
is used to display normal probability plots with simulated
envelope of residuals for the power logit models. Currently, three types of
residuals are supported: quantile, deviance and standardized residuals.
envelope( object, type = c("quantile", "deviance", "standardized"), rep = 40, conf = 0.95, xlab, ylab, main, envcol, ylim, xlim )
envelope( object, type = c("quantile", "deviance", "standardized"), rep = 40, conf = 0.95, xlab, ylab, main, envcol, ylim, xlim )
object |
fitted model object of class " |
type |
character specifying the type of residuals to be used,
see |
rep |
a positive integer representing the number of iterations to calculate
the simulated envelopes. Default is |
conf |
a numeric value in the interval (0,1) that represents the confidence
level of the simulated envelopes. Default is |
xlab |
character specifying the label for |
ylab |
character specifying the label for |
main |
character specifying the overall title for the plot. |
envcol |
character specifying the color of the envelope. |
ylim , xlim
|
numeric values, specifying the left/lower limit and the right/upper limit of the scale. |
The envelope
uses the idea of Atkinson (1985) to create normal
probability plots with simulated envelope. Under the correct model,
approximately 100*conf
of the residuals are expected to be inside
the envelope.
envelope
returns normal probability plot with simulated envelopes
for the residuals.
Queiroz, F. F. and Ferrari, S. L. P. (2022). Power logit regression
for modeling bounded data. arXiv:2202.01697.
Atkinson, A. C. (1985) Plots, transformations and regression: an introduction
to graphical methods of diagnostic regression analysis.
Oxford Science Publications, Oxford.
data("Firm") fitPL <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost, data = Firm, family = "SLASH", zeta = 2.13) summary(fitPL) envelope(fitPL, type = "standardized") envelope(fitPL, type = "quantile") envelope(fitPL, type = "deviance")
data("Firm") fitPL <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost, data = Firm, family = "SLASH", zeta = 2.13) summary(fitPL) envelope(fitPL, type = "standardized") envelope(fitPL, type = "quantile") envelope(fitPL, type = "deviance")
The extra.parameter
function is used to select the extra parameter
of some power logit models. It provides plots of -2logLik
and the
Upsilon measure (see Queiroz and Ferrari (2022)) versus ,
the extra parameter.
extra.parameter(object, lower, upper, grid = 10, graph = TRUE)
extra.parameter(object, lower, upper, grid = 10, graph = TRUE)
object |
fitted model object of class " |
lower |
a numeric value representing the lower limit of the interval for the extra parameter. |
upper |
a numeric value representing the upper limit of the interval for the extra parameter. |
grid |
a positive integer representing the number of points in the plots.
Default is |
graph |
logical. If |
extra.parameter
returns a list with five objects:
zeta.Ups |
The selected zeta based on the Upsilon measure. |
zeta.loglik |
The selected zeta based on -2 |
zeta.values |
The values of zeta used in the graphs. |
Upsilon.values |
-2 |
loglik.values |
Upsilon measure evaluated at each value of zeta. |
Queiroz, F. F. and Ferrari, S. L. P. (2022). Power logit regression for modeling bounded data. arXiv:2202.01697.
data("bodyfat_Aeolus") #Initial model with zeta = 2 fit1 <- PLreg(percentfat ~ days + sex + year, data = bodyfat_Aeolus, family = "PE", zeta = 2) summary(fit1) # Choosing the best value for zeta extra.parameter(fit1, lower = 1, upper = 4, grid = 15)
data("bodyfat_Aeolus") #Initial model with zeta = 2 fit1 <- PLreg(percentfat ~ days + sex + year, data = bodyfat_Aeolus, family = "PE", zeta = 2) summary(fit1) # Choosing the best value for zeta extra.parameter(fit1, lower = 1, upper = 4, grid = 15)
A dataset on risk management practices of 73 firms.
data(Firm)
data(Firm)
A data frame with 73 rows and 7 variables:
total property and casualty premiums and uninsured losses as a percentage of total assets.
per occurrence retention amount as a percentage of total assets.
indicates that the firm owns a captive insurance company; 1 if the firm uses a captive, 0 otherwise.
logarithm of total assets.
a measure of the firm's industry risk.
a measure of the importance of the local managers in choosing the amount of risk to be retained.
a measure of the degree of importance in using analytical tools.
The dataset was introduced and analyzed by Schmit and Roth (1990) and
is available in the personal web page of Professor E. Frees (Wisconsin School
of Business Research). The response variable is FIRMCOST
, smaller
values of firm cost are attributed to firms that have a good risk management
performance.
Schmit, J.T. and Roth, K. (1990). Cost effectiveness of risk management practices. Journal of Risk and Insurance. 57:455-470.
data("Firm") fitPL <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost, data = Firm, family = "SLASH", zeta = 2.13) summary(fitPL) plot(fitPL, type = "standardized")
data("Firm") fitPL <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost, data = Firm, family = "SLASH", zeta = 2.13) summary(fitPL) plot(fitPL, type = "standardized")
The influence
function provides two influence measures and the generalized
leverage for power logit regression models.
influence(model, graph = TRUE, ...)
influence(model, graph = TRUE, ...)
model |
fitted model object of class " |
graph |
logical. If |
... |
currently not used. |
influence
returns a list with three objects:
case.weights |
The values of |
totalLI |
The total local influence (see Lesaffre and Verbeke (1998)) |
GL |
The diagonal elements of the generalized leverage matrix. |
Queiroz, F. F. and Ferrari, S. L. P. (2022). Power logit regression for modeling bounded data. arXiv:2202.01697.
PLreg
, residuals.PLreg
, envelope
,
plot.PLreg
data("Firm") fitPL <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost, data = Firm, family = "SLASH", zeta = 2.13) influence_measures = influence(fitPL, graph = FALSE) plot(influence_measures$case.weights, type = "h", ylim = c(0,1)) plot(influence_measures$totalLI, type = "h", ylim = c(0,6)) plot(Firm$sizelog, influence_measures$GL, pch = "+")
data("Firm") fitPL <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost, data = Firm, family = "SLASH", zeta = 2.13) influence_measures = influence(fitPL, graph = FALSE) plot(influence_measures$case.weights, type = "h", ylim = c(0,1)) plot(influence_measures$totalLI, type = "h", ylim = c(0,6)) plot(Firm$sizelog, influence_measures$GL, pch = "+")
Some S3 Methods for PLreg regression models.
## S3 method for class 'PLreg' summary(object, type = "standardized", ...) ## S3 method for class 'PLreg' print(x, ...) ## S3 method for class 'summary.PLreg' print(x, ...) ## S3 method for class 'PLreg' coef(object, ...) ## S3 method for class 'PLreg' vcov(object, ...) ## S3 method for class 'PLreg' logLik(object, ...) ## S3 method for class 'PLreg' model.matrix(object, model = c("median", "dispersion"), ...)
## S3 method for class 'PLreg' summary(object, type = "standardized", ...) ## S3 method for class 'PLreg' print(x, ...) ## S3 method for class 'summary.PLreg' print(x, ...) ## S3 method for class 'PLreg' coef(object, ...) ## S3 method for class 'PLreg' vcov(object, ...) ## S3 method for class 'PLreg' logLik(object, ...) ## S3 method for class 'PLreg' model.matrix(object, model = c("median", "dispersion"), ...)
object , x
|
fitted model object of class " |
type |
character specifying the type of residuals to be included in the
summary output, see |
... |
currently not used. |
model |
character specifying for which component of the model the coefficients/covariance are extracted. |
A set of methods for objects of class "PLreg
", including methods
for the functions summary
and vcov
,
which print the estimated coefficients along with some other information and
presents the covariance matrix, respectively. The summary
also presents
the partial Wald tests for the model parameters. Finally, summary
returns
an object of class "summary.PLreg
" containing information to be printed
using the print
method.
data("bodyfat_Aeolus") fitPL <- PLreg(percentfat ~ 1, data = bodyfat_Aeolus, family = "SN", zeta = 1.6) fitPL summary(fitPL) coef(fitPL, model = "median") vcov(fitPL) logLik(fitPL) AIC(fitPL)
data("bodyfat_Aeolus") fitPL <- PLreg(percentfat ~ 1, data = bodyfat_Aeolus, family = "SN", zeta = 1.6) fitPL summary(fitPL) coef(fitPL, model = "median") vcov(fitPL) logLik(fitPL) AIC(fitPL)
A dataset on the blank votes in the 2006 Peruvian general election.
data(PeruVotes)
data(PeruVotes)
A data frame with 194 rows and 2 variables:
proportion of blank votes in the 2006 Peruvian general election.
Human Development Index.
The dataset was collected by Bayes et al. (2012) and the response variable
is votes
, proportion of blank votes in the 2006 Peruvian general election.
Bayes et al. (2012) analyzed the influence of the Human Development Index (HDI) on
the proportion of blank votes using a beta rectangular regression model. Lemonte and
Bazan (2015) also analyzed the data using GJS regression models.
Bayes, C., Bazan, J. L. and Garcia, C. (2012). A new robust regression model for proportions.
Bayesian Analysis. 7:771-796
Lemonte, A. J. and Bazan, J. L. (2015). New class of Johnson SB distributions
and its associated regression model for rates and proportions. Biometrical Journal. 58:727-746.
data("PeruVotes") fitPL <- PLreg(votes ~ HDI | HDI, data = PeruVotes, family = "TF", zeta = 5, control = PLreg.control(lambda = 1)) summary(fitPL) plot(fitPL, type = "standardized")
data("PeruVotes") fitPL <- PLreg(votes ~ HDI | HDI, data = PeruVotes, family = "TF", zeta = 5, control = PLreg.control(lambda = 1)) summary(fitPL) plot(fitPL, type = "standardized")
Density, distribution function, quantile function and random generation for power logit distributions.
dPL(x, mu, sigma, lambda, zeta = 2, family, log = FALSE) pPL(q, mu, sigma, lambda, zeta = 2, family, lower.tail = TRUE, log.p = FALSE) qPL(p, mu, sigma, lambda, zeta = 2, family, lower.tail = TRUE, log.p = FALSE) rPL(n, mu, sigma, lambda, zeta = 2, family)
dPL(x, mu, sigma, lambda, zeta = 2, family, log = FALSE) pPL(q, mu, sigma, lambda, zeta = 2, family, lower.tail = TRUE, log.p = FALSE) qPL(p, mu, sigma, lambda, zeta = 2, family, lower.tail = TRUE, log.p = FALSE) rPL(n, mu, sigma, lambda, zeta = 2, family)
x , q
|
vector of quantiles. |
mu |
vector of medians. |
sigma |
vector of dispersion parameters. |
lambda |
vector of skewness parameters. |
zeta |
vector of extra parameters. |
family |
string that specifies the family used to define the power logit distribution. The family is
|
log , log.p
|
logical; if TRUE, probabilities p are given as log(p). Default is FALSE. |
lower.tail |
logical; if TRUE (default), probabilities are |
p |
vector of probabilities. |
n |
number of observations. If |
If zeta
is not specified, it assumes the default value 2.
The power logit distribution has density
for , in which
,
is the density generator
and
,
and
are the median, dispersion and skewness of the distribution.
It is possible to consider to obtain the limiting case, the log-log distribution. This distribution has
density
for ,
in which
.
The family
argument defines the density generator , which may depend on an extra parameter (
zeta
).
dPL
gives the density, pPL
gives the distribution function, qPL
gives the quantile function,
and rPL
generates random variables.
Queiroz, F. F. and Ferrari, S. L. P. (2022). Power logit regression for modeling bounded data. arXiv preprint arXiv:2202.01697.
dPL(0.2, mu = 0.3, sigma = 1, lambda=1, zeta = 2, family = "PE") mu = 0.3; sigma = 1; lambda = 2 set.seed(1) PLsample = rPL(1000, mu, sigma, lambda, family = "SN", zeta = 2.5) hist(PLsample, prob = TRUE, breaks = 15, main = "", las = 1) curve(dPL(x, mu, sigma, lambda, family = "SN", zeta = 2.5), from = 0.01, to = 0.8, add = TRUE, col = "red") rug(PLsample) x = seq(0.01, 0.9,0.01) y = dPL(x, mu, sigma, lambda, family = "Hyp", zeta = 2) plot(x, y, type = "l", lwd = 2, las = 1) x1 = seq(0.01, 0.4, 0.01) y1 = dPL(x1, mu, sigma, lambda, family = "Hyp", zeta = 2) polygon(c(x1, 0.4, 0), c(y1, 0, 0), col = "lightblue") text(mu-0.025, 1, paste("P(Y<0.4) = ", round(pPL(0.4, mu, sigma, lambda, family = "Hyp", zeta = 2),2)), font = 1, cex = 0.8) plot(x, pPL(x, mu, sigma, lambda, family = "PE", zeta = 1.3), type = "l", las = 1, lwd = 2, ylab = expression(P(Y<y)), xlab = "y") p = pPL(0.5, mu, sigma, lambda, family = "PE", zeta = 1.3) q = qPL(p, mu, sigma, lambda, family = "PE", zeta = 1.3) points(q, p, pch = 16, col = 2, cex = 1.5) text(0.55, 0.83, paste("(", 0.5, ",", round(p, 2), ")"), font = 2, cex = 0.8, col = "red")
dPL(0.2, mu = 0.3, sigma = 1, lambda=1, zeta = 2, family = "PE") mu = 0.3; sigma = 1; lambda = 2 set.seed(1) PLsample = rPL(1000, mu, sigma, lambda, family = "SN", zeta = 2.5) hist(PLsample, prob = TRUE, breaks = 15, main = "", las = 1) curve(dPL(x, mu, sigma, lambda, family = "SN", zeta = 2.5), from = 0.01, to = 0.8, add = TRUE, col = "red") rug(PLsample) x = seq(0.01, 0.9,0.01) y = dPL(x, mu, sigma, lambda, family = "Hyp", zeta = 2) plot(x, y, type = "l", lwd = 2, las = 1) x1 = seq(0.01, 0.4, 0.01) y1 = dPL(x1, mu, sigma, lambda, family = "Hyp", zeta = 2) polygon(c(x1, 0.4, 0), c(y1, 0, 0), col = "lightblue") text(mu-0.025, 1, paste("P(Y<0.4) = ", round(pPL(0.4, mu, sigma, lambda, family = "Hyp", zeta = 2),2)), font = 1, cex = 0.8) plot(x, pPL(x, mu, sigma, lambda, family = "PE", zeta = 1.3), type = "l", las = 1, lwd = 2, ylab = expression(P(Y<y)), xlab = "y") p = pPL(0.5, mu, sigma, lambda, family = "PE", zeta = 1.3) q = qPL(p, mu, sigma, lambda, family = "PE", zeta = 1.3) points(q, p, pch = 16, col = 2, cex = 1.5) text(0.55, 0.83, paste("(", 0.5, ",", round(p, 2), ")"), font = 2, cex = 0.8, col = "red")
This function provides plots for diagnostic analysis of power logit regression models.
## S3 method for class 'PLreg' plot(x, which = 1:4, type = "standardized", pch = "+", las = 1, cex = 0.8, ...)
## S3 method for class 'PLreg' plot(x, which = 1:4, type = "standardized", pch = "+", las = 1, cex = 0.8, ...)
x |
fitted model object of class " |
which |
numeric specifying a subset of plots (numbers between 1 and 7).
Default is |
type |
character specifying the type of residuals,
see |
pch , las , cex , ...
|
graphical parameters (see |
The plot
method for PLreg
objects provides 7 types
of diagnostic plots in the following order.
An index plot of the residuals versus indexes of observations.
An index plot of local influence based on the case-weight perturbation scheme.
A dispersion diagram of the generalized leverage versus the predicted values.
A dispersion diagram of the residuals versus the linear predictors.
A normal probability plot of the residuals.
A dispersion diagram of the predicted values versus the observed values.
A dispersion diagram of the function
versus the residuals. For some power logit models, the
function
may be interpreted as weights in the estimation process. If
family = "NO"
,
the function is constant.
The which
argument can be used to select a subset of the implemented plots.
Default is which = 1:4
.
plot
method for PLreg
objects returns 7 types
of diagnostic plots.
PLreg
, residuals.PLreg
, envelope
, influence
data("Firm") fitPL <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost, data = Firm, family = "SLASH", zeta = 2.13) par(mfrow = c(3,3)) plot(fitPL, type = "standardized") par(mfrow = c(1, 1))
data("Firm") fitPL <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost, data = Firm, family = "SLASH", zeta = 2.13) par(mfrow = c(3,3)) plot(fitPL, type = "standardized") par(mfrow = c(1, 1))
PLreg
is used to fit power logit regression model for continuous and bounded variables via maximum likelihood approach.
Both median and dispersion of the response variable are modeled through
parametric functions.
PLreg( formula, data, subset, na.action, family = c("NO", "LO", "TF", "PE", "SN", "SLASH", "Hyp"), zeta = NULL, link = c("logit", "probit", "cloglog", "cauchit", "loglog"), link.sigma = NULL, type = c("pML", "ML"), control = PLreg.control(...), model = TRUE, y = TRUE, x = FALSE, ... ) PLreg.fit( X, y, S = NULL, family, type = "pML", zeta = zeta, link = "logit", link.sigma = "log", control = PLreg.control() )
PLreg( formula, data, subset, na.action, family = c("NO", "LO", "TF", "PE", "SN", "SLASH", "Hyp"), zeta = NULL, link = c("logit", "probit", "cloglog", "cauchit", "loglog"), link.sigma = NULL, type = c("pML", "ML"), control = PLreg.control(...), model = TRUE, y = TRUE, x = FALSE, ... ) PLreg.fit( X, y, S = NULL, family, type = "pML", zeta = zeta, link = "logit", link.sigma = "log", control = PLreg.control() )
formula |
a symbolic description of the model. See details for further information. |
data , subset , na.action
|
arguments controlling formula processing via |
family |
a description of the symmetric distribution to be used for generating the power logit model.
Supported families include " |
zeta |
a numeric value or numeric vector that represents the extra parameter of the distribution. For the PL-NO and PL-LO models, no extra parameter is needed. |
link |
an optional character that specifies the link function of the median submodel (mu).
The " |
link.sigma |
an optional character that specifies the link function of the dispersion submodel (sigma).
The " |
type |
character specifying the type of estimator for the skewness parameter.
Currently, penalized maximum likelihood (" |
control |
a list of control arguments specified via |
model , y , x
|
logicals. If |
... |
arguments passed to |
X |
numeric regressor matrix for the median submodel. |
S |
numeric regressor matrix for the dispersion submodel. |
The power logit regression models, proposed by Queiroz and Ferrari (2022), is useful in
situations when the response variable is continuous and bounded on the unit interval (0, 1).
The median and the dispersion parameters are modeled through parametric link
functions. The models depend on a skewness parameter (called ). When the skewness parameter is fixed
and equal to 1, the power logit models coincide with the GJS regression models
(Lemonte and Bazan, 2016). Queiroz and Ferrari (2022) suggest using a penalized maximum
likelihood method to estimate the parameters. This method is implemented in
PLreg
by default when is not fixed. If convergence is not reached,
maximum likelihood estimation is performed. The estimation
process uses
optim
. If no starting values are specified,
the PLreg
function uses those suggested by Queiroz and Ferrari (2022).
This function also fits the log-log regression models by setting
at zero (
represents
).
The formulation of the model has the same structure as in the usual functions
lm
and glm
. The argument
formula
could comprise of three parts (separated by the symbols "" and "
"),
namely: observed response variable in the unit interval, predictor of the median submodel,
with link function
link
and predictor of the dispersion submodel, with link.sigma
link function. If the model has constant dispersion, the third part may be omitted and the link function for sigma
is "log
" by default. The skewness parameter lambda
may be
treated as fixed or not (default). If lambda
is fixed, its value
must be specified in the control
argument.
Some methods are available for objects of class "PLreg
",
see plot.PLreg
, summary.PLreg
,
coef.PLreg
, vcov.PLreg
, and
residuals.PLreg
, for details and other methods.
PLreg
returns an object of class "PLreg
" with the following
components (the PLreg.fit
returns elements up to v
).
coefficients |
a list with the " |
residuals |
a vector of the raw residuals (the difference between the observed and the fitted response). |
fitted.values |
a vector with the fitted values of the median submodel. |
optim |
a list with the output from |
family |
a character specifying the |
method |
the method argument passed to the optim call. |
control |
the control arguments passed to the optim call. |
start |
a vector with the starting values used in the iterative process. |
nobs |
number of observations. |
df.null |
residual degrees of freedom in the null model
(constant median and dispersion), i.e., |
df.residual |
residual degrees of freedom in the fitted model. |
lambda |
value of the skewness parameter lambda
( |
loglik |
log-likelihood of the fitted model. |
loglikp |
penalized profile log-likelihood for lambda. If lambda is
equal to zero, |
vcov |
covariance matrix of all the parameters. |
pseudo.r.squared |
pseudo R-squared value. |
Upsilon.zeta |
an overall goodness-of-fit measure. |
link |
a list with elements " |
converged |
logical indicating successful convergence of the iterative process. |
zeta |
a numeric specifying the value of zeta used in the estimation process. |
type |
a character specifying the estimation method used. |
v |
a vector with the v(z) values for all the observations (see Queiroz and Ferrari(2021)). |
call |
the original function call. |
formula |
the formula used. |
terms |
a list with elements " |
levels |
a list with elements " |
contrasts |
a list with elements " |
model |
the full model frame (if |
y |
the response variable (if |
x |
a list with elements " |
Francisco Felipe de Queiroz ([email protected]) and Silvia L. P. Ferrari.
Queiroz, F. F. and Ferrari, S. L. P. (2022). Power logit regression
for modeling bounded data. arXiv:2202.01697.
Lemonte, A. J. and Bazan, J. L. (2015). New class of Johnson SB distributions
and its associated regression model for rates and proportions. Biometrical Journal. 58:727-746.
summary.PLreg
, PLreg.control
, residuals.PLreg
#### Body fat data data("bodyfat_Aeolus") #Initial model with zeta = 2 fit1 <- PLreg(percentfat ~ days + sex + year, data = bodyfat_Aeolus, family = "PE", zeta = 2) summary(fit1) # Choosing the best value for zeta # extra.parameter(fit1, lower = 1, upper = 4, grid = 15) # Using zeta = 1.7 fit2 <- PLreg(percentfat ~ days + sex + year, data = bodyfat_Aeolus, family = "PE", zeta = 1.7) summary(fit2) # Fixing lambda = 1 fit3 <- PLreg(percentfat ~ days + sex + year, data = bodyfat_Aeolus, family = "PE", zeta = 1.7, control = PLreg.control(lambda = 1)) summary(fit3) # Comparing the AIC and Upsilon values between fit2 and fit3 AIC(fit2) < AIC(fit3) # TRUE fit2$Upsilon.zeta < fit3$Upsilon.zeta #TRUE #### Firm cost data data("Firm") fitPL <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost, data = Firm, family = "SLASH", zeta = 2.13) summary(fitPL) #extra.parameter(fitPL, lower = 1.2, upper = 4, grid = 10) #plot(fitPL, type = "standardized") #envelope(fitPL, type = "standardized") fitPL_wo72 <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost, data = Firm[-72,], family = "SLASH", zeta = 2.13) fitPL_wo15 <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost, data = Firm[-15,], family = "SLASH", zeta = 2.13) fitPL_wo16 <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost, data = Firm[-16,], family = "SLASH", zeta = 2.13) coef.mu <- coef(fitPL)[1:3] coef.mu_wo72 <- coef(fitPL_wo72)[1:3] coef.mu_wo15 <- coef(fitPL_wo15)[1:3] coef.mu_wo16 <- coef(fitPL_wo16)[1:3] plot(Firm$indcost, Firm$firmcost, pch = "+", xlab = "indcost", ylab = "firmcost") #identify(Firm$indcost, Firm$firmcost) covariate = matrix(c(rep.int(1, 1000), rep(median(Firm$sizelog), 1000), seq(0, 1.22, length.out = 1000)), ncol = 3) lines(covariate[,3], as.vector(fitPL$link$median$linkinv(covariate%*%coef.mu)), type = "l") lines(covariate[,3], as.vector(fitPL$link$median$linkinv(covariate%*%coef.mu_wo72)), type = "l", lty = 2, col = "blue") lines(covariate[,3], as.vector(fitPL$link$median$linkinv(covariate%*%coef.mu_wo15)), type = "l", lty = 3, col = "red") lines(covariate[,3], as.vector(fitPL$link$median$linkinv(covariate%*%coef.mu_wo16)), type = "l", lty = 4, col = "green") parameters = c("pML", "pML w/o 72", "pML w/o 15", "pML w/o 16") legend(x = 0.5, y = 0.8, legend = parameters, col = c("black", "blue", "red", "green"), lty = c(1, 2, 3, 4), cex = 0.6)
#### Body fat data data("bodyfat_Aeolus") #Initial model with zeta = 2 fit1 <- PLreg(percentfat ~ days + sex + year, data = bodyfat_Aeolus, family = "PE", zeta = 2) summary(fit1) # Choosing the best value for zeta # extra.parameter(fit1, lower = 1, upper = 4, grid = 15) # Using zeta = 1.7 fit2 <- PLreg(percentfat ~ days + sex + year, data = bodyfat_Aeolus, family = "PE", zeta = 1.7) summary(fit2) # Fixing lambda = 1 fit3 <- PLreg(percentfat ~ days + sex + year, data = bodyfat_Aeolus, family = "PE", zeta = 1.7, control = PLreg.control(lambda = 1)) summary(fit3) # Comparing the AIC and Upsilon values between fit2 and fit3 AIC(fit2) < AIC(fit3) # TRUE fit2$Upsilon.zeta < fit3$Upsilon.zeta #TRUE #### Firm cost data data("Firm") fitPL <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost, data = Firm, family = "SLASH", zeta = 2.13) summary(fitPL) #extra.parameter(fitPL, lower = 1.2, upper = 4, grid = 10) #plot(fitPL, type = "standardized") #envelope(fitPL, type = "standardized") fitPL_wo72 <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost, data = Firm[-72,], family = "SLASH", zeta = 2.13) fitPL_wo15 <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost, data = Firm[-15,], family = "SLASH", zeta = 2.13) fitPL_wo16 <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost, data = Firm[-16,], family = "SLASH", zeta = 2.13) coef.mu <- coef(fitPL)[1:3] coef.mu_wo72 <- coef(fitPL_wo72)[1:3] coef.mu_wo15 <- coef(fitPL_wo15)[1:3] coef.mu_wo16 <- coef(fitPL_wo16)[1:3] plot(Firm$indcost, Firm$firmcost, pch = "+", xlab = "indcost", ylab = "firmcost") #identify(Firm$indcost, Firm$firmcost) covariate = matrix(c(rep.int(1, 1000), rep(median(Firm$sizelog), 1000), seq(0, 1.22, length.out = 1000)), ncol = 3) lines(covariate[,3], as.vector(fitPL$link$median$linkinv(covariate%*%coef.mu)), type = "l") lines(covariate[,3], as.vector(fitPL$link$median$linkinv(covariate%*%coef.mu_wo72)), type = "l", lty = 2, col = "blue") lines(covariate[,3], as.vector(fitPL$link$median$linkinv(covariate%*%coef.mu_wo15)), type = "l", lty = 3, col = "red") lines(covariate[,3], as.vector(fitPL$link$median$linkinv(covariate%*%coef.mu_wo16)), type = "l", lty = 4, col = "green") parameters = c("pML", "pML w/o 72", "pML w/o 15", "pML w/o 16") legend(x = 0.5, y = 0.8, legend = parameters, col = c("black", "blue", "red", "green"), lty = c(1, 2, 3, 4), cex = 0.6)
Parameters that control fitting of power logit regression models using PLreg
.
PLreg.control( lambda = NULL, method = "BFGS", maxit = 2000, trace = FALSE, start = NULL, ... )
PLreg.control( lambda = NULL, method = "BFGS", maxit = 2000, trace = FALSE, start = NULL, ... )
lambda |
numeric indicating the value of the skewness parameter lambda (if |
method |
character specifying the |
maxit , trace , ...
|
arguments passed to |
start |
an optional vector with starting values for median and dispersion submodels (starting value for lambda must not be included). |
The PLreg.control
controls the fitting process of power logit models. Almost all the arguments
are passed on directly to optim
, which is used to estimate the parameters.
Starting values for median and dispersion submodels may be supplied via start
. If the
estimation process is to be performed with a fixed skewness parameter, a value must be specified
in lambda
. If lambda = 0
, a log-log regression model
will be estimated.
A list with components named as the arguments.
data("PeruVotes") fitPL <- PLreg(votes ~ HDI | HDI, data = PeruVotes, family = "TF", zeta = 5, control = PLreg.control(lambda = 1)) summary(fitPL)
data("PeruVotes") fitPL <- PLreg(votes ~ HDI | HDI, data = PeruVotes, family = "TF", zeta = 5, control = PLreg.control(lambda = 1)) summary(fitPL)
The function provides three types of residuals for power logit regression models: quantile, deviance and standardized.
## S3 method for class 'PLreg' residuals(object, type = c("quantile", "deviance", "standardized"), ...)
## S3 method for class 'PLreg' residuals(object, type = c("quantile", "deviance", "standardized"), ...)
object |
fitted model object of class " |
type |
character specifying the type of residuals to be used. |
... |
currently not used. |
The quantile residuals is based on Dunn and Smyth (1996) idea. The
residuals are well-defined for all the distributions in the power logit class and
have, approximately, a standard normal distribution in large samples if the model is
correctly specified.
The deviance residuals are based on the log-likelihood contributions
of each observation in the sample. The distribution of this residual is unknown
(both exact and asymptotic), except for the power logit normal model, which is,
approximately, standard normal.
The standardized residuals are a standardized form of the ordinary
residual. These residuals take into account the diagonal elements of the
matrix, being useful for detecting leverage observations. The distribution
of the standardized residuals is unknown.
residuals
method for object of class "PLreg
" returns a vector
with the residuals of the type specified in the type
argument.
Queiroz, F. F. and Ferrari, S. L. P. (2022). Power logit regression
for modeling bounded data. arXiv:2202.01697.
Dunn, P. K. and Smyth, G. K. (1996) Randomized quantile residuals.
Journal of Computational and Graphical Statistics, 5:236-244.
PLreg
, plot.PLreg
, envelope
, influence
data("PeruVotes") fitPL <- PLreg(votes ~ HDI | HDI, data = PeruVotes, family = "TF", zeta = 5) res_quantile = residuals(fitPL, type = "quantile") res_standardized = residuals(fitPL, type = "standardized") plot(res_standardized, pch = "+", ylim = c(-6, 6)) abline(h = -3, lty = 2) abline(h = 3, lty = 2) qqnorm(res_quantile) qqline(res_quantile)
data("PeruVotes") fitPL <- PLreg(votes ~ HDI | HDI, data = PeruVotes, family = "TF", zeta = 5) res_quantile = residuals(fitPL, type = "quantile") res_standardized = residuals(fitPL, type = "standardized") plot(res_standardized, pch = "+", ylim = c(-6, 6)) abline(h = -3, lty = 2) abline(h = 3, lty = 2) qqnorm(res_quantile) qqline(res_quantile)
The sandwich
function provides an estimate for the asymptotic variance
and covariance matrix of the parameter estimators of the power logit (or log-log)
regression models based on de sandwich estimator (see Queiroz and Ferrari (2022)).
sandwich(object)
sandwich(object)
object |
fitted model object of class " |
extra.parameter
returns a matrix containing the sandwich variance and
covariance matrix estimate.
Queiroz, F. F. and Ferrari, S. L. P. (2022). Power logit regression for modeling bounded data. arXiv:2202.01697.
data("Firm") fit <- PLreg(percentfat ~ days + sex + year, data = bodyfat_Aeolus, family = "PE", zeta = 2) sandwich(fit)
data("Firm") fit <- PLreg(percentfat ~ days + sex + year, data = bodyfat_Aeolus, family = "PE", zeta = 2) sandwich(fit)