Package 'PLreg'

Title:	Power Logit Regression for Modeling Bounded Data
Description:	Power logit regression models for bounded continuous data, in which the density generator may be normal, Student-t, power exponential, slash, hyperbolic, sinh-normal, or type II logistic. Diagnostic tools associated with the fitted model, such as the residuals, local influence measures, leverage measures, and goodness-of-fit statistics, are implemented. The estimation process follows the maximum likelihood approach and, currently, the package supports two types of estimators: the usual maximum likelihood estimator and the penalized maximum likelihood estimator. More details about power logit regression models are described in Queiroz and Ferrari (2022) <arXiv:2202.01697>.
Authors:	Felipe Queiroz [aut, cre], Silvia Ferrari [aut]
Maintainer:	Felipe Queiroz <[email protected]>
License:	GPL (>= 3)
Version:	0.4.0
Built:	2025-03-10 05:35:40 UTC
Source:	https://github.com/ffqueiroz/plreg

Help Index

Body Fat of Little Brown Bat
Confidence Interval for the Skewness Parameter
Normal Probability Plots with Simulated Envelope of Residuals for PLreg Objects
Procedure to Select the Extra Parameter for PLreg Objects
Firm Cost
Influence Diagnostics for PLreg Objects
Methods for PLreg Objects
Peru Blank Votes
Power Logit Distributions
Diagnostic Plots for PLreg Objects
Power Logit Regression Models for Bounded Variables
Auxiliary for Controlling PL Fitting
Residuals Method for PLreg Objects
Sandwich Variance and Covariance Matrix for PLreg Objects

Body Fat of Little Brown Bat

Description

A dataset containing the percent body fat of little brown bats sampled in Aeolus Cave, located in East Dorset, Vermont, in the North Eastern United States.

Usage

data(bodyfat_Aeolus)
data(bodyfat_Aeolus)

Format

A data frame with 159 rows and 4 variables:

sex: sex of the sampled bat, M for masculine and F for feminine.
percentfat: percent body fat.
year: year the bat was sampled.
days: hibernation time, defined as days since the fall equinox.

Details

The complete dataset was collected by Cheng et al. (2019) in five different regions of the United States. bodyfat_Aeolus is the subset of the data collected in Aeolus Cave, located in East Dorset, Vermont, in the North Eastern United States.

Source

https://datadryad.org/stash/dataset/doi:10.5061/dryad.sh487nh

References

Cheng TL, Gerson A, Moore MS, et al. (2019) Higher fat stores contribute to persistence of little brown bat populations with white-nose syndrome. J Anim Ecol. 88:591-600. https://doi.org/10.1111/1365-2656.12954

Examples

data("bodyfat_Aeolus")
# Model with zeta = 2
fit <- PLreg(percentfat ~ days + sex + year, data = bodyfat_Aeolus,
             family = "PE", zeta = 2)
summary(fit)
data("bodyfat_Aeolus")
# Model with zeta = 2
fit <- PLreg(percentfat ~ days + sex + year, data = bodyfat_Aeolus,
             family = "PE", zeta = 2)
summary(fit)

Confidence Interval for the Skewness Parameter

Description

The CI.lambda function provides a plot of the profile (penalized) likelihood ratio statistics for lambda, useful to obtain confidence intervals for the skewness parameter (see Queiroz and Ferrari (2022)).

Usage

CI.lambda(object, conf.coef = 0.95, lower = NULL, upper = NULL)
CI.lambda(object, conf.coef = 0.95, lower = NULL, upper = NULL)

Arguments

`object`	fitted model object of class "`PLreg`".
`conf.coef`	confidence level of the confidence interval. Default is 0.95.
`lower`	a numeric value representing the lower limit of the interval for the skewness parameter. If `lower = NULL`, the lower limit is selected by the function.
`upper`	a numeric value representing the upper limit of the interval for the skewness parameter. If `upper = NULL`, the upper limit is selected by the function.

Value

The function returns a plot of the profile penalized likelihood ratio statistics for lambda with a horizontal dashed line, indicating the confidence interval for lambda. It also shows the confidence interval obtained.

Examples

data("PeruVotes")

fitPL <- PLreg(votes ~ HDI | HDI, 
  data = PeruVotes, 
  family = "TF", zeta = 5)

CI.lambda(fitPL)
data("PeruVotes")

fitPL <- PLreg(votes ~ HDI | HDI, 
  data = PeruVotes, 
  family = "TF", zeta = 5)

CI.lambda(fitPL)

Normal Probability Plots with Simulated Envelope of Residuals for PLreg Objects

Description

envelope is used to display normal probability plots with simulated envelope of residuals for the power logit models. Currently, three types of residuals are supported: quantile, deviance and standardized residuals.

Usage

envelope(
  object,
  type = c("quantile", "deviance", "standardized"),
  rep = 40,
  conf = 0.95,
  xlab,
  ylab,
  main,
  envcol,
  ylim,
  xlim
)
envelope(
  object,
  type = c("quantile", "deviance", "standardized"),
  rep = 40,
  conf = 0.95,
  xlab,
  ylab,
  main,
  envcol,
  ylim,
  xlim
)

Arguments

`object`	fitted model object of class "`PLreg`".
`type`	character specifying the type of residuals to be used, see `residuals.PLreg`. Default is `type = "standardized"`.
`rep`	a positive integer representing the number of iterations to calculate the simulated envelopes. Default is `rep=40`.
`conf`	a numeric value in the interval (0,1) that represents the confidence level of the simulated envelopes. Default is `conf=0.95`.
`xlab`	character specifying the label for $x$ axis (optional). Default is "`Quantile N(0,1)`".
`ylab`	character specifying the label for $y$ axis (optional). Default is the name of the used residual.
`main`	character specifying the overall title for the plot.
`envcol`	character specifying the color of the envelope.
`ylim`, `xlim`	numeric values, specifying the left/lower limit and the right/upper limit of the scale.

Details

The envelope uses the idea of Atkinson (1985) to create normal probability plots with simulated envelope. Under the correct model, approximately 100*conf of the residuals are expected to be inside the envelope.

Value

envelope returns normal probability plot with simulated envelopes for the residuals.

References

Queiroz, F. F. and Ferrari, S. L. P. (2022). Power logit regression for modeling bounded data. arXiv:2202.01697.

Atkinson, A. C. (1985) Plots, transformations and regression: an introduction to graphical methods of diagnostic regression analysis. Oxford Science Publications, Oxford.

Examples

data("Firm")

fitPL <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost, data = Firm,
              family = "SLASH", zeta = 2.13)
summary(fitPL)

envelope(fitPL, type = "standardized")
envelope(fitPL, type = "quantile")
envelope(fitPL, type = "deviance")

data("Firm")

fitPL <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost, data = Firm,
              family = "SLASH", zeta = 2.13)
summary(fitPL)

envelope(fitPL, type = "standardized")
envelope(fitPL, type = "quantile")
envelope(fitPL, type = "deviance")

Procedure to Select the Extra Parameter for PLreg Objects

Description

The extra.parameter function is used to select the extra parameter of some power logit models. It provides plots of -2logLik and the Upsilon measure (see Queiroz and Ferrari (2022)) versus $\zeta$ , the extra parameter.

Usage

extra.parameter(object, lower, upper, grid = 10, graph = TRUE)
extra.parameter(object, lower, upper, grid = 10, graph = TRUE)

Arguments

`object`	fitted model object of class "`PLreg`".
`lower`	a numeric value representing the lower limit of the interval for the extra parameter.
`upper`	a numeric value representing the upper limit of the interval for the extra parameter.
`grid`	a positive integer representing the number of points in the plots. Default is `grid=10`. If `grid` is less than 10, then `grid=10`.
`graph`	logical. If `graph = TRUE` the plots are shown, if `graph = FALSE` the plots are not shown. Default is `graph = TRUE`.

Value

extra.parameter returns a list with five objects:

`zeta.Ups`	The selected zeta based on the Upsilon measure.
`zeta.loglik`	The selected zeta based on -2`logLik`.
`zeta.values`	The values of zeta used in the graphs.
`Upsilon.values`	-2`logLik` evaluated at each value of zeta.
`loglik.values`	Upsilon measure evaluated at each value of zeta.

References

Queiroz, F. F. and Ferrari, S. L. P. (2022). Power logit regression for modeling bounded data. arXiv:2202.01697.

Examples

data("bodyfat_Aeolus")

#Initial model with zeta = 2
fit1 <- PLreg(percentfat ~ days + sex + year, data = bodyfat_Aeolus,
             family = "PE", zeta = 2)
summary(fit1)
# Choosing the best value for zeta

extra.parameter(fit1, lower = 1, upper = 4, grid = 15)
data("bodyfat_Aeolus")

#Initial model with zeta = 2
fit1 <- PLreg(percentfat ~ days + sex + year, data = bodyfat_Aeolus,
             family = "PE", zeta = 2)
summary(fit1)
# Choosing the best value for zeta

extra.parameter(fit1, lower = 1, upper = 4, grid = 15)

Firm Cost

Description

A dataset on risk management practices of 73 firms.

Usage

data(Firm)
data(Firm)

Format

A data frame with 73 rows and 7 variables:

firmcost: total property and casualty premiums and uninsured losses as a percentage of total assets.
assume: per occurrence retention amount as a percentage of total assets.
cap: indicates that the firm owns a captive insurance company; 1 if the firm uses a captive, 0 otherwise.
sizelog: logarithm of total assets.
indcost: a measure of the firm's industry risk.
central: a measure of the importance of the local managers in choosing the amount of risk to be retained.
soph: a measure of the degree of importance in using analytical tools.

Details

The dataset was introduced and analyzed by Schmit and Roth (1990) and is available in the personal web page of Professor E. Frees (Wisconsin School of Business Research). The response variable is FIRMCOST, smaller values of firm cost are attributed to firms that have a good risk management performance.

Source

https://instruction.bus.wisc.edu/jfrees/jfreesbooks/Regression%20Modeling/BookWebDec2010/CSVData/RiskSurvey.csv

References

Schmit, J.T. and Roth, K. (1990). Cost effectiveness of risk management practices. Journal of Risk and Insurance. 57:455-470.

Examples

data("Firm")
fitPL <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost,
              data = Firm,
              family = "SLASH",
              zeta = 2.13)
summary(fitPL)
plot(fitPL, type = "standardized")
data("Firm")
fitPL <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost,
              data = Firm,
              family = "SLASH",
              zeta = 2.13)
summary(fitPL)
plot(fitPL, type = "standardized")

Influence Diagnostics for PLreg Objects

Description

The influence function provides two influence measures and the generalized leverage for power logit regression models.

Usage

influence(model, graph = TRUE, ...)
influence(model, graph = TRUE, ...)

Arguments

`model`	fitted model object of class "`PLreg`".
`graph`	logical. If `graph = TRUE` the plots are shown, if `graph = FALSE` the plots are not shown. Default is `graph = TRUE`.
`...`	currently not used.

Value

influence returns a list with three objects:

`case.weights`	The values of $h_{max}$ eigenvector based on case weights perturbation scheme (see Queiroz and Ferrari (2022)).
`totalLI`	The total local influence (see Lesaffre and Verbeke (1998))
`GL`	The diagonal elements of the generalized leverage matrix.

References

Queiroz, F. F. and Ferrari, S. L. P. (2022). Power logit regression for modeling bounded data. arXiv:2202.01697.

Examples

data("Firm")

fitPL <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost,
              data = Firm, family = "SLASH", zeta = 2.13)

influence_measures = influence(fitPL, graph = FALSE)
plot(influence_measures$case.weights, type = "h", ylim = c(0,1))
plot(influence_measures$totalLI, type = "h", ylim = c(0,6))
plot(Firm$sizelog, influence_measures$GL, pch = "+")

data("Firm")

fitPL <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost,
              data = Firm, family = "SLASH", zeta = 2.13)

influence_measures = influence(fitPL, graph = FALSE)
plot(influence_measures$case.weights, type = "h", ylim = c(0,1))
plot(influence_measures$totalLI, type = "h", ylim = c(0,6))
plot(Firm$sizelog, influence_measures$GL, pch = "+")

Methods for PLreg Objects

Description

Some S3 Methods for PLreg regression models.

Usage

## S3 method for class 'PLreg'
summary(object, type = "standardized", ...)

## S3 method for class 'PLreg'
print(x, ...)

## S3 method for class 'summary.PLreg'
print(x, ...)

## S3 method for class 'PLreg'
coef(object, ...)

## S3 method for class 'PLreg'
vcov(object, ...)

## S3 method for class 'PLreg'
logLik(object, ...)

## S3 method for class 'PLreg'
model.matrix(object, model = c("median", "dispersion"), ...)
## S3 method for class 'PLreg'
summary(object, type = "standardized", ...)

## S3 method for class 'PLreg'
print(x, ...)

## S3 method for class 'summary.PLreg'
print(x, ...)

## S3 method for class 'PLreg'
coef(object, ...)

## S3 method for class 'PLreg'
vcov(object, ...)

## S3 method for class 'PLreg'
logLik(object, ...)

## S3 method for class 'PLreg'
model.matrix(object, model = c("median", "dispersion"), ...)

Arguments

`object`, `x`	fitted model object of class "`PLreg`".
`type`	character specifying the type of residuals to be included in the summary output, see `residuals.PLreg`.
`...`	currently not used.
`model`	character specifying for which component of the model the coefficients/covariance are extracted.

Details

A set of methods for objects of class "PLreg", including methods for the functions summary and vcov, which print the estimated coefficients along with some other information and presents the covariance matrix, respectively. The summary also presents the partial Wald tests for the model parameters. Finally, summary returns an object of class "summary.PLreg" containing information to be printed using the print method.

Examples

data("bodyfat_Aeolus")

fitPL <- PLreg(percentfat ~ 1, data = bodyfat_Aeolus,
              family = "SN", zeta = 1.6)
fitPL
summary(fitPL)
coef(fitPL, model = "median")
vcov(fitPL)
logLik(fitPL)
AIC(fitPL)
data("bodyfat_Aeolus")

fitPL <- PLreg(percentfat ~ 1, data = bodyfat_Aeolus,
              family = "SN", zeta = 1.6)
fitPL
summary(fitPL)
coef(fitPL, model = "median")
vcov(fitPL)
logLik(fitPL)
AIC(fitPL)

Peru Blank Votes

Description

A dataset on the blank votes in the 2006 Peruvian general election.

Usage

data(PeruVotes)
data(PeruVotes)

Format

A data frame with 194 rows and 2 variables:

votes: proportion of blank votes in the 2006 Peruvian general election.
HDI: Human Development Index.

Details

The dataset was collected by Bayes et al. (2012) and the response variable is votes, proportion of blank votes in the 2006 Peruvian general election. Bayes et al. (2012) analyzed the influence of the Human Development Index (HDI) on the proportion of blank votes using a beta rectangular regression model. Lemonte and Bazan (2015) also analyzed the data using GJS regression models.

Source

https://www.undp.org/es/peru

References

Bayes, C., Bazan, J. L. and Garcia, C. (2012). A new robust regression model for proportions. Bayesian Analysis. 7:771-796

Lemonte, A. J. and Bazan, J. L. (2015). New class of Johnson SB distributions and its associated regression model for rates and proportions. Biometrical Journal. 58:727-746.

Examples

data("PeruVotes")
fitPL <- PLreg(votes ~ HDI | HDI,
               data = PeruVotes,
               family = "TF",
               zeta = 5,
               control = PLreg.control(lambda = 1))
summary(fitPL)
plot(fitPL, type = "standardized")
data("PeruVotes")
fitPL <- PLreg(votes ~ HDI | HDI,
               data = PeruVotes,
               family = "TF",
               zeta = 5,
               control = PLreg.control(lambda = 1))
summary(fitPL)
plot(fitPL, type = "standardized")

Power Logit Distributions

Description

Density, distribution function, quantile function and random generation for power logit distributions.

Usage

dPL(x, mu, sigma, lambda, zeta = 2, family, log = FALSE)

pPL(q, mu, sigma, lambda, zeta = 2, family, lower.tail = TRUE, log.p = FALSE)

qPL(p, mu, sigma, lambda, zeta = 2, family, lower.tail = TRUE, log.p = FALSE)

rPL(n, mu, sigma, lambda, zeta = 2, family)
dPL(x, mu, sigma, lambda, zeta = 2, family, log = FALSE)

pPL(q, mu, sigma, lambda, zeta = 2, family, lower.tail = TRUE, log.p = FALSE)

qPL(p, mu, sigma, lambda, zeta = 2, family, lower.tail = TRUE, log.p = FALSE)

rPL(n, mu, sigma, lambda, zeta = 2, family)

Arguments

`x`, `q`	vector of quantiles.
`mu`	vector of medians.
`sigma`	vector of dispersion parameters.
`lambda`	vector of skewness parameters.
`zeta`	vector of extra parameters.
`family`	string that specifies the family used to define the power logit distribution. The family is `NO`, `TF`, `LO`, `PE`, `SHN`, `Hyp` and `SLASH` for normal, Student-t, type II logistic, power exponential, sinh-normal, hyperbolic and slash distribution, respectively.
`log`, `log.p`	logical; if TRUE, probabilities p are given as log(p). Default is FALSE.
`lower.tail`	logical; if TRUE (default), probabilities are $P(X \le x)$ otherwise, $P(X > x)$ .
`p`	vector of probabilities.
`n`	number of observations. If `length(n) > 1`, the length is taken to be the number required.

Details

If zeta is not specified, it assumes the default value 2.
The power logit distribution has density

$f(y; \mu, \sigma, \lambda) = \lambda r(z^2)/(\sigma y (1-y^\lambda)),$

for $y \in (0,1)$ , in which $z = [logit(y^\lambda) - logit(\mu^\lambda)]/\sigma$ , $r(\cdot)$ is the density generator and $\mu \in (0,1)$ , $\sigma>0$ and $\lambda>0$ are the median, dispersion and skewness of the distribution.
It is possible to consider $\lambda=0$ to obtain the limiting case, the log-log distribution. This distribution has density

$f(y; \mu, \sigma, \lambda) = r(z^2)/(\sigma y (-log(y))),$

for $y \in (0,1)$ , in which $z = [-log(-log(y)) - (-log(-log(y)))]/\sigma$ .
The family argument defines the density generator $r(\cdot)$ , which may depend on an extra parameter (zeta).

Value

dPL gives the density, pPL gives the distribution function, qPL gives the quantile function, and rPL generates random variables.

References

Queiroz, F. F. and Ferrari, S. L. P. (2022). Power logit regression for modeling bounded data. arXiv preprint arXiv:2202.01697.

Examples

dPL(0.2, mu = 0.3, sigma = 1, lambda=1, zeta = 2, family = "PE")
mu = 0.3; sigma = 1; lambda = 2
set.seed(1)
PLsample = rPL(1000, mu, sigma, lambda, family = "SN",  zeta = 2.5)
hist(PLsample, prob = TRUE, breaks = 15, main = "", las = 1)
curve(dPL(x, mu, sigma, lambda, family = "SN",  zeta = 2.5),
      from = 0.01, to = 0.8, add = TRUE, col = "red")
rug(PLsample)

x = seq(0.01, 0.9,0.01)
y = dPL(x, mu, sigma, lambda, family = "Hyp",  zeta = 2)
plot(x, y, type = "l", lwd = 2, las = 1)

x1 = seq(0.01, 0.4, 0.01)
y1 = dPL(x1, mu, sigma, lambda, family = "Hyp",  zeta = 2)
polygon(c(x1, 0.4, 0), c(y1, 0, 0), col = "lightblue")
text(mu-0.025, 1, paste("P(Y<0.4) = ",
                        round(pPL(0.4, mu, sigma, lambda,
                        family = "Hyp",
                        zeta = 2),2)),
     font = 1, cex = 0.8)

plot(x, pPL(x, mu, sigma, lambda, family = "PE",  zeta = 1.3),
     type = "l", las = 1, lwd = 2,
     ylab = expression(P(Y<y)),
     xlab = "y")
p = pPL(0.5, mu, sigma, lambda, family = "PE",  zeta = 1.3)
q = qPL(p, mu, sigma, lambda, family = "PE",  zeta = 1.3)
points(q, p, pch = 16, col = 2, cex = 1.5)
text(0.55, 0.83, paste("(", 0.5, ",", round(p, 2), ")"), font = 2,
     cex = 0.8, col = "red")
dPL(0.2, mu = 0.3, sigma = 1, lambda=1, zeta = 2, family = "PE")
mu = 0.3; sigma = 1; lambda = 2
set.seed(1)
PLsample = rPL(1000, mu, sigma, lambda, family = "SN",  zeta = 2.5)
hist(PLsample, prob = TRUE, breaks = 15, main = "", las = 1)
curve(dPL(x, mu, sigma, lambda, family = "SN",  zeta = 2.5),
      from = 0.01, to = 0.8, add = TRUE, col = "red")
rug(PLsample)

x = seq(0.01, 0.9,0.01)
y = dPL(x, mu, sigma, lambda, family = "Hyp",  zeta = 2)
plot(x, y, type = "l", lwd = 2, las = 1)

x1 = seq(0.01, 0.4, 0.01)
y1 = dPL(x1, mu, sigma, lambda, family = "Hyp",  zeta = 2)
polygon(c(x1, 0.4, 0), c(y1, 0, 0), col = "lightblue")
text(mu-0.025, 1, paste("P(Y<0.4) = ",
                        round(pPL(0.4, mu, sigma, lambda,
                        family = "Hyp",
                        zeta = 2),2)),
     font = 1, cex = 0.8)

plot(x, pPL(x, mu, sigma, lambda, family = "PE",  zeta = 1.3),
     type = "l", las = 1, lwd = 2,
     ylab = expression(P(Y<y)),
     xlab = "y")
p = pPL(0.5, mu, sigma, lambda, family = "PE",  zeta = 1.3)
q = qPL(p, mu, sigma, lambda, family = "PE",  zeta = 1.3)
points(q, p, pch = 16, col = 2, cex = 1.5)
text(0.55, 0.83, paste("(", 0.5, ",", round(p, 2), ")"), font = 2,
     cex = 0.8, col = "red")

Diagnostic Plots for PLreg Objects

Description

This function provides plots for diagnostic analysis of power logit regression models.

Usage

## S3 method for class 'PLreg'
plot(x, which = 1:4, type = "standardized", pch = "+", las = 1, cex = 0.8, ...)
## S3 method for class 'PLreg'
plot(x, which = 1:4, type = "standardized", pch = "+", las = 1, cex = 0.8, ...)

Arguments

`x`	fitted model object of class "`PLreg`".
`which`	numeric specifying a subset of plots (numbers between 1 and 7). Default is `which = 1:4`.
`type`	character specifying the type of residuals, see `residuals.PLreg`. Default is `type = "standardized"`.
`pch`, `las`, `cex`, `...`	graphical parameters (see `par`)

Details

The plot method for PLreg objects provides 7 types of diagnostic plots in the following order.

Residuals vs indexes of obs.: An index plot of the residuals versus indexes of observations.
Case-weight perturbation: An index plot of local influence based on the case-weight perturbation scheme.
Generalized leverage: A dispersion diagram of the generalized leverage versus the predicted values.
Residuals vs linear predictor: A dispersion diagram of the residuals versus the linear predictors.
Normal probability plot: A normal probability plot of the residuals.
Predicted vs observed values: A dispersion diagram of the predicted values versus the observed values.
Residuals vs v(z) function: A dispersion diagram of the $v(z)$ function versus the residuals. For some power logit models, the $v(z)$ function may be interpreted as weights in the estimation process. If family = "NO", the $v(z)$ function is constant.

The which argument can be used to select a subset of the implemented plots. Default is which = 1:4.

Value

plot method for PLreg objects returns 7 types of diagnostic plots.

Examples

data("Firm")

fitPL <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost, data = Firm,
              family = "SLASH", zeta = 2.13)
par(mfrow = c(3,3))
plot(fitPL, type = "standardized")
par(mfrow = c(1, 1))
data("Firm")

fitPL <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost, data = Firm,
              family = "SLASH", zeta = 2.13)
par(mfrow = c(3,3))
plot(fitPL, type = "standardized")
par(mfrow = c(1, 1))

Power Logit Regression Models for Bounded Variables

Description

PLreg is used to fit power logit regression model for continuous and bounded variables via maximum likelihood approach. Both median and dispersion of the response variable are modeled through parametric functions.

Usage

PLreg(
  formula,
  data,
  subset,
  na.action,
  family = c("NO", "LO", "TF", "PE", "SN", "SLASH", "Hyp"),
  zeta = NULL,
  link = c("logit", "probit", "cloglog", "cauchit", "loglog"),
  link.sigma = NULL,
  type = c("pML", "ML"),
  control = PLreg.control(...),
  model = TRUE,
  y = TRUE,
  x = FALSE,
  ...
)

PLreg.fit(
  X,
  y,
  S = NULL,
  family,
  type = "pML",
  zeta = zeta,
  link = "logit",
  link.sigma = "log",
  control = PLreg.control()
)
PLreg(
  formula,
  data,
  subset,
  na.action,
  family = c("NO", "LO", "TF", "PE", "SN", "SLASH", "Hyp"),
  zeta = NULL,
  link = c("logit", "probit", "cloglog", "cauchit", "loglog"),
  link.sigma = NULL,
  type = c("pML", "ML"),
  control = PLreg.control(...),
  model = TRUE,
  y = TRUE,
  x = FALSE,
  ...
)

PLreg.fit(
  X,
  y,
  S = NULL,
  family,
  type = "pML",
  zeta = zeta,
  link = "logit",
  link.sigma = "log",
  control = PLreg.control()
)

Arguments

`formula`	a symbolic description of the model. See details for further information.
`data`, `subset`, `na.action`	arguments controlling formula processing via `model.frame`.
`family`	a description of the symmetric distribution to be used for generating the power logit model. Supported families include "`NO`", "`LO`", "`TF`", "`PE`", "`Hyp`", `SHN`" and "`SLASH`", which correspond to the power logit normal, type II logistic, Student-t, power exponential, hyperbolic, sinh-normal, and slash distributions, respectively.
`zeta`	a numeric value or numeric vector that represents the extra parameter of the distribution. For the PL-NO and PL-LO models, no extra parameter is needed.
`link`	an optional character that specifies the link function of the median submodel (mu). The "`logit`", "`probit`", "`cloglog`", "`cauchit`", "`loglog`" functions are supported. The `logit` function is the default.
`link.sigma`	an optional character that specifies the link function of the dispersion submodel (sigma). The "`log`", "`sqrt`" functions are supported. The default is `log`.
`type`	character specifying the type of estimator for the skewness parameter. Currently, penalized maximum likelihood ("`pML`") and maximum likelihood ("`ML`") are supported. If the skewness parameter is fixed, `ML` type is used.
`control`	a list of control arguments specified via `PLreg.control`.
`model`, `y`, `x`	logicals. If `TRUE` the corresponding components of the fit (model frame, response, model matrix) are returned. For `PLreg.fit`, `y` must be the numeric response vector (with values in (0,1)).
`...`	arguments passed to `PLreg.control`.
`X`	numeric regressor matrix for the median submodel.
`S`	numeric regressor matrix for the dispersion submodel.

Details

The power logit regression models, proposed by Queiroz and Ferrari (2022), is useful in situations when the response variable is continuous and bounded on the unit interval (0, 1). The median and the dispersion parameters are modeled through parametric link functions. The models depend on a skewness parameter (called $\lambda$ ). When the skewness parameter is fixed and equal to 1, the power logit models coincide with the GJS regression models (Lemonte and Bazan, 2016). Queiroz and Ferrari (2022) suggest using a penalized maximum likelihood method to estimate the parameters. This method is implemented in PLreg by default when $\lambda$ is not fixed. If convergence is not reached, maximum likelihood estimation is performed. The estimation process uses optim. If no starting values are specified, the PLreg function uses those suggested by Queiroz and Ferrari (2022). This function also fits the log-log regression models by setting $\lambda$ at zero ( $\lambda = 0$ represents $\lambda \rightarrow 0^+$ ).

The formulation of the model has the same structure as in the usual functions lm and glm. The argument formula could comprise of three parts (separated by the symbols " $~$ " and " $|$ "), namely: observed response variable in the unit interval, predictor of the median submodel, with link function link and predictor of the dispersion submodel, with link.sigma link function. If the model has constant dispersion, the third part may be omitted and the link function for sigma is "log" by default. The skewness parameter lambda may be treated as fixed or not (default). If lambda is fixed, its value must be specified in the control argument.

Some methods are available for objects of class "PLreg", see plot.PLreg, summary.PLreg, coef.PLreg, vcov.PLreg, and residuals.PLreg, for details and other methods.

Value

PLreg returns an object of class "PLreg" with the following components (the PLreg.fit returns elements up to v).

`coefficients`	a list with the "`median`", "`dispersion`" and "`skewness`" (if `lambda = NULL`) coefficients.
`residuals`	a vector of the raw residuals (the difference between the observed and the fitted response).
`fitted.values`	a vector with the fitted values of the median submodel.
`optim`	a list with the output from `optim`. When lambda is not fixed, if `type = "pML"`, the output refers to the iterative process of the median and dispersion parameters only and, if `type = "ML"`, on the maximization of the likelihood for all the parameters.
`family`	a character specifying the `family` used.
`method`	the method argument passed to the optim call.
`control`	the control arguments passed to the optim call.
`start`	a vector with the starting values used in the iterative process.
`nobs`	number of observations.
`df.null`	residual degrees of freedom in the null model (constant median and dispersion), i.e., $n-3$ .
`df.residual`	residual degrees of freedom in the fitted model.
`lambda`	value of the skewness parameter lambda (`NULL` when lambda is not fixed).
`loglik`	log-likelihood of the fitted model.
`loglikp`	penalized profile log-likelihood for lambda. If lambda is equal to zero, `loglikp` returns 1.
`vcov`	covariance matrix of all the parameters.
`pseudo.r.squared`	pseudo R-squared value.
`Upsilon.zeta`	an overall goodness-of-fit measure.
`link`	a list with elements "`median`" and "`dispersion`" containing the link objects for the respective models.
`converged`	logical indicating successful convergence of the iterative process.
`zeta`	a numeric specifying the value of zeta used in the estimation process.
`type`	a character specifying the estimation method used.
`v`	a vector with the v(z) values for all the observations (see Queiroz and Ferrari(2021)).
`call`	the original function call.
`formula`	the formula used.
`terms`	a list with elements "`median`", "`dispersion`" and "`full`" containing the term objects for the respective models.
`levels`	a list with elements "`median`", "`dispersion`" and "`full`" containing the levels of the categorical regressors.
`contrasts`	a list with elements "`median`" and "`dispersion`" containing the contrasts corresponding to levels from the respective models.
`model`	the full model frame (if `y = TRUE`).
`y`	the response variable (if `y = TRUE`).
`x`	a list with elements "`median`" and "`dispersion`" with the matrices from the median and dispersion submodels (if `x = TRUE`).

Author(s)

Francisco Felipe de Queiroz ([email protected]) and Silvia L. P. Ferrari.

References

Queiroz, F. F. and Ferrari, S. L. P. (2022). Power logit regression for modeling bounded data. arXiv:2202.01697.

Lemonte, A. J. and Bazan, J. L. (2015). New class of Johnson SB distributions and its associated regression model for rates and proportions. Biometrical Journal. 58:727-746.

Examples

#### Body fat data
data("bodyfat_Aeolus")

#Initial model with zeta = 2
fit1 <- PLreg(percentfat ~ days + sex + year, data = bodyfat_Aeolus,
             family = "PE", zeta = 2)
summary(fit1)
# Choosing the best value for zeta
# extra.parameter(fit1, lower = 1, upper = 4, grid = 15)

# Using zeta = 1.7
fit2 <- PLreg(percentfat ~ days + sex + year, data = bodyfat_Aeolus,
             family = "PE", zeta = 1.7)
summary(fit2)

# Fixing lambda = 1
fit3 <- PLreg(percentfat ~ days + sex + year, data = bodyfat_Aeolus,
             family = "PE", zeta = 1.7,
             control = PLreg.control(lambda = 1))
summary(fit3)

# Comparing the AIC and Upsilon values between fit2 and fit3
AIC(fit2) < AIC(fit3) # TRUE
fit2$Upsilon.zeta < fit3$Upsilon.zeta #TRUE

#### Firm cost data
data("Firm")

fitPL <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost,
              data = Firm,
              family = "SLASH",
              zeta = 2.13)
summary(fitPL)
#extra.parameter(fitPL, lower = 1.2, upper = 4, grid = 10)
#plot(fitPL, type = "standardized")
#envelope(fitPL, type = "standardized")

fitPL_wo72 <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost,
                   data = Firm[-72,],
                   family = "SLASH",
                   zeta = 2.13)
fitPL_wo15 <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost,
                   data = Firm[-15,],
                   family = "SLASH",
                   zeta = 2.13)
fitPL_wo16 <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost,
                   data = Firm[-16,],
                   family = "SLASH",
                   zeta = 2.13)

coef.mu      <- coef(fitPL)[1:3]
coef.mu_wo72 <- coef(fitPL_wo72)[1:3]
coef.mu_wo15 <- coef(fitPL_wo15)[1:3]
coef.mu_wo16 <- coef(fitPL_wo16)[1:3]

plot(Firm$indcost, Firm$firmcost,
    pch = "+",
    xlab = "indcost",
    ylab = "firmcost")
#identify(Firm$indcost, Firm$firmcost)
covariate = matrix(c(rep.int(1, 1000),
                    rep(median(Firm$sizelog), 1000),
                    seq(0, 1.22, length.out = 1000)),
                  ncol = 3)
lines(covariate[,3],
     as.vector(fitPL$link$median$linkinv(covariate%*%coef.mu)),
     type = "l")
lines(covariate[,3],
     as.vector(fitPL$link$median$linkinv(covariate%*%coef.mu_wo72)),
     type = "l", lty = 2, col = "blue")
lines(covariate[,3],
     as.vector(fitPL$link$median$linkinv(covariate%*%coef.mu_wo15)),
     type = "l", lty = 3, col = "red")
lines(covariate[,3],
     as.vector(fitPL$link$median$linkinv(covariate%*%coef.mu_wo16)),
     type = "l", lty = 4, col = "green")
parameters = c("pML",
              "pML w/o 72",
              "pML w/o 15",
              "pML w/o 16")
legend(x = 0.5,
      y = 0.8,
      legend = parameters,
      col = c("black", "blue", "red", "green"),
      lty = c(1, 2, 3, 4),
      cex = 0.6)

#### Body fat data
data("bodyfat_Aeolus")

#Initial model with zeta = 2
fit1 <- PLreg(percentfat ~ days + sex + year, data = bodyfat_Aeolus,
             family = "PE", zeta = 2)
summary(fit1)
# Choosing the best value for zeta
# extra.parameter(fit1, lower = 1, upper = 4, grid = 15)

# Using zeta = 1.7
fit2 <- PLreg(percentfat ~ days + sex + year, data = bodyfat_Aeolus,
             family = "PE", zeta = 1.7)
summary(fit2)

# Fixing lambda = 1
fit3 <- PLreg(percentfat ~ days + sex + year, data = bodyfat_Aeolus,
             family = "PE", zeta = 1.7,
             control = PLreg.control(lambda = 1))
summary(fit3)

# Comparing the AIC and Upsilon values between fit2 and fit3
AIC(fit2) < AIC(fit3) # TRUE
fit2$Upsilon.zeta < fit3$Upsilon.zeta #TRUE

#### Firm cost data
data("Firm")

fitPL <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost,
              data = Firm,
              family = "SLASH",
              zeta = 2.13)
summary(fitPL)
#extra.parameter(fitPL, lower = 1.2, upper = 4, grid = 10)
#plot(fitPL, type = "standardized")
#envelope(fitPL, type = "standardized")

fitPL_wo72 <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost,
                   data = Firm[-72,],
                   family = "SLASH",
                   zeta = 2.13)
fitPL_wo15 <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost,
                   data = Firm[-15,],
                   family = "SLASH",
                   zeta = 2.13)
fitPL_wo16 <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost,
                   data = Firm[-16,],
                   family = "SLASH",
                   zeta = 2.13)

coef.mu      <- coef(fitPL)[1:3]
coef.mu_wo72 <- coef(fitPL_wo72)[1:3]
coef.mu_wo15 <- coef(fitPL_wo15)[1:3]
coef.mu_wo16 <- coef(fitPL_wo16)[1:3]

plot(Firm$indcost, Firm$firmcost,
    pch = "+",
    xlab = "indcost",
    ylab = "firmcost")
#identify(Firm$indcost, Firm$firmcost)
covariate = matrix(c(rep.int(1, 1000),
                    rep(median(Firm$sizelog), 1000),
                    seq(0, 1.22, length.out = 1000)),
                  ncol = 3)
lines(covariate[,3],
     as.vector(fitPL$link$median$linkinv(covariate%*%coef.mu)),
     type = "l")
lines(covariate[,3],
     as.vector(fitPL$link$median$linkinv(covariate%*%coef.mu_wo72)),
     type = "l", lty = 2, col = "blue")
lines(covariate[,3],
     as.vector(fitPL$link$median$linkinv(covariate%*%coef.mu_wo15)),
     type = "l", lty = 3, col = "red")
lines(covariate[,3],
     as.vector(fitPL$link$median$linkinv(covariate%*%coef.mu_wo16)),
     type = "l", lty = 4, col = "green")
parameters = c("pML",
              "pML w/o 72",
              "pML w/o 15",
              "pML w/o 16")
legend(x = 0.5,
      y = 0.8,
      legend = parameters,
      col = c("black", "blue", "red", "green"),
      lty = c(1, 2, 3, 4),
      cex = 0.6)

Auxiliary for Controlling PL Fitting

Description

Parameters that control fitting of power logit regression models using PLreg.

Usage

PLreg.control(
  lambda = NULL,
  method = "BFGS",
  maxit = 2000,
  trace = FALSE,
  start = NULL,
  ...
)
PLreg.control(
  lambda = NULL,
  method = "BFGS",
  maxit = 2000,
  trace = FALSE,
  start = NULL,
  ...
)

Arguments

`lambda`	numeric indicating the value of the skewness parameter lambda (if `NULL`, lambda will be estimated).
`method`	character specifying the `method` argument passed to `optim`.
`maxit`, `trace`, `...`	arguments passed to `optim`
`start`	an optional vector with starting values for median and dispersion submodels (starting value for lambda must not be included).

Details

The PLreg.control controls the fitting process of power logit models. Almost all the arguments are passed on directly to optim, which is used to estimate the parameters. Starting values for median and dispersion submodels may be supplied via start. If the estimation process is to be performed with a fixed skewness parameter, a value must be specified in lambda. If lambda = 0, a log-log regression model will be estimated.

Value

A list with components named as the arguments.

Examples

data("PeruVotes")

fitPL <- PLreg(votes ~ HDI | HDI, data = PeruVotes,
              family = "TF", zeta = 5, control = PLreg.control(lambda = 1))
summary(fitPL)

data("PeruVotes")

fitPL <- PLreg(votes ~ HDI | HDI, data = PeruVotes,
              family = "TF", zeta = 5, control = PLreg.control(lambda = 1))
summary(fitPL)

Residuals Method for PLreg Objects

Description

The function provides three types of residuals for power logit regression models: quantile, deviance and standardized.

Usage

## S3 method for class 'PLreg'
residuals(object, type = c("quantile", "deviance", "standardized"), ...)
## S3 method for class 'PLreg'
residuals(object, type = c("quantile", "deviance", "standardized"), ...)

Arguments

`object`	fitted model object of class "`PLreg`".
`type`	character specifying the type of residuals to be used.
`...`	currently not used.

Details

The quantile residuals is based on Dunn and Smyth (1996) idea. The residuals are well-defined for all the distributions in the power logit class and have, approximately, a standard normal distribution in large samples if the model is correctly specified.

The deviance residuals are based on the log-likelihood contributions of each observation in the sample. The distribution of this residual is unknown (both exact and asymptotic), except for the power logit normal model, which is, approximately, standard normal.

The standardized residuals are a standardized form of the ordinary residual. These residuals take into account the diagonal elements of the $H$ matrix, being useful for detecting leverage observations. The distribution of the standardized residuals is unknown.

Value

residuals method for object of class "PLreg" returns a vector with the residuals of the type specified in the type argument.

References

Queiroz, F. F. and Ferrari, S. L. P. (2022). Power logit regression for modeling bounded data. arXiv:2202.01697.

Dunn, P. K. and Smyth, G. K. (1996) Randomized quantile residuals. Journal of Computational and Graphical Statistics, 5:236-244.

Examples

data("PeruVotes")
fitPL <- PLreg(votes ~ HDI | HDI, data = PeruVotes,
              family = "TF", zeta = 5)

res_quantile = residuals(fitPL, type = "quantile")
res_standardized = residuals(fitPL, type = "standardized")

plot(res_standardized, pch = "+", ylim = c(-6, 6))
abline(h = -3, lty = 2)
abline(h = 3, lty = 2)

qqnorm(res_quantile)
qqline(res_quantile)
data("PeruVotes")
fitPL <- PLreg(votes ~ HDI | HDI, data = PeruVotes,
              family = "TF", zeta = 5)

res_quantile = residuals(fitPL, type = "quantile")
res_standardized = residuals(fitPL, type = "standardized")

plot(res_standardized, pch = "+", ylim = c(-6, 6))
abline(h = -3, lty = 2)
abline(h = 3, lty = 2)

qqnorm(res_quantile)
qqline(res_quantile)

Sandwich Variance and Covariance Matrix for PLreg Objects

Description

The sandwich function provides an estimate for the asymptotic variance and covariance matrix of the parameter estimators of the power logit (or log-log) regression models based on de sandwich estimator (see Queiroz and Ferrari (2022)).

Usage

sandwich(object)
sandwich(object)

Arguments

object

fitted model object of class "PLreg".

Value

extra.parameter returns a matrix containing the sandwich variance and covariance matrix estimate.

References

Queiroz, F. F. and Ferrari, S. L. P. (2022). Power logit regression for modeling bounded data. arXiv:2202.01697.

Examples

data("Firm")

fit <- PLreg(percentfat ~ days + sex + year, data = bodyfat_Aeolus,
             family = "PE", zeta = 2)
sandwich(fit)
data("Firm")

fit <- PLreg(percentfat ~ days + sex + year, data = bodyfat_Aeolus,
             family = "PE", zeta = 2)
sandwich(fit)

Package 'PLreg'

Help Index

Body Fat of Little Brown Bat

Description

Usage

Format

Details

Source

References

Examples

Confidence Interval for the Skewness Parameter

Description

Usage

Arguments

Value

Examples

Normal Probability Plots with Simulated Envelope of Residuals for PLreg Objects

Description

Usage

Arguments

Details

Value

References

See Also

Examples

Procedure to Select the Extra Parameter for PLreg Objects

Description

Usage

Arguments

Value

References

See Also

Examples

Firm Cost

Description

Usage

Format

Details

Source

References

Examples

Influence Diagnostics for PLreg Objects

Description

Usage

Arguments

Value

References

See Also

Examples

Methods for PLreg Objects

Description

Usage

Arguments

Details

See Also

Examples

Peru Blank Votes

Description

Usage

Format

Details

Source

References

Examples

Power Logit Distributions

Description

Usage

Arguments

Details

Value

References

Examples

Diagnostic Plots for PLreg Objects

Description

Usage

Arguments

Details

Value

See Also

Examples