Calculate Multiple Definitions of Coefficient of Determination (R-squared)
Source: R/r2.R

Calculates nine types of coefficients of determination (\(R^2\)) based on the classification by Kvalseth (1985). This function is designed to demonstrate how \(R^2\) values can vary depending on their mathematical definition, particularly in models without an intercept or in power regression models.
Usage
r2(model, type = c("auto", "liner", "power"), adjusted = FALSE)
r2_1(model, type = c("auto", "liner", "power"))
r2_2(model, type = c("auto", "liner", "power"))
r2_3(model, type = c("auto", "liner", "power"))
r2_4(model, type = c("auto", "liner", "power"))
r2_5(model, type = c("auto", "liner", "power"))
r2_6(model, type = c("auto", "liner", "power"))
r2_7(model, type = c("auto", "liner", "power"))
r2_8(model, type = c("auto", "liner", "power"))
r2_9(model, type = c("auto", "liner", "power"))Arguments
- model
A linear model or power regression model of class lm.
- type
Character string. Selects the model type: "linear", "power", or "auto" (default). With "auto", the function detects whether the dependent variable is log-transformed.
- adjusted
Logical. If TRUE, calculates the adjusted coefficient of determination for each formula.
Value
An object of class r2_kvr2, which is a list containing the
calculated values for each \(R^2\) formula.
Details
The nine definitions, \(R^2_1\) through \(R^2_9\), follow Kvalseth (1985):
\(R^2_1\): Proportion of total variance explained. $$R_1^2 = 1 - \frac{\sum(y - \hat{y})^2}{\sum(y - \bar{y})^2}$$
\(R^2_2\): Based on the variation of the predicted values. $$R_2^2 = \frac{\sum(\hat{y} - \bar{y})^2}{\sum(y - \bar{y})^2}$$
\(R^2_3\): Ratio of variation using the mean of predicted values. $$R_3^2 = \frac{\sum(\hat{y} - \bar{\hat{y}})^2}{\sum(y - \bar{y})^2}$$
\(R^2_4\): Incorporates the mean residual. $$R_4^2 = 1 - \frac{\sum(e - \bar{e})^2}{\sum(y - \bar{y})^2}, \quad e = y - \hat{y}$$
\(R^2_5\): The squared multiple correlation coefficient between the dependent variable and the independent variables (a comprehensive indicator in linearized models). $$R_5^2 = \text{squared multiple correlation coefficient between the regressand and the regressors}$$
\(R^2_6\): Square of Pearson's correlation coefficient between observed \(y\) and predicted \(\hat{y}\). $$R_6^2 = \frac{\left( \sum(y - \bar{y})(\hat{y} - \bar{\hat{y}}) \right)^2}{\sum(y - \bar{y})^2 \sum(\hat{y} - \bar{\hat{y}})^2}$$
\(R^2_7\): Recommended for models without an intercept. $$R_7^2 = 1 - \frac{\sum(y - \hat{y})^2}{\sum y^2}$$
\(R^2_8\): Alternative form for models without an intercept. $$R_8^2 = \frac{\sum \hat{y}^2}{\sum y^2}$$
\(R^2_9\): Robust version using medians to resist outliers. $$R_9^2 = 1 - \left( \frac{M\{|y_i - \hat{y}_i|\}}{M\{|y_i - \bar{y}|\}} \right)^2$$
where \(M\) represents the median of the sample.
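These definitions translate directly into base R. The following is a minimal sketch, meant to illustrate the formulas rather than reproduce the package internals, for an ordinary lm fit on the original scale (\(R^2_5\) is omitted because, for a least-squares fit with an intercept, it coincides with \(R^2_1\)):

fit  <- lm(y ~ x, data.frame(x = 1:6, y = c(15, 37, 52, 59, 83, 92)))
y    <- model.response(model.frame(fit))  # observed values
yhat <- fitted(fit)                       # predicted values
e    <- y - yhat                          # residuals
sst  <- sum((y - mean(y))^2)              # total sum of squares
R2_1 <- 1 - sum(e^2) / sst
R2_2 <- sum((yhat - mean(y))^2) / sst
R2_3 <- sum((yhat - mean(yhat))^2) / sst
R2_4 <- 1 - sum((e - mean(e))^2) / sst
R2_6 <- cor(y, yhat)^2
R2_7 <- 1 - sum(e^2) / sum(y^2)
R2_8 <- sum(yhat^2) / sum(y^2)
R2_9 <- 1 - (median(abs(e)) / median(abs(y - mean(y))))^2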
For the degrees-of-freedom adjustment applied when adjusted = TRUE, see r2_adjusted.
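As a point of reference, one common degrees-of-freedom adjustment is sketched below; this form is an assumption for illustration, and r2_adjusted documents the exact form applied to each formula.

adjust <- function(R2, n, p) 1 - (1 - R2) * (n - 1) / (n - p - 1)  # n observations, p regressors
adjust(0.9808, n = 6, p = 1)  # e.g. adjusting R2_1 of the first example below
#> [1] 0.976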
Note
The power regression model must be fitted on the logarithmic scale, e.g. lm(log(y) ~ log(x)).
Automatic selection between the linear and power model types checks whether the name of the dependent variable contains "log". If the response of an ordinary linear model happens to carry "log" in its name, the automatic selection will be wrong; specify type explicitly in that case, as sketched below.
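A minimal sketch with a hypothetical response named log_score (the data frame and variable names are illustrative):

df <- data.frame(x = 1:6, log_score = c(15, 37, 52, 59, 83, 92))
fit <- lm(log_score ~ x, df)
r2(fit, type = "linear")  # force the linear definitions despite "log" in the name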
References
Kvalseth, T. O. (1985). Cautionary Note about \(R^2\). The American Statistician, 39(4), 279-285. doi:10.1080/00031305.1985.10479448
Box, G. E. P., Hunter, W. G., & Hunter, J. S. (1978). Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building. New York: Wiley, pp. 462-473. ISBN 9780471093152.
Examples
# Example data set 1. Kvalseth (1985).
df1 <- data.frame(x = 1:6,
                  y = c(15, 37, 52, 59, 83, 92))
# Linear regression model with intercept
model_intercept1 <- lm(y ~ x, df1)
# Linear regression model without intercept
model_without1 <- lm(y ~ x - 1, df1)
# Power regression model
model_power1 <- lm(log(y) ~ log(x), df1)
r2(model_intercept1)
#> R2_1 : 0.9808
#> R2_2 : 0.9808
#> R2_3 : 0.9808
#> R2_4 : 0.9808
#> R2_5 : 0.9808
#> R2_6 : 0.9808
#> R2_7 : 0.9966
#> R2_8 : 0.9966
#> R2_9 : 0.9778
r2(model_without1)
#> R2_1 : 0.9777
#> R2_2 : 1.0836
#> R2_3 : 1.0830
#> R2_4 : 0.9783
#> R2_5 : 0.9808
#> R2_6 : 0.9808
#> R2_7 : 0.9961
#> R2_8 : 0.9961
#> R2_9 : 0.9717
r2(model_power1)
#> R2_1 : 0.9777
#> R2_2 : 1.0984
#> R2_3 : 1.0983
#> R2_4 : 0.9778
#> R2_5 : 0.9816
#> R2_6 : 0.9811
#> R2_7 : 0.9961
#> R2_8 : 1.0232
#> R2_9 : 0.9706
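# For the power model, the formulas are evaluated on the original scale,
# so the fitted values are back-transformed first. A minimal sketch for
# R2_1 (assumed to mirror, not reuse, the package internals):
yhat1 <- exp(fitted(model_power1))
1 - sum((df1$y - yhat1)^2) / sum((df1$y - mean(df1$y))^2)  # ~ 0.9777, as in R2_1 above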
# Example data set 2. Kvalseth (1985).
df2 <- data.frame(x = 6:13,
                  y = c(3882, 1266, 733, 450, 410, 305, 185, 112))
power_model2 <- lm(log(y/7343) ~ log(x), data = df2)
r2(power_model2)
#> R2_1 : 0.9019
#> R2_2 : 0.5858
#> R2_3 : 0.5825
#> R2_4 : 0.9051
#> R2_5 : 0.9668
#> R2_6 : 0.9498
#> R2_7 : 0.9392
#> R2_8 : 0.6879
#> R2_9 : 0.9782
# Example of a Multiple Regression Analysis Model.
# The data for two independent variables given by Box et al. (1978, p. 462)
# as used in Kvalseth (1985).
df3 <- data.frame(x1 = c(0.34, 0.34, 0.58, 1.26, 1.26, 1.82),
x2 = c(0.73, 0.73, 0.69, 0.97, 0.97, 0.46),
y = c(5.75, 4.79, 5.44, 9.09, 8.59, 5.09))
# Multiple regression analysis model with intercept
model_intercept3 <- lm(y ~ x1 + x2, df3)
# Multiple regression analysis model without intercept
model_without3 <- lm(y ~ x1 + x2 - 1, df3)
# Multiple power regression analysis model
model_power3 <- lm(log(y) ~ log(x1) + log(x2), df3)
r2(model_intercept3)
#> R2_1 : 0.9657
#> R2_2 : 0.9657
#> R2_3 : 0.9657
#> R2_4 : 0.9657
#> R2_5 : 0.9657
#> R2_6 : 0.9657
#> R2_7 : 0.9977
#> R2_8 : 0.9977
#> R2_9 : 0.9729
r2(model_without3)
#> R2_1 : 0.9247
#> R2_2 : 0.6169
#> R2_3 : 0.6153
#> R2_4 : 0.9263
#> R2_5 : 0.9657
#> R2_6 : 0.9656
#> R2_7 : 0.9950
#> R2_8 : 0.9950
#> R2_9 : 0.9661
r2(model_power3)
#> R2_1 : 0.9653
#> R2_2 : 0.9639
#> R2_3 : 0.9638
#> R2_4 : 0.9653
#> R2_5 : 0.9500
#> R2_6 : 0.9653
#> R2_7 : 0.9977
#> R2_8 : 0.9949
#> R2_9 : 0.9729