<-- home

R Squared

评价回归算法 R Squared

\[RMSE = \sqrt{\frac{1}{m}\sum_{i=0}^m(y_{test}^{(i)} - \hat{y}_{test}^{(i)})^2}\] \[MAE = \frac{1}{m}\sum_{i=0}^m|y_{test}^{(i)} - \hat{y}_{test}^{(i)}|\]

问题:分类准确度问题(1最好, 0最差)

引入R Squared

\[R^2 = 1 - \frac{SS_{residual}}{SS_{total}}\qquad \frac{(Residual\,Sum\,of\,Squares)} {(Total\,Sum\,of\,Squares)}\] \[R^2 = 1-\frac{\sum_i(\hat{y}^{(i)} - y^{(i)})^2}{\sum_i(\bar{y}^{(i)} - y^{(i)})^2} = 1 - \frac{(\sum_{i=0}^m(y^{(i)} - \hat{y}^{(i)})^2) / m}{(\sum_{i=0}^m(y^{(i)} - \bar{y}^{(i)})^2) / m} = 1 - \frac{MSE(\hat{y},y)}{Var(y)}\]

Var(y) 代表方差

公式上部分可以看做使用我们模型产生的错误

公式下部分可以看做不考虑x使用平均值作为预测结果产生的错误(Baseline model)

  • R Squared <= 1
  • R Squared 越大越好。当我们模型不犯错误,则得到最大值1
  • 当我们模型等于基准模型(Baseline model)时, R Squared = 0
  • 如果 R Squared < 0 ,说明我们的模型还不如基准模型。此时,很可能我们的数据不具备任何线性关系

R Square 计算

import numpy as np
from sklearn import datasets
from sklearn.metrics import mean_squared_error, mean_absolute_error
boston = datasets.load_boston()
x = boston.data[:,5]
y = boston.target
x = x[y < 50]
y = y[y < 50]
x_train, x_test, y_train, y_test = train_test_split(x, y, seed=666)
slr = SimpleLinearRegressionV2()
slr.fit(x_train, y_train)
y_predict = slr.predict(x_test)
r2 = 1 - mean_squared_error(y_test, y_predict) / np.var(y_test)
r2

0.6129316803937322

def r2_score(y_true, y_predict):
    return 1 - mean_squared_error(y_true, y_predict) / np.var(y_true)
r2_score(y_test, y_predict)

0.6129316803937322

from sklearn.metrics import r2_score
r2_score(y_test, y_predict)

0.6129316803937324

slr.score(x_test, y_test)

0.6129316803937322