Multiple Regression in Go
Linear regression predicts one dependent variable (y) given one or more independent variables (x1, x2, x3...xn). Simple linear regression, with a single independent variable, is easy to implement thanks to a convenient closed-form solution.
Multiple regression also has a closed-form solution using matrices, but it requires computing a matrix inverse, which is often impractical and sometimes impossible[1].
The Closed-Form Solution
The closed form of multiple regression comes from setting the k partial derivatives of the cost function to zero and solving the resulting system of equations with matrices, which requires a matrix inverse. Let's begin with the matrix representation of a linear model in k dimensions with n observations in the training data:
\[\hat{Y} = X\beta\]
$\hat{Y}$ is an n x 1 column vector
$X$ is an n x k matrix
$\beta$ is a k x 1 column vector
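Written out element-wise, with the first column of $X$ fixed at 1 to act as the bias term (the convention the Go code below also assumes), this is:
\[
\begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_n \end{bmatrix}
=
\begin{bmatrix}
1 & x_{1,1} & \cdots & x_{1,k-1} \\
1 & x_{2,1} & \cdots & x_{2,k-1} \\
\vdots & \vdots & \ddots & \vdots \\
1 & x_{n,1} & \cdots & x_{n,k-1}
\end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{k-1} \end{bmatrix}
\]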
With the matrix representation, the closed form for multiple regression is
\[\beta = (X^TX)^{-1} X^TY\]
The proof of this expression is instructive but ultimately unhelpful because the formula requires a matrix inverse. The inverse of $X^TX$ is undefined whenever its determinant is zero, which happens when the columns of $X$ are linearly dependent (for example, collinear features), and computing it is numerically unstable for ill-conditioned data.
In practice the optimal solution for multiple regression is computed using gradient descent.
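That said, for small, well-conditioned problems the closed form is straightforward to compute with gonum. The sketch below is illustrative rather than part of the original post; it solves the normal equations $(X^TX)\beta = X^TY$ with SolveVec instead of forming an explicit inverse, and fails cleanly when $X^TX$ is singular:
package main

import (
	"fmt"

	"gonum.org/v1/gonum/mat"
)

// closedForm solves the normal equations (X^T X) beta = X^T Y.
// SolveVec avoids forming an explicit inverse and returns an error
// when X^T X is singular.
func closedForm(x *mat.Dense, y *mat.VecDense) (*mat.VecDense, error) {
	var xtx mat.Dense
	xtx.Mul(x.T(), x)
	var xty mat.VecDense
	xty.MulVec(x.T(), y)
	var beta mat.VecDense
	if err := beta.SolveVec(&xtx, &xty); err != nil {
		return nil, err
	}
	return &beta, nil
}

func main() {
	// Bias column of ones plus two features; data follows y = 1 + 2*x1 + 3*x2
	x := mat.NewDense(4, 3, []float64{
		1, 1, 1,
		1, 2, 1,
		1, 1, 2,
		1, 3, 2,
	})
	y := mat.NewVecDense(4, []float64{6, 8, 9, 13})
	beta, err := closedForm(x, y)
	if err != nil {
		panic(err)
	}
	// Prints a row vector close to [1 2 3]
	fmt.Println(mat.Formatted(beta.T()))
}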
Gradient Descent Algorithm for Multiple Regression
The gradient descent algorithm begins by initializing all model coefficients to zero. It then iteratively calculates the partial derivatives of the cost function with respect to each coefficient, determining the direction and magnitude of the steepest increase in cost. The coefficients are updated at each iteration by moving in the opposite direction of these gradients, scaled by a small learning rate parameter that controls the step size. This iterative optimization process repeats, gradually minimizing the cost function, until either a predetermined number of iterations is reached or the magnitude of the gradient falls below a specified threshold, indicating convergence to a local minimum.
The cost function for Multiple Regression is
\[\frac{1}{2n}{\parallel Y - X\beta \parallel}^2\]
The first derivative of the cost function with respect to $\beta$ (the gradient) is
\[\frac{1}{n} X^T (\hat{Y}-Y) = \frac{1}{n} X^T (X\beta-Y)\]
We can now write the steps in pseudo code as follows:
// Start with initial values for the model coefficients
// beta is a k x 1 column vector
beta = 0
for i in 0..iterations:
    // Step 1: Compute the predicted values for the model with parameters beta
    // Y_hat is an n x 1 column vector of predictions
    // X is an n x k matrix of feature values
    // beta is a k x 1 column vector
    Y_hat = matmul(X, beta)
    // Step 2: Calculate the error in the predictions
    // Y is an n x 1 column vector of observations
    Y_err = Y_hat - Y
    // Step 3: Calculate the first derivative of the cost function (the gradient)
    d_cost = (1/n) * matmul(X_transpose, Y_err)
    // Step 4: Calculate the step size
    step_size = learning_rate * d_cost
    // Step 5: Calculate the new values for the model parameters
    beta = beta - step_size
With that clarity, the Go implementation of multiple regression using gradient descent is straightforward.
Multiple Regression in Go
We'll use the gonum library for matrix and vector operations and create a struct, MultipleRegression, with two methods: Train() and Predict(). Train() will be used to train the regression model and Predict() will be used to make predictions once training is complete.
Let's begin by adding the gonum package:
go get gonum.org/v1/gonum
Next, let's create a struct named MultipleRegression and add a constructor. The mat types below come from the gonum.org/v1/gonum/mat package:
// Multiple Regression using Gradient Descent
//
// n observations
// k parameters (k-1 features)
//
// y_hat = Vector of Predicted values from the model (shape: n x 1)
// beta = Vector of Parameters (shape: k x 1)
// y = Vector of observed values (shape: n x 1)
// x = Matrix of Independent Variables (shape: n x k)
// iter = number of iterations of Gradient Descent
// lr = Learning rate
type MultipleRegression struct {
	// Column vector of observed values (shape: n x 1)
	y *mat.VecDense
	// Matrix of k independent variables and n observations (shape: n x k)
	x *mat.Dense
	// Number of observations
	n int
	// Number of independent variables
	k int
	// Number of iterations
	iter int
	// Learning rate
	lr float64
	// beta is the vector of coefficients (shape: k x 1)
	beta *mat.VecDense
}
func NewMultipleRegression(x *mat.Dense, y *mat.VecDense, iter int, learning_rate float64) *MultipleRegression {
	// x_obs is the number of observations
	// x_vars is the number of features (including the bias)
	x_obs, x_vars := x.Dims()
	if y.Len() != x_obs {
		panic("Vector y must have the same length as the number of rows in x")
	}
	// beta is the vector of parameters (k x 1), initialized to zero
	// (NewVecDense allocates a zeroed backing slice when data is nil)
	beta := mat.NewVecDense(x_vars, nil)
	return &MultipleRegression{
		y, x, x_obs, x_vars, iter, learning_rate, beta,
	}
}
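Note that x is expected to already contain the bias column (a column of ones), which is why x_vars counts k parameters for k-1 features. A small helper like the following, which is hypothetical and not part of the original code, can prepend that column:
// addBias returns a copy of x with a leading column of ones so that
// beta[0] acts as the intercept. (Hypothetical helper, not part of
// the original code.)
func addBias(x *mat.Dense) *mat.Dense {
	n, k := x.Dims()
	out := mat.NewDense(n, k+1, nil)
	for i := 0; i < n; i++ {
		out.Set(i, 0, 1.0)
		for j := 0; j < k; j++ {
			out.Set(i, j+1, x.At(i, j))
		}
	}
	return out
}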
Next we can implement the Train() method using gradient descent:
func (mr *MultipleRegression) Train() {
	// y_hat is the vector of predicted values (n x 1)
	y_hat := mat.NewVecDense(mr.n, nil)
	// y_err is the error: the difference between predicted and actual (n x 1)
	y_err := mat.NewVecDense(mr.n, nil)
	// d_cost is the derivative of the cost function (k x 1)
	d_cost := mat.NewVecDense(mr.k, nil)
	// step_size is the vector of step sizes (k x 1)
	step_size := mat.NewVecDense(mr.k, nil)
	for i := 0; i < mr.iter; i++ {
		// Step 1: Compute the predicted values for the model with
		// parameters beta. Shape: n x 1
		y_hat.MulVec(mr.x, mr.beta)
		// Step 2: Calculate the error in the predictions.
		// y_err = y_hat - y. Shape: n x 1
		y_err.SubVec(y_hat, mr.y)
		// Step 3: Calculate the first derivative of the cost function
		// (the gradient), then scale it by 1/n. Shape: k x 1
		d_cost.MulVec(mr.x.T(), y_err)
		d_cost.ScaleVec(1/float64(mr.n), d_cost)
		// Step 4: Calculate the step size
		step_size.ScaleVec(mr.lr, d_cost)
		// Step 5: Calculate the new values of the model parameters
		mr.beta.SubVec(mr.beta, step_size)
	}
}
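Predict() applies the learned coefficients to new data. Here is a minimal sketch consistent with its usage later in the post; the exact signature and return type are assumptions, not the original implementation:
// Predict returns the model's predictions for the rows of x.
// x must have the same number of columns (k) as the training data,
// including the bias column. (Sketch; signature is an assumption.)
func (mr *MultipleRegression) Predict(x *mat.Dense) *mat.VecDense {
	rows, _ := x.Dims()
	y_hat := mat.NewVecDense(rows, nil)
	y_hat.MulVec(x, mr.beta)
	return y_hat
}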
Training the model is a simple matter of loading the data and calling Train().
reader := data.NewSimpleCsvReader("testdata/train_data_processed.csv")
X_train, y_train, err := reader.LoadMatrices()
if err != nil {
	log.Fatalf("Failed to load training data: %v", err)
}
iterations := 10000
learningRate := 0.000000005
mr := reg.NewMultipleRegression(X_train, y_train, iterations, learningRate)
mr.Train()
Once training is complete we can load the test data and make predictions:
// Now that training is complete, let's load the test data
testReader := data.NewSimpleCsvReader("testdata/test_data_processed.csv")
X_test, y_test, err := testReader.LoadMatrices()
if err != nil {
	log.Fatalf("Failed to load test data: %v", err)
}
predictions := mr.Predict(X_test)
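The y_test vector can then be used to gauge accuracy. A simple root-mean-squared-error check, a hypothetical snippet assuming Predict() returns a *mat.VecDense, might look like:
// Compute the root mean squared error of the predictions.
// (Hypothetical evaluation snippet; needs "fmt" and "math" imported.)
n := y_test.Len()
var sum float64
for i := 0; i < n; i++ {
	d := predictions.AtVec(i) - y_test.AtVec(i)
	sum += d * d
}
fmt.Printf("RMSE: %.4f\n", math.Sqrt(sum/float64(n)))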