
Multiple Regression in Go

Linear Regression predicts one dependent variable (y) given one or more independent variables (x1, x2, x3...xn). Simple Linear Regression, with a single independent variable, is easy to implement thanks to a convenient closed form solution.

Multiple Regression also has a closed form solution using matrices, but it requires computing a matrix inverse, which is often impossible[1].

The Closed form Solution

The closed form of multiple regression solves the system of k equations obtained by setting the k partial derivatives of the cost function to zero, using matrices, and doing so requires a matrix inverse. Let's begin with the matrix representation of a linear model in k dimensions with n observations in the training data:

\[\hat{Y} = X\beta\]

where:

$\hat{Y}$ is an n x 1 column vector of predicted values
$X$ is an n x k matrix of feature values
$\beta$ is a k x 1 column vector of coefficients

With the matrix representation, the closed form for multiple regression is

\[\beta = (X^TX)^{-1} X^TY\]

The derivation of this expression is instructive, but the result is often unusable in practice because it requires a matrix inverse. The matrix inverse is undefined for non-square matrices, and for square matrices it is undefined when the determinant is zero. $X^TX$ is always square (k x k), but its determinant is zero whenever the columns of $X$ are linearly dependent.
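Still, the closed form is worth seeing once in code. Here is a minimal sketch using gonum on a tiny made-up dataset; the data and the direct call to Inverse are illustrative assumptions, not part of the implementation built later in this post:

package main

import (
	"fmt"

	"gonum.org/v1/gonum/mat"
)

func main() {
	// Tiny made-up dataset: 4 observations, 2 parameters (bias + one feature).
	// The underlying relationship is y = 1 + 2x.
	x := mat.NewDense(4, 2, []float64{
		1, 1,
		1, 2,
		1, 3,
		1, 4,
	})
	y := mat.NewVecDense(4, []float64{3, 5, 7, 9})

	// beta = (X^T X)^{-1} X^T Y
	var xtx mat.Dense
	xtx.Mul(x.T(), x)
	var inv mat.Dense
	if err := inv.Inverse(&xtx); err != nil {
		panic(err) // X^T X is singular: the closed form fails here
	}
	var xty, beta mat.VecDense
	xty.MulVec(x.T(), y)
	beta.MulVec(&inv, &xty)
	fmt.Println(mat.Formatted(beta.T())) // prints [1  2]
}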

In practice, the optimal solution for Multiple Regression is computed using Gradient Descent.

Gradient Descent algorithm for Multiple Regression

The gradient descent algorithm begins by initializing all model coefficients to zero. It then iteratively calculates the partial derivatives of the cost function with respect to each coefficient, determining the direction and magnitude of the steepest increase in cost. The coefficients are updated at each iteration by moving in the opposite direction of these gradients, scaled by a small learning rate parameter that controls the step size. This process repeats, gradually minimizing the cost function, until either a predetermined number of iterations is reached or the magnitude of the gradient falls below a specified threshold, indicating convergence to a minimum (for this convex cost function, the global minimum).
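Concisely, each iteration applies the update below, where $\alpha$ is the learning rate and $\nabla_\beta J$ is the gradient of the cost function defined next:

\[\beta \leftarrow \beta - \alpha \, \nabla_\beta J(\beta)\]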

The cost function for Multiple Regression is

\[J(\beta) = \frac{1}{2n}{\parallel Y - X\beta \parallel}^2\]

The first derivative of the cost function, i.e. the gradient, is

\[\nabla_\beta J(\beta) = \frac{1}{n} X^T (\hat{Y}-Y) = \frac{1}{n} X^T (X\beta-Y)\]
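To see where this comes from, write the squared norm as a dot product and differentiate with respect to $\beta$; the quadratic produces a factor of 2 that cancels the $\frac{1}{2}$, and the chain rule brings out $-X^T$:

\[J(\beta) = \frac{1}{2n}(Y - X\beta)^T(Y - X\beta) \qquad \nabla_\beta J = -\frac{1}{n}X^T(Y - X\beta) = \frac{1}{n}X^T(X\beta - Y)\]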

We can now write the steps in pseudo code as follows:

// Start with initial values for the model coefficients
// beta is a k x 1 column vector
beta = 0 

for i in 0..iterations:
   // Step 1: compute the predicted values for the model with parameters beta
   // Y_hat is an n x 1 column vector of predictions
   // X is a n x k matrix of feature values
   // beta is k x 1 column vector
   Y_hat = matmul(X, beta)
   
   // Step 2: Calculate the error in the predictions
   // Y is an n x 1 column vector of observations
   Y_err = Y_hat - Y
   
   // Step 3: Calculate the first derivative of the cost function (the gradient)
   d_cost = (1/n) * matmul(transpose(X), Y_err)

   // Step 4: Calculate the step size
   step_size = learning_rate * d_cost

   // Step 5: Calculate the new values for the model parameters
   beta = beta - step_size
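The description above also mentions stopping once the gradient magnitude falls below a threshold. In the same pseudocode style, that is one extra check at the end of each iteration (tolerance is a hypothetical parameter; the Go implementation below simply runs for a fixed number of iterations):

   // Optional: stop early once the gradient is small enough
   if norm(d_cost) < tolerance:
      break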

With that clarity, the Go implementation of Multiple Regression using gradient descent is easy.

Multiple Regression in Go

We'll use the gonum library for matrix and vector operations, create a struct named MultipleRegression, and implement two methods on it: Train() and Predict(). Train() will be used to train the regression model and Predict() will be used to make predictions once training is complete.

Let's begin by adding the gonum package:

go get gonum.org/v1/gonum

Next, let's create a struct named MultipleRegression and add a constructor. The snippets that follow assume gonum's mat package is imported from gonum.org/v1/gonum/mat:

// Multiple Regression using Gradient Descent
//
// n observations
// k parameters (k-1 features)
//
// y_hat = Vector of Predicted values from the model (shape: n x 1)
// beta = Vector of Parameters                       (shape: k x 1)
// y = Vector of observed values                     (shape: n x 1)
// x = Matrix of Independent Variables               (shape: n x k)
// iter = number of iterations of Gradient Descent
// lr = Learning rate
type MultipleRegression struct {
	// Column Vector of observed values (shape: n x 1)
	y *mat.VecDense
	// Matrix of k independent variables and n observations (shape: n x k)
	x *mat.Dense
	// Number of observations
	n int
	// Number of independent variables
	k int
	// Number of iterations
	iter int
	// learning rate
	lr float64
	// beta is the vector of coefficients (shape: k x 1)
	beta *mat.VecDense
}

func NewMultipleRegression(x *mat.Dense, y *mat.VecDense, iter int, learning_rate float64) *MultipleRegression {
	// x_obs is the number of observations
	// x_vars is the number of features (including the bias)
	x_obs, x_vars := x.Dims()
	y_len := y.Len()

	if y_len != x_obs {
		panic("Vector y must be the same length as x.rows")
	}

	// beta is the vector of parameters, initialized to zero (k x 1).
	// NewVecDense with a nil slice allocates a zeroed backing array.
	beta := mat.NewVecDense(x_vars, nil)

	return &MultipleRegression{
		y, x, x_obs, x_vars, iter, learning_rate, beta,
	}
}
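One note on the shape of x: the constructor counts the bias among the x_vars columns, so the feature matrix is expected to already contain a bias column. If your raw data has only features, you can prepend a column of ones. addBias below is a hypothetical helper, not part of the original post:

func addBias(features *mat.Dense) *mat.Dense {
	n, k := features.Dims()
	// One extra column of 1s so that beta[0] acts as the intercept
	x := mat.NewDense(n, k+1, nil)
	for i := 0; i < n; i++ {
		x.Set(i, 0, 1.0)
		for j := 0; j < k; j++ {
			x.Set(i, j+1, features.At(i, j))
		}
	}
	return x
}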

 

Next, we can implement the Train() method using gradient descent:

func (mr *MultipleRegression) Train() {

	// Initialize y_hat. This is the vector of predicted values (n x 1)
	y_hat := mat.NewVecDense(mr.n, nil)
	// Initialize y_err. This is the error: diff between predicted and actual (n x 1)
	y_err := mat.NewVecDense(mr.n, nil)
	// Initialize d_cost. This is the derivative of the cost function (k x 1)
	d_cost := mat.NewVecDense(mr.k, nil)
	// Initialize step_size. This is the vector of step sizes (k x 1)
	step_size := mat.NewVecDense(mr.k, nil)

	for i := 0; i < mr.iter; i++ {
		// Step 1: Compute the predicted values for the model with parameters beta.
		// Shape: n x 1
		y_hat.MulVec(mr.x, mr.beta)

		// Step 2: Calculate the error in the predictions. y_err = y_hat - y
		// Shape: n x 1
		y_err.SubVec(y_hat, mr.y)

		// Step 3: Calculate the first derivative of the cost function (the gradient)
		d_cost.MulVec(mr.x.T(), y_err)
		// Now scale it by 1/n: Shape: k x 1
		d_cost.ScaleVec(1/(float64(mr.n)), d_cost)

		// Step 4: Calculate the Step Size
		step_size.ScaleVec(mr.lr, d_cost)

		// Step 5: Calculate the new values of the model parameters
		mr.beta.SubVec(mr.beta, step_size)
	}
}
 

Training the model is a simple matter of loading the data and calling Train().

	reader := data.NewSimpleCsvReader("testdata/train_data_processed.csv")
	X_train, y_train, err := reader.LoadMatrices()
	if err != nil {
		log.Fatalf("Failed to load training data: %v", err)
	}

	iterations := 10000
	learningRate := 0.000000005
	mr := reg.NewMultipleRegression(X_train, y_train, iterations, learningRate)
	mr.Train()
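The test snippet below calls mr.Predict(), which hasn't been shown yet. Here is a minimal sketch, assuming Predict takes a feature matrix with k columns and returns the vector of predictions (the exact signature is an assumption):

// Predict computes y_hat = x * beta for a matrix of feature values.
// x must have the same number of columns (k) the model was trained with.
func (mr *MultipleRegression) Predict(x *mat.Dense) *mat.VecDense {
	rows, _ := x.Dims()
	y_hat := mat.NewVecDense(rows, nil)
	y_hat.MulVec(x, mr.beta)
	return y_hat
}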
 

Once training is complete, we can load the test data and make predictions:

	// Now that training is complete, let's load the test data
	testReader := data.NewSimpleCsvReader("testdata/test_data_processed.csv")
	X_test, y_test, err := testReader.LoadMatrices()
	if err != nil {
		log.Fatalf("Failed to load test data: %v", err)
	}

	predictions := mr.Predict(X_test)
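The excerpt loads y_test but stops before evaluating the predictions against it. One common way to finish, sketched here as an assumption rather than code from the original post, is the root mean squared error (this fragment also needs "math" in the import list):

	// Root mean squared error between predictions and observations
	var sum float64
	for i := 0; i < y_test.Len(); i++ {
		diff := predictions.AtVec(i) - y_test.AtVec(i)
		sum += diff * diff
	}
	rmse := math.Sqrt(sum / float64(y_test.Len()))
	log.Printf("RMSE: %f", rmse)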
 

[1]: A matrix is invertible only if its determinant is non-zero. See Inverse of a Matrix in my tutorial on Vectors and Matrices.
