Gradient descent is often implemented in two different ways. The first is a nested loop: an outer loop controlling an update counter, and an inner loop computing the error gradients one term at a time. The second is essentially the same computation, except that if you can express your error gradients as vector equations, optimized linear-algebra routines can replace the inner loop.
First, some background — what’s an error gradient?
In gradient descent, you create something called a cost function: a measure of average error over the entire data set. To update a parameter, you take the vector of prediction errors, multiply it by the column vector of x values from your data, scale the result by a step size called alpha (the learning rate), and subtract it from the current value of the coefficient you're solving for.
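As a sketch of that idea in Python/NumPy (the examples below use Matlab; the function names `cost` and `update_one` here are illustrative, not part of the post's code):

```python
import numpy as np

def cost(X, y, t):
    """Average (halved) squared error over the whole data set."""
    errors = X @ t - y              # m-vector of prediction errors
    return (errors @ errors) / (2 * len(y))

def update_one(X, y, t, j, alpha):
    """One gradient-descent update of the single coefficient t[j]."""
    m = len(y)
    errors = X @ t - y
    # error vector times the j-th data column, scaled by alpha/m
    return t[j] - (alpha / m) * (errors @ X[:, j])
```

Each call to `update_one` moves one coefficient a small step downhill on the cost surface; the inner loop of the non-vectorized version just calls this for every coefficient.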
Here’s how you do that in Matlab.
First, assume I have this equation and data matching it already made up:
y = t0*x0 + t1*x1, where x0 = 1, and y and x1 are measured and given. We're solving for t0 and t1.
Here’s the equation with some data for the point (0, 1).
1 = t0*1 + t1*0.
Let’s say the data for all points x,y is loaded into two matching vectors of this form
y = load('y_vector.txt');
x = load('x_vector.txt');
How do we create the matrix and vectors needed to run GD in Matlab?
First, let's make our t vector. We have 2 unknowns to solve for, and we can initialize them to any value. We'll use 0.
t = [0; 0];
Now, let’s see how many data pairs we have:
m = length(y);
Now, let’s create our X matrix:
X = [ones(m,1), x];
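The same design-matrix construction looks like this in Python/NumPy (illustrative sketch; the `x` data here is made up):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])            # example measured x1 values
m = len(x)                               # number of data pairs
X = np.column_stack([np.ones(m), x])     # prepend the all-ones x0 column
```

The column of ones plays the role of x0, so that the intercept t0 gets the same treatment as every other coefficient.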
Now, assume we already have a gradient descent function called GD written that can handle iteration for us with this prototype: t = GD ( X, y, t, alpha, iterations ) — let’s solve for the vector t.
t = GD( X, y, t, 0.01, 1000); % does 1,000 iterations with an alpha guess of 0.01
Here are the update rules for the GD function:
predictions = X * t; % X is an m x 2 matrix and t is a 2 x 1 vector, so the result is an m x 1 vector of predicted y values from the linear equation.
temp1 = t(1) - (alpha/m) * ((predictions - y)' * X(:,1)); % (predictions - y) is an m x 1 vector of errors between the prediction and each data point's actual y value. The ' operator transposes it to a 1 x m vector, which multiplies the m x 1 vector X(:,1) (the column of all ones). A 1 x m vector times an m x 1 vector gives a single real value: numerically, the partial derivative of the average squared error with respect to t(1).
temp2 = t(2) - (alpha/m) * ((predictions - y)' * X(:,2)); % same as above, except X(:,2) is the m x 1 vector of measured x values from our original data set, giving the partial derivative with respect to t(2).
% do the simultaneous update
t(1) = temp1;
t(2) = temp2;
With this, you have vectorized the computation of gradient descent in Matlab!