01-17
- model
- h_theta(x) = theta^{T} x (single training example)
- h_theta(X) = X theta (vectorized over all m examples)
- input data:
- X = [
    1  x_1^(1)  x_2^(1)  …  x_n^(1)
    ⋮
    1  x_1^(i)  x_2^(i)  …  x_n^(i)
    ⋮
    1  x_1^(m)  x_2^(m)  …  x_n^(m)
  ]
- subscript: feature index (1 … n)
- superscript: training example index (1 … m)
- dimensions: m x (n + 1) (first column of ones for the intercept term)
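A minimal NumPy sketch of building X with the bias column and computing the vectorized hypothesis (the raw feature values here are made up for illustration):

```python
import numpy as np

# hypothetical data: m = 3 examples, n = 2 features
X_raw = np.array([[2.0, 3.0],
                  [1.0, 5.0],
                  [4.0, 2.0]])
m, n = X_raw.shape
X = np.hstack([np.ones((m, 1)), X_raw])  # prepend column of ones -> m x (n + 1)

theta = np.zeros((n + 1, 1))             # (n + 1) x 1
h = X @ theta                            # vectorized hypothesis h_theta = X theta, m x 1
```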
- cost function: J(theta) = 1 / (2m) * sum_{i=1}^{m} (h_theta(x^(i)) - y^(i))^{2}
- vectorized: J(theta) = 1 / (2m) * (X theta - y)^{T} (X theta - y)
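As a sketch, the vectorized cost in NumPy (the function name is mine):

```python
def cost(theta, X, y):
    # J(theta) = 1/(2m) * (X theta - y)^T (X theta - y)
    m = len(y)
    r = X @ theta - y          # m x 1 residual vector
    return (r.T @ r).item() / (2 * m)
```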
- gradient of a quadratic function
- f(x) = 1/2 x^{T} Q x - b^{T} x
- x = [x_1 … x_n]^{T} ∈ R^{n}
- Q ∈ R^{n x n}, assumed symmetric
- gradient_x f(x) = Qx - b (for non-symmetric Q it would be 1/2 (Q + Q^{T}) x - b)
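A quick numerical sanity check of this rule using central finite differences (all names here are mine, and Q is made symmetric by construction):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
Q = A + A.T                                  # symmetric, so gradient is Qx - b
b = rng.standard_normal(n)
x = rng.standard_normal(n)

f = lambda v: 0.5 * v @ Q @ v - b @ v
analytic = Q @ x - b
eps = 1e-6
numeric = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(n)])     # one unit vector per coordinate
assert np.allclose(analytic, numeric, atol=1e-5)
```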
- gradient_theta J(theta) = 1 / (2m) * gradient_theta (theta^{T} X^{T} X theta - 2 theta^{T} X^{T} y + y^{T} y)
- X^{T} X is symmetric, so applying the rule above: gradient_theta J(theta) = 1/m * X^{T} (X theta - y)
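In code, the resulting gradient (a sketch; the function name is mine):

```python
def gradient(theta, X, y):
    # gradient_theta J(theta) = 1/m * X^T (X theta - y)
    m = len(y)
    return X.T @ (X @ theta - y) / m
```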
- pseudocode
- X: m by (n + 1)
- y: m by 1
- theta: (n + 1) by 1
- randomly initialize
- hyperparameters:
- alpha: learning rate
- number of iterations
for i in range(iterations):
    compute the cost function
        - store the value for every iteration & plot to check convergence
    compute the gradient
    update theta: theta = theta - alpha * gradient
- add normalization (see the sketch below)
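A minimal runnable version of the pseudocode in NumPy, assuming "normalization" means standardizing each raw feature to mean 0 and std 1 before the bias column is added (all names are mine):

```python
import numpy as np

def normalize(X_raw):
    # standardize each feature: mean 0, std 1 (assumed meaning of "normalization")
    mu = X_raw.mean(axis=0)
    sigma = X_raw.std(axis=0)
    return (X_raw - mu) / sigma, mu, sigma

def gradient_descent(X, y, alpha=0.01, iterations=1000):
    m, cols = X.shape                     # cols = n + 1 (bias column included)
    theta = np.random.randn(cols, 1)      # random initialization
    costs = []                            # store J per iteration to plot convergence
    for _ in range(iterations):
        residual = X @ theta - y          # m x 1
        costs.append((residual.T @ residual).item() / (2 * m))
        grad = X.T @ residual / m         # (n + 1) x 1
        theta -= alpha * grad
    return theta, costs
```

If alpha is small enough, a plot of the stored costs should decrease monotonically; divergence or oscillation suggests alpha is too large.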