Why is it that a step along the direction of the gradient results in the highest increase of a function?
\[ D_\mathbf{u} f(\mathbf{x}) = \lim_{\alpha \to 0} \frac{f(\mathbf{x} + \alpha \mathbf{u}) - f(\mathbf{x})}{\alpha} \]
\(D_\mathbf{u} f(\mathbf{x})\) is the directional derivarive of the function \(f\) at the point \(\mathbf{x}\) along a direction \(\mathbf{u}\).
Since \(\mathbf{u}\) is purely used here to denote direction, we set \(||\mathbf{u}||^2 = 1\).
So we are essentially trying to find a \(\mathbf{u}\) such that \(D_\mathbf{u} f(\mathbf{x})\) is maximized.
Letβs evaluate the limit
Since we are setting \(\alpha \to 0\), we can consider the linear approximation of \(f\) around \(\mathbf{x}\) in our limit:
\[ f(\mathbf{\tilde{x}}) \approx f(\mathbf{x}) + \nabla f(\mathbf{x}) \cdot (\mathbf{x} - \mathbf{\tilde{x}}) \]
\[ \begin{align*} D_\mathbf{u} f(\mathbf{x}) &= \lim_{\alpha \to 0} \frac{f(\mathbf{x} + \alpha \mathbf{u}) - f(\mathbf{x})}{\alpha} \\ &= \lim_{\alpha \to 0} \frac{f(\mathbf{x}) + \nabla f(\mathbf{x}) \cdot (\alpha \mathbf{u}) - f(\mathbf{x})}{\alpha} \\ &= \lim_{\alpha \to 0} \frac{\alpha \times \nabla f(\mathbf{x}) \cdot \mathbf{u}}{\alpha} \\ &= \nabla f(\mathbf{x}) \cdot \mathbf{u} \end{align*} \]
Hence the directional-derivative is essentially the dot product of the gradient vector \(\nabla f(\mathbf{x})\) with the direction vector \(\mathbf{u}\)
Now, what is the range of \(\nabla f(\mathbf{x}) \cdot \mathbf{u}\)? (remember, we wanted to maximize this)
\[ (\mathbf{a} \cdot \mathbf{b})^2 \le ||\mathbf{a}||^2||\mathbf{b}||^2 \]
\[ \implies -||\mathbf{a}||\times||\mathbf{b}|| \le (\mathbf{a} \cdot \mathbf{b}) \le ||\mathbf{a}||\times||\mathbf{b}|| \]
\[ -||\nabla f(\mathbf{x})|| \le \nabla f(\mathbf{x}) \cdot \mathbf{u} \le ||\nabla f(\mathbf{x})|| \]
Why is \(\operatorname{trace}(ABC)\) = \(\operatorname{trace}(BCA)\) = \(\operatorname{trace}(CAB)\) ? (Cyclic Permutations)
Why is the trace equal to the sum of eigenvalues?
Why is \(\underset{{w \in \mathbb{R}^n}}{\max} w^T C w = \lambda_1\), where \(\lambda_1\) is the largest eigenvalue of the matrix \(C\)?