{"id":979,"date":"2026-06-02T07:13:33","date_gmt":"2026-06-02T07:13:33","guid":{"rendered":"https:\/\/tensorzen.online\/?p=979"},"modified":"2026-06-02T07:14:19","modified_gmt":"2026-06-02T07:14:19","slug":"derivation-of-the-least-squares-solution","status":"publish","type":"post","link":"https:\/\/tensorzen.online\/?p=979","title":{"rendered":"Derivation of the Least Squares Solution"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">This notebook derives the ordinary least squares (OLS) solution $\\hat{x} = (A^\\top A)^{-1} A^\\top b$ from first principles using matrix algebra.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">1. Problem Setup<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">We want to solve the linear system:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">$$A x = b$$ <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">where: <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">$A \\in \\mathbb{R}^{m \\times n}$ is a known matrix (typically $m > n$, more equations than unknowns)<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">$x \\in \\mathbb{R}^n$ is the unknown vector we want to find<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">$b \\in \\mathbb{R}^m$ is a known vector<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">When $m > n$, this system is <strong>overdetermined<\/strong> \u2014 there are more constraints than degrees of freedom, so an exact solution generally does not exist. Instead, we seek the $\\hat{x}$ that comes closest to satisfying all equations simultaneously.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">2. Defining the Residual<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">For any candidate solution $x$, define the <strong>residual<\/strong>:<br>$$r = b &#8211; Ax$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This vector measures how far $Ax$ is from $b$. We want to minimize the <strong>sum of squared residuals<\/strong>, i.e., the squared Euclidean norm of $r$:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">$$\\min_{x} \\; |r|^2 = \\min_{x} \\; |b &#8211; Ax|^2$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Expanding using the inner product definition $|v|^2 = v^\\top v$:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">$$|b &#8211; Ax|^2 = (b &#8211; Ax)^\\top (b &#8211; Ax)$$<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">3. Expanding the Objective<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Expand $(b &#8211; Ax)^\\top(b &#8211; Ax)$:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">$$<br>|b &#8211; Ax|^2 = b^\\top b &#8211; b^\\top A x &#8211; x^\\top A^\\top b + x^\\top A^\\top A x<br>$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Since $b^\\top A x$ is a scalar, it equals its own transpose:<br>$$b^\\top A x = (b^\\top A x)^\\top = x^\\top A^\\top b$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">So the two middle terms are equal, and we can write:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">$$\\boxed{|b &#8211; Ax|^2 = b^\\top b &#8211; 2x^\\top A^\\top b + x^\\top A^\\top A x}$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Call this objective function $f(x)$. It is a <strong>quadratic function<\/strong> of $x$, and since $A^\\top A$ is positive semi-definite, $f(x)$ is convex \u2014 any local minimum is a global minimum.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">4. Taking the Gradient<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">To minimize $f(x)$, we take the gradient with respect to $x$ and set it to zero.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Using standard matrix calculus identities:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>$\\nabla_x (c^\\top x) = c$<\/li>\n\n\n\n<li>$\\nabla_x (x^\\top C x) = 2Cx$ when $C$ is symmetric<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Note that $A^\\top A$ is symmetric by construction: $(A^\\top A)^\\top = A^\\top A$.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Applying these identities to $f(x)$:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">$$\\nabla_x f(x) = 0 &#8211; 2A^\\top b + 2A^\\top A x$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Setting $\\nabla_x f(x) = 0$:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">$$2A^\\top A x &#8211; 2A^\\top b = 0$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">$$A^\\top A x = A^\\top b$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">These are called the <strong>Normal Equations<\/strong>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">5. The Normal Equations<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">$$A^\\top A \\hat{x} = A^\\top b$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Geometric interpretation:<\/strong> The normal equations enforce that the residual $r = b &#8211; A\\hat{x}$ is orthogonal to every column of $A$:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">$$A^\\top (b &#8211; A\\hat{x}) = 0 \\iff A^\\top r = 0$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In other words, $\\hat{x}$ is the solution for which $A\\hat{x}$ is the <strong>orthogonal projection<\/strong> of $b$ onto the column space of $A$. The residual points in a direction that $A$ cannot represent \u2014 it is orthogonal to $\\text{col}(A)$.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">6. Solving for $\\hat{x}$<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">If $A$ has <strong>full column rank<\/strong> (rank $= n$), then $A^\\top A$ is symmetric positive definite and therefore invertible. We can solve the normal equations directly:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">$$\\hat{x} = (A^\\top A)^{-1} A^\\top b$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This is the <strong>Ordinary Least Squares (OLS) estimator<\/strong>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The matrix $A^\\dagger = (A^\\top A)^{-1} A^\\top$ is called the <strong>Moore-Penrose pseudoinverse<\/strong> of $A$ (in the full column rank case).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">7. The Projection Matrix<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The fitted values $\\hat{b} = A\\hat{x}$ are the projection of $b$ onto $\\text{col}(A)$:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">$$\\hat{b} = A(A^\\top A)^{-1} A^\\top b = P b$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">where $P = A(A^\\top A)^{-1} A^\\top$ is the <strong>orthogonal projection matrix<\/strong> (also called the hat matrix).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">$P$ has two key properties that confirm it is a valid orthogonal projector:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Property<\/th><th>Statement<\/th><th>Meaning<\/th><\/tr><\/thead><tbody><tr><td>Idempotent<\/td><td>$P^2 = P$<\/td><td>Projecting twice is the same as projecting once<\/td><\/tr><tr><td>Symmetric<\/td><td>$P^\\top = P$<\/td><td>Projection is orthogonal (not oblique)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Verification of idempotency:<\/strong><br>$$P^2 = A(A^\\top A)^{-1} A^\\top \\cdot A(A^\\top A)^{-1} A^\\top = A(A^\\top A)^{-1} (A^\\top A)(A^\\top A)^{-1} A^\\top = A(A^\\top A)^{-1} A^\\top = P \\checkmark$$<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">8. Confirming the Minimum (Second-Order Condition)<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The Hessian of $f(x)$ with respect to $x$ is:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">$$H = \\nabla^2_x f(x) = 2A^\\top A$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For any nonzero vector $v$:<br>$$v^\\top (A^\\top A) v = (Av)^\\top (Av) = |Av|^2 \\geq 0$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">So $A^\\top A$ is positive semi-definite, and strictly positive definite when $A$ has full column rank (since $Av = 0 \\Rightarrow v = 0$). Therefore $H \\succ 0$, confirming that the stationary point $\\hat{x}$ is a strict global minimum.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Summary<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Step<\/th><th>Key result<\/th><\/tr><\/thead><tbody><tr><td>Objective<\/td><td>Minimize $|b &#8211; Ax|^2$<\/td><\/tr><tr><td>Expand<\/td><td>$f(x) = b^\\top b &#8211; 2x^\\top A^\\top b + x^\\top A^\\top A x$<\/td><\/tr><tr><td>Gradient<\/td><td>$\\nabla_x f = 2A^\\top A x &#8211; 2A^\\top b$<\/td><\/tr><tr><td>Normal equations<\/td><td>$A^\\top A \\hat{x} = A^\\top b$<\/td><\/tr><tr><td>OLS solution<\/td><td>$\\hat{x} = (A^\\top A)^{-1} A^\\top b$ (when $A$ is full column rank)<\/td><\/tr><tr><td>Geometric meaning<\/td><td>$A\\hat{x}$ is the orthogonal projection of $b$ onto $\\text{col}(A)$<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">The full column rank condition on $A$ is the precise requirement that makes the least squares solution unique \u2014 consistent with the earlier discussion of full rank implications.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This notebook derives the ordinary least squares (OLS) solution $\\hat{x} = (A^\\top A)^{-1} A^\\top b$ from first principles using matrix algebra. 1. Problem Setup We want to solve [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[16,25],"tags":[24],"class_list":["post-979","post","type-post","status-publish","format-standard","hentry","category-base","category-in-english","tag-machine-learning"],"_links":{"self":[{"href":"https:\/\/tensorzen.online\/index.php?rest_route=\/wp\/v2\/posts\/979","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/tensorzen.online\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/tensorzen.online\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/tensorzen.online\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/tensorzen.online\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=979"}],"version-history":[{"count":7,"href":"https:\/\/tensorzen.online\/index.php?rest_route=\/wp\/v2\/posts\/979\/revisions"}],"predecessor-version":[{"id":987,"href":"https:\/\/tensorzen.online\/index.php?rest_route=\/wp\/v2\/posts\/979\/revisions\/987"}],"wp:attachment":[{"href":"https:\/\/tensorzen.online\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=979"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/tensorzen.online\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=979"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/tensorzen.online\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=979"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}