Matrix and Vector Calculus
Matrix and vector calculus forms the mathematical foundation for optimal control theory. This document covers the essential differentiation rules and formulas needed for gradient-based optimization methods in control systems.
1. Derivatives of Scalar Functions by Vectors
Single Variable Case
For a scalar function of a single variable:
$$f(u) = u^2 - 2u - 1$$

where $f, u \in \mathbb{R}$.
The derivative is:
$$\frac{df(u)}{du} = 2u - 2$$
The extremum occurs when:
$$\frac{df(u)}{du}\bigg|_{u=1} = 0$$
Multivariable Case
For a scalar function of two variables:
$$f(u) = u_1^2 + u_2^2 + 2u_1$$
To find the extremum:
$$\begin{cases}
\frac{\partial f(u_1,u_2)}{\partial u_1} = 0 \\[4pt]
\frac{\partial f(u_1,u_2)}{\partial u_2} = 0
\end{cases}$$
Vector Notation
Define the vector $\mathbf{u} = [u_1 \;\; u_2]^T$. Then:
$$\frac{\partial f(\mathbf{u})}{\partial \mathbf{u}} = \begin{bmatrix}
\frac{\partial f(\mathbf{u})}{\partial u_1} \\[4pt]
\frac{\partial f(\mathbf{u})}{\partial u_2}
\end{bmatrix} = \begin{bmatrix} 2u_1 + 2 \\ 2u_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

which gives the extremum at $\mathbf{u} = [-1 \;\; 0]^T$.
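As a quick numerical sanity check, the sketch below (assuming NumPy; the helper names are illustrative, not from the text) confirms that the gradient vanishes at $\mathbf{u} = [-1 \;\; 0]^T$ and matches a finite-difference estimate elsewhere:

```python
import numpy as np

def f(u):
    # f(u) = u1^2 + u2^2 + 2*u1
    return u[0]**2 + u[1]**2 + 2*u[0]

def grad_f(u):
    # analytic gradient in denominator layout: [2*u1 + 2, 2*u2]
    return np.array([2*u[0] + 2, 2*u[1]])

u_star = np.array([-1.0, 0.0])          # solution of grad_f(u) = 0
print(grad_f(u_star))                   # -> [0. 0.]

# central-difference check of the gradient at an arbitrary point
u = np.array([0.3, -0.7])
eps = 1e-6
fd = np.array([(f(u + eps*e) - f(u - eps*e)) / (2*eps) for e in np.eye(2)])
print(np.allclose(fd, grad_f(u)))       # -> True
```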
General Vector Case
For a scalar function $f(\mathbf{u}) \in \mathbb{R}$ of a vector $\mathbf{u} = [u_1 \;\cdots\; u_n]^T \in \mathbb{R}^n$:
Denominator Layout
$$\frac{\partial f(\mathbf{u})}{\partial \mathbf{u}} \triangleq \begin{bmatrix}
\frac{\partial f(\mathbf{u})}{\partial u_1} \\
\vdots \\
\frac{\partial f(\mathbf{u})}{\partial u_n}
\end{bmatrix}$$
where $\frac{\partial f(\mathbf{u})}{\partial \mathbf{u}} \in \mathbb{R}^n$ is a column vector.
Numerator Layout
$$\frac{\partial f(\mathbf{u})}{\partial \mathbf{u}} \triangleq \begin{bmatrix}
\frac{\partial f(\mathbf{u})}{\partial u_1} & \cdots & \frac{\partial f(\mathbf{u})}{\partial u_n}
\end{bmatrix}$$
where $\frac{\partial f(\mathbf{u})}{\partial \mathbf{u}} \in \mathbb{R}^{1 \times n}$ is a row vector.
In this document, we use the denominator layout convention, which is more common in control theory and optimization.
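As a concrete illustration of the two layouts (a sketch assuming NumPy; the quadratic $f(u) = u^T u$ and its gradient $2u$ are chosen purely for illustration and are not from the text), the same gradient simply changes shape:

```python
import numpy as np

def f(u):
    # assumed example: f(u) = u^T u, whose gradient is 2u
    return float(u @ u)

u = np.array([1.0, 2.0, 3.0])
eps = 1e-6
grad = np.array([(f(u + eps*e) - f(u - eps*e)) / (2*eps) for e in np.eye(u.size)])

grad_denominator = grad.reshape(-1, 1)   # denominator layout: n x 1 column vector
grad_numerator = grad.reshape(1, -1)     # numerator layout: 1 x n row vector

print(grad_denominator.shape, grad_numerator.shape)   # (3, 1) (1, 3)
print(np.allclose(grad, 2 * u))                       # -> True
```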
2. Derivatives of Vector Functions by Vectors
Vector Function by Scalar
For a vector function $f(u) = [f_1(u) \;\cdots\; f_m(u)]^T \in \mathbb{R}^m$ of a scalar $u$:
$$\frac{\partial f(u)}{\partial u} \triangleq \begin{bmatrix}
\frac{\partial f_1(u)}{\partial u} & \cdots & \frac{\partial f_m(u)}{\partial u}
\end{bmatrix}$$
where $\frac{\partial f(u)}{\partial u} \in \mathbb{R}^{1 \times m}$.
Vector Function by Vector (Jacobian Matrix)
For a vector function $f(u) = [f_1(u) \;\cdots\; f_m(u)]^T \in \mathbb{R}^m$ of a vector $u = [u_1 \;\cdots\; u_n]^T \in \mathbb{R}^n$:
$$\frac{\partial f(u)}{\partial u} \triangleq \begin{bmatrix}
\frac{\partial f_1(u)}{\partial u_1} & \frac{\partial f_2(u)}{\partial u_1} & \cdots & \frac{\partial f_m(u)}{\partial u_1} \\
\frac{\partial f_1(u)}{\partial u_2} & \frac{\partial f_2(u)}{\partial u_2} & \cdots & \frac{\partial f_m(u)}{\partial u_2} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_1(u)}{\partial u_n} & \frac{\partial f_2(u)}{\partial u_n} & \cdots & \frac{\partial f_m(u)}{\partial u_n}
\end{bmatrix}$$
where $\frac{\partial f(u)}{\partial u} \in \mathbb{R}^{n \times m}$.
This is called the Jacobian matrix.
When using the numerator layout, $J_{\text{numerator}} = J_{\text{denominator}}^T$.
The Jacobian generalizes the concept of derivative to vector-valued functions
Essential for Newton-Raphson methods and gradient-based optimization
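A small sketch of a denominator-layout Jacobian computed by finite differences (NumPy assumed; the test function and the helper `jacobian_denominator` are illustrative, not from the text):

```python
import numpy as np

def f(u):
    # illustrative vector-valued function with m = 3 outputs and n = 2 inputs
    return np.array([u[0]*u[1], np.sin(u[0]), u[1]**2])

def jacobian_denominator(f, u, eps=1e-6):
    # rows indexed by components of u, columns by components of f(u): shape (n, m)
    n, m = u.size, f(u).size
    J = np.zeros((n, m))
    for j in range(n):
        e = np.zeros(n); e[j] = eps
        J[j, :] = (f(u + e) - f(u - e)) / (2*eps)
    return J

u = np.array([0.5, -1.2])
J_den = jacobian_denominator(f, u)
J_num = J_den.T                        # numerator layout is the transpose
print(J_den.shape, J_num.shape)        # (2, 3) (3, 2)
```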
3. Common Derivative Formulas

$$\frac{\partial (u^T f)}{\partial u} = f$$

where $u, f \in \mathbb{R}^n$ and $f$ is a constant vector that does not depend on $u$.
Proof:
$$u^T f = \begin{bmatrix} u_1 & \cdots & u_n \end{bmatrix} \begin{bmatrix} f_1 \\ \vdots \\ f_n \end{bmatrix} = f_1 u_1 + \cdots + f_n u_n$$
$$\frac{\partial (u^T f)}{\partial u} = \begin{bmatrix}
\frac{\partial (u^T f)}{\partial u_1} \\
\vdots \\
\frac{\partial (u^T f)}{\partial u_n}
\end{bmatrix} = \begin{bmatrix} f_1 \\ \vdots \\ f_n \end{bmatrix} = f$$
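The formula is easy to check numerically; a minimal sketch assuming NumPy, with a randomly chosen constant $f$:

```python
import numpy as np

n = 4
rng = np.random.default_rng(0)
f = rng.standard_normal(n)          # constant vector, independent of u
u = rng.standard_normal(n)

def g(u):
    return float(u @ f)             # scalar u^T f

eps = 1e-6
grad_fd = np.array([(g(u + eps*e) - g(u - eps*e)) / (2*eps) for e in np.eye(n)])
print(np.allclose(grad_fd, f))      # -> True
```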
$$\frac{\partial (Au)}{\partial u} = A^T$$

where $u \in \mathbb{R}^n$, $A \in \mathbb{R}^{n \times n}$.
Proof:
$$Au = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}$$
Let $f = Au$, so that $f_i = \sum_{j=1}^n a_{ij} u_j$ and $\frac{\partial f_i}{\partial u_j} = a_{ij}$. Then:
$$\frac{\partial (Au)}{\partial u} = \begin{bmatrix}
\frac{\partial f_1}{\partial u_1} & \frac{\partial f_2}{\partial u_1} & \cdots & \frac{\partial f_n}{\partial u_1} \\
\frac{\partial f_1}{\partial u_2} & \frac{\partial f_2}{\partial u_2} & \cdots & \frac{\partial f_n}{\partial u_2} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_1}{\partial u_n} & \frac{\partial f_2}{\partial u_n} & \cdots & \frac{\partial f_n}{\partial u_n}
\end{bmatrix} = A^T$$
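A finite-difference check of this result (a sketch assuming NumPy; the random $A$ and $u$ are illustrative):

```python
import numpy as np

n = 3
rng = np.random.default_rng(1)
A = rng.standard_normal((n, n))
u = rng.standard_normal(n)

eps = 1e-6
J = np.zeros((n, n))
for j in range(n):
    e = np.zeros(n); e[j] = eps
    J[j, :] = (A @ (u + e) - A @ (u - e)) / (2*eps)   # row j holds d f_i / d u_j over i

print(np.allclose(J, A.T))                            # -> True
```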
$$\frac{\partial (u^T A u)}{\partial u} = Au + A^T u$$

where $u \in \mathbb{R}^n$, $A \in \mathbb{R}^{n \times n}$.
Special case: if $A = A^T$ (symmetric), then:

$$\frac{\partial (u^T A u)}{\partial u} = 2Au$$
Proof:
$$u^T A u = \sum_{i=1}^n \sum_{j=1}^n a_{ij} u_i u_j$$
Taking the partial derivative with respect to $u_k$:
$$\frac{\partial (u^T A u)}{\partial u_k} = \sum_{j=1}^n a_{kj} u_j + \sum_{i=1}^n a_{ik} u_i = (Au)_k + (A^T u)_k$$
Therefore:
$$\frac{\partial (u^T A u)}{\partial u} = Au + A^T u$$
Differentiating once more with respect to $u$ gives:

$$\frac{\partial^2 (u^T A u)}{\partial u^2} = A + A^T$$
Special case: if $A = A^T$, then:

$$\frac{\partial^2 (u^T A u)}{\partial u^2} = 2A$$
This is the Hessian matrix of the quadratic form.
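Both the gradient and the Hessian of the quadratic form can be verified numerically; a sketch assuming NumPy, using a deliberately non-symmetric $A$:

```python
import numpy as np

n = 3
rng = np.random.default_rng(2)
A = rng.standard_normal((n, n))          # deliberately non-symmetric
u = rng.standard_normal(n)

def q(u):
    return float(u @ A @ u)              # u^T A u

eps = 1e-4
I = np.eye(n)
grad_fd = np.array([(q(u + eps*e) - q(u - eps*e)) / (2*eps) for e in I])
hess_fd = np.array([[(q(u + eps*(I[i] + I[j])) - q(u + eps*(I[i] - I[j]))
                      - q(u - eps*(I[i] - I[j])) + q(u - eps*(I[i] + I[j]))) / (4*eps**2)
                     for j in range(n)] for i in range(n)])

print(np.allclose(grad_fd, A @ u + A.T @ u))     # gradient: A u + A^T u
print(np.allclose(hess_fd, A + A.T, atol=1e-5))  # Hessian:  A + A^T
```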
4. Chain Rule for Matrix Derivatives
General Chain Rule
For a scalar function $J = f(y(u)) \in \mathbb{R}$, where $y(u) \in \mathbb{R}^m$ and $u \in \mathbb{R}^n$:
$$\frac{\partial J}{\partial u} = \frac{\partial y}{\partial u} \frac{\partial J}{\partial y} \in \mathbb{R}^n$$
Note the order: $\frac{\partial y}{\partial u} \in \mathbb{R}^{n \times m}$ and $\frac{\partial J}{\partial y} \in \mathbb{R}^m$.
Example Application
$$\frac{\partial J}{\partial u} = 2A^T B y$$

where:

$u \in \mathbb{R}^n$
$A \in \mathbb{R}^{m \times n}$
$y(u) = Au \in \mathbb{R}^m$
$B \in \mathbb{R}^{m \times m}$ with $B = B^T$ (symmetric, so that $\frac{\partial J}{\partial y} = 2By$)
$J = y^T B y \in \mathbb{R}$
Derivation:
$$\frac{\partial J}{\partial u} = \frac{\partial y}{\partial u} \frac{\partial J}{\partial y} = A^T \cdot 2By = 2A^T B y$$
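A numerical check of this example (a sketch assuming NumPy; $B$ is symmetrized explicitly, since the result $2A^T B y$ relies on $B = B^T$):

```python
import numpy as np

m, n = 4, 3
rng = np.random.default_rng(3)
A = rng.standard_normal((m, n))
B0 = rng.standard_normal((m, m))
B = 0.5 * (B0 + B0.T)                    # make B symmetric so that dJ/dy = 2 B y
u = rng.standard_normal(n)

def J(u):
    y = A @ u
    return float(y @ B @ y)

eps = 1e-6
grad_fd = np.array([(J(u + eps*e) - J(u - eps*e)) / (2*eps) for e in np.eye(n)])
grad_analytic = 2 * A.T @ B @ (A @ u)    # 2 A^T B y
print(np.allclose(grad_fd, grad_analytic))   # -> True
```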
5. Derivatives of Scalar Functions by Matrices
For a scalar function $f(K)$ of a matrix $K \in \mathbb{R}^{m \times n}$:
$$\frac{\partial f(K)}{\partial K} \triangleq \begin{bmatrix}
\frac{\partial f(K)}{\partial k_{11}} & \frac{\partial f(K)}{\partial k_{12}} & \cdots & \frac{\partial f(K)}{\partial k_{1n}} \\
\frac{\partial f(K)}{\partial k_{21}} & \frac{\partial f(K)}{\partial k_{22}} & \cdots & \frac{\partial f(K)}{\partial k_{2n}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f(K)}{\partial k_{m1}} & \frac{\partial f(K)}{\partial k_{m2}} & \cdots & \frac{\partial f(K)}{\partial k_{mn}}
\end{bmatrix}$$
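The definition translates directly into an entry-by-entry computation; a sketch assuming NumPy, where the test function $f(K) = \sum_{i,j} k_{ij}^2$ and its gradient $2K$ are illustrative choices, not taken from the text:

```python
import numpy as np

def f(K):
    return float(np.sum(K**2))

def grad_wrt_matrix(f, K, eps=1e-6):
    # entry (i, j) of the result is df/dk_ij, matching the definition above
    G = np.zeros_like(K)
    for i in range(K.shape[0]):
        for j in range(K.shape[1]):
            E = np.zeros_like(K); E[i, j] = eps
            G[i, j] = (f(K + E) - f(K - E)) / (2*eps)
    return G

K = np.arange(6, dtype=float).reshape(2, 3)
print(np.allclose(grad_wrt_matrix(f, K), 2*K))   # -> True
```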
Applications in Optimal Control
Linear Regression and Least Squares
The least squares problem:
$$\min_{x} \|Ax - b\|^2 = \min_{x} (Ax - b)^T(Ax - b)$$
Using our formulas:
$$\frac{\partial}{\partial x}\left[(Ax - b)^T(Ax - b)\right] = 2A^T(Ax - b)$$
Setting to zero gives the normal equations:
$$A^T A x = A^T b$$
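For instance, the short sketch below (NumPy assumed, random data) solves the normal equations directly and compares the result with a library least-squares routine:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((20, 3))
b = rng.standard_normal(20)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)     # A^T A x = A^T b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_normal, x_lstsq))            # -> True
# the gradient 2 A^T (A x - b) vanishes at the solution
print(np.allclose(2 * A.T @ (A @ x_normal - b), 0, atol=1e-10))
```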
Gradient Descent
For the cost function $J(x) = \frac{1}{2}x^T Q x + c^T x$ with symmetric $Q$:
$$\nabla J(x) = \frac{\partial J}{\partial x} = Qx + c$$
Gradient descent update:
$$x_{k+1} = x_k - \alpha \nabla J(x_k) = x_k - \alpha(Qx_k + c)$$
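A minimal gradient-descent sketch for this quadratic cost (the particular $Q$, $c$, step size, and iteration count are illustrative assumptions, not from the text):

```python
import numpy as np

Q = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
c = np.array([-1.0, 2.0])

x = np.zeros(2)
alpha = 0.1
for _ in range(200):
    grad = Q @ x + c                     # gradient from the formula above
    x = x - alpha * grad

print(x)                                 # converges toward the minimizer
print(np.linalg.solve(Q, -c))            # exact minimizer solves Q x* = -c
```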
LQR Cost Function
For the quadratic cost:
$$J = x_N^T S x_N + \sum_{k=0}^{N-1} \left(x_k^T Q x_k + u_k^T R u_k\right)$$
The gradients (with symmetric $Q$, $R$, $S$) are:
$$\frac{\partial J}{\partial x_k} = 2Qx_k \quad \text{(for } k < N\text{)}$$

$$\frac{\partial J}{\partial x_N} = 2Sx_N$$

$$\frac{\partial J}{\partial u_k} = 2Ru_k$$
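These are partial derivatives of the cost with each $x_k$ and $u_k$ treated as an independent variable (the dynamics constraint is not involved here). A finite-difference spot check (a sketch assuming NumPy, with illustrative symmetric weights):

```python
import numpy as np

rng = np.random.default_rng(5)
N, nx, nu = 5, 2, 1
Q = np.diag([1.0, 2.0]); R = np.array([[0.5]]); S = np.diag([3.0, 3.0])   # symmetric weights
xs = rng.standard_normal((N + 1, nx))   # x_0 ... x_N
us = rng.standard_normal((N, nu))       # u_0 ... u_{N-1}

def cost(xs, us):
    J = xs[N] @ S @ xs[N]
    for k in range(N):
        J += xs[k] @ Q @ xs[k] + us[k] @ R @ us[k]
    return float(J)

eps = 1e-6

# dJ/du_0 = 2 R u_0 (first component checked by a central difference)
e_u = np.zeros_like(us); e_u[0, 0] = eps
fd_u = (cost(xs, us + e_u) - cost(xs, us - e_u)) / (2 * eps)
print(np.isclose(fd_u, (2 * R @ us[0])[0]))      # -> True

# dJ/dx_N = 2 S x_N (first component checked by a central difference)
e_x = np.zeros_like(xs); e_x[N, 0] = eps
fd_x = (cost(xs + e_x, us) - cost(xs - e_x, us)) / (2 * eps)
print(np.isclose(fd_x, (2 * S @ xs[N])[0]))      # -> True
```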
Summary

Denominator layout is standard in control theory.
The order in the chain rule matters: $\frac{\partial J}{\partial u} = \frac{\partial y}{\partial u} \frac{\partial J}{\partial y}$.
Symmetric matrices simplify quadratic-form derivatives.
These formulas are essential for gradient-based optimization in optimal control.