
Matrix and Vector Calculus

Matrix and vector calculus forms the mathematical foundation for optimal control theory. This document covers the essential differentiation rules and formulas needed for gradient-based optimization methods in control systems.

1. Derivatives of Scalar Functions by Vectors

Single Variable Case

For a scalar function of a single variable: f(u) = u^2 - 2u - 1

where f, u \in \mathbb{R}.

The derivative is: \frac{df(u)}{du} = 2u - 2

The extremum occurs where the derivative vanishes: \frac{df(u)}{du}\Big|_{u=1} = 0, i.e., at u = 1.

Multivariable Case

For a scalar function of two variables:

f(u_1, u_2) = u_1^2 + u_2^2 + 2u_1

To find the extremum, set both partial derivatives to zero:

\begin{cases} \frac{\partial f(u_1,u_2)}{\partial u_1} = 2u_1 + 2 = 0 \\ \frac{\partial f(u_1,u_2)}{\partial u_2} = 2u_2 = 0 \end{cases}

which gives the extremum at u_1 = -1, u_2 = 0.

Vector Notation

Define the vector \mathbf{u} = [u_1 \; u_2]^T; then:

\frac{\partial f(\mathbf{u})}{\partial \mathbf{u}} = \begin{bmatrix} \frac{\partial f(\mathbf{u})}{\partial u_1} \\ \frac{\partial f(\mathbf{u})}{\partial u_2} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
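
As a quick numerical sanity check of this example (a minimal NumPy sketch, not part of the derivation), the analytic gradient can be compared against central finite differences and the extremum at u = [-1, 0]^T confirmed:

```python
import numpy as np

def f(u):
    # f(u) = u1^2 + u2^2 + 2*u1
    return u[0]**2 + u[1]**2 + 2.0 * u[0]

def grad_f(u):
    # Analytic gradient in denominator layout: [2*u1 + 2, 2*u2]
    return np.array([2.0 * u[0] + 2.0, 2.0 * u[1]])

def numerical_grad(fun, u, eps=1e-6):
    # Central finite differences, one coordinate at a time
    g = np.zeros_like(u)
    for i in range(u.size):
        e = np.zeros_like(u)
        e[i] = eps
        g[i] = (fun(u + e) - fun(u - e)) / (2.0 * eps)
    return g

u = np.array([0.3, -1.7])
print(np.allclose(grad_f(u), numerical_grad(f, u)))   # True
print(grad_f(np.array([-1.0, 0.0])))                  # [0. 0.] -> extremum at u1 = -1, u2 = 0
```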

General Vector Case

For a scalar function f(\mathbf{u}) \in \mathbb{R} of a vector \mathbf{u} = [u_1 \; \cdots \; u_n]^T \in \mathbb{R}^n:

Denominator Layout

\frac{\partial f(\mathbf{u})}{\partial \mathbf{u}} \triangleq \begin{bmatrix} \frac{\partial f(\mathbf{u})}{\partial u_1} \\ \vdots \\ \frac{\partial f(\mathbf{u})}{\partial u_n} \end{bmatrix}

where \frac{\partial f(\mathbf{u})}{\partial \mathbf{u}} \in \mathbb{R}^n is a column vector.

Numerator Layout

\frac{\partial f(\mathbf{u})}{\partial \mathbf{u}} \triangleq \begin{bmatrix} \frac{\partial f(\mathbf{u})}{\partial u_1} & \cdots & \frac{\partial f(\mathbf{u})}{\partial u_n} \end{bmatrix}

where \frac{\partial f(\mathbf{u})}{\partial \mathbf{u}} \in \mathbb{R}^{1 \times n} is a row vector.

Convention

In this document, we use the denominator layout convention, which is more common in control theory and optimization.
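
The two layouts differ only by a transpose; a small NumPy illustration (the point and function values below are chosen purely for demonstration):

```python
import numpy as np

# Gradient of f(u) = u1^2 + u2^2 + 2*u1 at an arbitrary point
u = np.array([0.5, 2.0])
grad = np.array([2.0 * u[0] + 2.0, 2.0 * u[1]])

col = grad.reshape(-1, 1)   # denominator layout: column vector, shape (2, 1)
row = grad.reshape(1, -1)   # numerator layout: row vector, shape (1, 2)

print(col.shape, row.shape)        # (2, 1) (1, 2)
print(np.array_equal(col.T, row))  # True: the two layouts differ only by a transpose
```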

2. Derivatives of Vector Functions by Vectors

Vector Function by Scalar

For a vector function f(u) = [f_1(u) \; \cdots \; f_m(u)]^T \in \mathbb{R}^m of a scalar u:

\frac{\partial f(u)}{\partial u} \triangleq \begin{bmatrix} \frac{\partial f_1(u)}{\partial u} & \cdots & \frac{\partial f_m(u)}{\partial u} \end{bmatrix}

where \frac{\partial f(u)}{\partial u} \in \mathbb{R}^{1 \times m}.

Vector Function by Vector (Jacobian Matrix)

For a vector function f(u) = [f_1(u) \; \cdots \; f_m(u)]^T \in \mathbb{R}^m of a vector u = [u_1 \; \cdots \; u_n]^T \in \mathbb{R}^n:

\frac{\partial f(u)}{\partial u} \triangleq \begin{bmatrix} \frac{\partial f_1(u)}{\partial u_1} & \frac{\partial f_2(u)}{\partial u_1} & \cdots & \frac{\partial f_m(u)}{\partial u_1} \\ \frac{\partial f_1(u)}{\partial u_2} & \frac{\partial f_2(u)}{\partial u_2} & \cdots & \frac{\partial f_m(u)}{\partial u_2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_1(u)}{\partial u_n} & \frac{\partial f_2(u)}{\partial u_n} & \cdots & \frac{\partial f_m(u)}{\partial u_n} \end{bmatrix}

where \frac{\partial f(u)}{\partial u} \in \mathbb{R}^{n \times m}.

This is called the Jacobian matrix.

Jacobian Properties
  • When using numerator layout: J_{\text{numerator}} = J_{\text{denominator}}^T
  • The Jacobian generalizes the concept of derivative to vector-valued functions
  • Essential for Newton-Raphson methods and gradient-based optimization
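
For intuition, here is a small finite-difference sketch of a denominator-layout Jacobian (the function f: R^3 -> R^2 and the evaluation point are arbitrary, chosen only for illustration):

```python
import numpy as np

def f(u):
    # Example vector function f: R^3 -> R^2
    return np.array([u[0] * u[1], np.sin(u[2]) + u[0]**2])

def jacobian_denominator(fun, u, eps=1e-6):
    # Denominator-layout Jacobian: entry (i, j) = d f_j / d u_i, shape (n, m)
    f0 = fun(u)
    J = np.zeros((u.size, f0.size))
    for i in range(u.size):
        e = np.zeros_like(u)
        e[i] = eps
        J[i, :] = (fun(u + e) - fun(u - e)) / (2.0 * eps)
    return J

u = np.array([1.0, -2.0, 0.5])
J_den = jacobian_denominator(f, u)
print(J_den.shape)   # (3, 2): n x m in denominator layout
J_num = J_den.T      # numerator-layout Jacobian, shape (2, 3)
print(J_num.shape)
```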

3. Matrix Differentiation Formulas

Formula 1: Linear Form

\frac{\partial (u^T f)}{\partial u} = f

where u, f \in \mathbb{R}^n and f does not depend on u.

Proof:

u^T f = \begin{bmatrix} u_1 & \cdots & u_n \end{bmatrix} \begin{bmatrix} f_1 \\ \vdots \\ f_n \end{bmatrix} = f_1 u_1 + \cdots + f_n u_n

\frac{\partial (u^T f)}{\partial u} = \begin{bmatrix} \frac{\partial (u^T f)}{\partial u_1} \\ \vdots \\ \frac{\partial (u^T f)}{\partial u_n} \end{bmatrix} = \begin{bmatrix} f_1 \\ \vdots \\ f_n \end{bmatrix} = f
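
A quick numerical spot-check of Formula 1 (a minimal sketch with randomly chosen test vectors, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
f = rng.standard_normal(n)   # constant vector (does not depend on u)
u = rng.standard_normal(n)

g = lambda v: v @ f          # scalar u^T f

eps = 1e-6
grad = np.array([(g(u + eps * np.eye(n)[i]) - g(u - eps * np.eye(n)[i])) / (2 * eps)
                 for i in range(n)])
print(np.allclose(grad, f))  # True: d(u^T f)/du = f
```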

Formula 2: Matrix-Vector Product

\frac{\partial (Au)}{\partial u} = A^T

where u \in \mathbb{R}^n, A \in \mathbb{R}^{n \times n}.

Proof:

Au = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}

Let f = Au, so that f_i = a_{i1} u_1 + \cdots + a_{in} u_n and therefore \frac{\partial f_j}{\partial u_i} = a_{ji}. Then:

\frac{\partial (Au)}{\partial u} = \begin{bmatrix} \frac{\partial f_1}{\partial u_1} & \frac{\partial f_2}{\partial u_1} & \cdots & \frac{\partial f_n}{\partial u_1} \\ \frac{\partial f_1}{\partial u_2} & \frac{\partial f_2}{\partial u_2} & \cdots & \frac{\partial f_n}{\partial u_2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_1}{\partial u_n} & \frac{\partial f_2}{\partial u_n} & \cdots & \frac{\partial f_n}{\partial u_n} \end{bmatrix} = A^T
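
Formula 2 can be spot-checked the same way; a sketch that builds the denominator-layout Jacobian of Au by finite differences and compares it to A^T (random test data, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n))
u = rng.standard_normal(n)

def jacobian_denominator(fun, u, eps=1e-6):
    # Entry (i, j) = d f_j / d u_i
    f0 = fun(u)
    J = np.zeros((u.size, f0.size))
    for i in range(u.size):
        e = np.zeros(u.size)
        e[i] = eps
        J[i, :] = (fun(u + e) - fun(u - e)) / (2 * eps)
    return J

J = jacobian_denominator(lambda v: A @ v, u)
print(np.allclose(J, A.T))   # True: d(Au)/du = A^T in denominator layout
```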

Formula 3: Quadratic Form

\frac{\partial (u^T A u)}{\partial u} = Au + A^T u

where u \in \mathbb{R}^n, A \in \mathbb{R}^{n \times n}.

Special case: if A = A^T (symmetric), then:

\frac{\partial (u^T A u)}{\partial u} = 2Au

Proof:

u^T A u = \sum_{i=1}^n \sum_{j=1}^n a_{ij} u_i u_j

Taking the partial derivative with respect to u_k:

\frac{\partial (u^T A u)}{\partial u_k} = \sum_{j=1}^n a_{kj} u_j + \sum_{i=1}^n a_{ik} u_i = (Au)_k + (A^T u)_k

Therefore:

\frac{\partial (u^T A u)}{\partial u} = Au + A^T u
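
A similar finite-difference sketch for Formula 3, including the symmetric special case (random test data chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))   # not symmetric in general
u = rng.standard_normal(n)

def numerical_grad(fun, u, eps=1e-6):
    g = np.zeros_like(u)
    for i in range(u.size):
        e = np.zeros_like(u)
        e[i] = eps
        g[i] = (fun(u + e) - fun(u - e)) / (2 * eps)
    return g

quad = lambda v: v @ A @ v
print(np.allclose(numerical_grad(quad, u), A @ u + A.T @ u))   # True: Au + A^T u

S = 0.5 * (A + A.T)               # symmetric part of A
quad_sym = lambda v: v @ S @ v
print(np.allclose(numerical_grad(quad_sym, u), 2 * S @ u))     # True: symmetric case gives 2Au
```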

Formula 4: Second Derivative of Quadratic Form

\frac{\partial^2 (u^T A u)}{\partial u^2} = A + A^T

Special case: if A = A^T, then:

\frac{\partial^2 (u^T A u)}{\partial u^2} = 2A

This is the Hessian matrix of the quadratic form.
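
The Hessian result can be spot-checked with second-order central differences (for a quadratic these are exact up to rounding); again a minimal sketch with random data:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
A = rng.standard_normal((n, n))
u = rng.standard_normal(n)
f = lambda v: v @ A @ v

def numerical_hessian(fun, u, h=1e-4):
    # Mixed second-order central differences for each pair (i, j)
    n = u.size
    H = np.zeros((n, n))
    I = np.eye(n)
    for i in range(n):
        for j in range(n):
            H[i, j] = (fun(u + h*I[i] + h*I[j]) - fun(u + h*I[i] - h*I[j])
                       - fun(u - h*I[i] + h*I[j]) + fun(u - h*I[i] - h*I[j])) / (4 * h**2)
    return H

print(np.allclose(numerical_hessian(f, u), A + A.T))   # True: Hessian of u^T A u is A + A^T
```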

4. Chain Rule for Matrix Derivatives

General Chain Rule

For a scalar function J = f(y(u)) \in \mathbb{R}, where y(u) \in \mathbb{R}^m and u \in \mathbb{R}^n:

\frac{\partial J}{\partial u} = \frac{\partial y}{\partial u} \frac{\partial J}{\partial y} \in \mathbb{R}^n

Note the order: \frac{\partial y}{\partial u} \in \mathbb{R}^{n \times m} and \frac{\partial J}{\partial y} \in \mathbb{R}^m, so the product is well-defined and lands in \mathbb{R}^n.

Example Application

\frac{\partial J}{\partial u} = 2A^T B y

where:

  • u \in \mathbb{R}^n
  • A \in \mathbb{R}^{m \times n}
  • y(u) = Au \in \mathbb{R}^m
  • B \in \mathbb{R}^{m \times m} with B = B^T (symmetric)
  • J = y^T B y \in \mathbb{R}

Derivation:

\frac{\partial J}{\partial u} = \frac{\partial y}{\partial u} \frac{\partial J}{\partial y} = A^T \cdot 2By = 2A^T B y

Here \frac{\partial y}{\partial u} = A^T by Formula 2, and \frac{\partial J}{\partial y} = 2By by Formula 3 with symmetric B.
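
To tie the pieces together, a sketch that checks the chain-rule result 2A^T B y against finite differences (shapes and data are random and purely illustrative; B is symmetrized to satisfy the assumption above):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 4, 3
A = rng.standard_normal((m, n))
B0 = rng.standard_normal((m, m))
B = 0.5 * (B0 + B0.T)                 # symmetric B
u = rng.standard_normal(n)

J = lambda v: (A @ v) @ B @ (A @ v)   # J(u) = y^T B y with y = A u

def numerical_grad(fun, u, eps=1e-6):
    g = np.zeros_like(u)
    for i in range(u.size):
        e = np.zeros_like(u)
        e[i] = eps
        g[i] = (fun(u + e) - fun(u - e)) / (2 * eps)
    return g

y = A @ u
print(np.allclose(numerical_grad(J, u), 2 * A.T @ B @ y))   # True
```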

5. Derivatives of Scalar Functions by Matrices

For a scalar function f(K) of a matrix K \in \mathbb{R}^{m \times n}:

\frac{\partial f(K)}{\partial K} \triangleq \begin{bmatrix} \frac{\partial f(K)}{\partial k_{11}} & \frac{\partial f(K)}{\partial k_{12}} & \cdots & \frac{\partial f(K)}{\partial k_{1n}} \\ \frac{\partial f(K)}{\partial k_{21}} & \frac{\partial f(K)}{\partial k_{22}} & \cdots & \frac{\partial f(K)}{\partial k_{2n}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f(K)}{\partial k_{m1}} & \frac{\partial f(K)}{\partial k_{m2}} & \cdots & \frac{\partial f(K)}{\partial k_{mn}} \end{bmatrix}
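
As one concrete instance (not derived above, but a standard identity that follows entry-by-entry from this definition), for f(K) = u^T K v the matrix derivative is \frac{\partial f(K)}{\partial K} = u v^T. A finite-difference sketch with random data:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 3, 4
K = rng.standard_normal((m, n))
u = rng.standard_normal(m)
v = rng.standard_normal(n)

f = lambda M: u @ M @ v   # scalar f(K) = u^T K v

def numerical_matrix_grad(fun, K, eps=1e-6):
    # Perturb each entry k_ij separately
    G = np.zeros_like(K)
    for i in range(K.shape[0]):
        for j in range(K.shape[1]):
            E = np.zeros_like(K)
            E[i, j] = eps
            G[i, j] = (fun(K + E) - fun(K - E)) / (2 * eps)
    return G

print(np.allclose(numerical_matrix_grad(f, K), np.outer(u, v)))   # True: d(u^T K v)/dK = u v^T
```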

Applications in Optimal Control

Linear Regression and Least Squares

The least squares problem:

\min_{x} \|Ax - b\|^2 = \min_{x} (Ax - b)^T(Ax - b)

Using our formulas:

\frac{\partial}{\partial x}\left[(Ax - b)^T(Ax - b)\right] = 2A^T(Ax - b)

Setting to zero gives the normal equations:

A^T A x = A^T b
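
A short NumPy sketch (with an arbitrary overdetermined system, assuming A^T A is invertible) that solves the normal equations and compares the result with the library least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 20, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)       # normal equations: A^T A x = A^T b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)    # reference least-squares solution

print(np.allclose(x_normal, x_lstsq))                          # True
print(np.allclose(A.T @ (A @ x_normal - b), 0, atol=1e-10))    # gradient 2 A^T (Ax - b) vanishes
```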

Gradient Descent

For the cost function J(x) = \frac{1}{2}x^T Q x + c^T x (with Q symmetric):

\nabla J(x) = \frac{\partial J}{\partial x} = Qx + c

Gradient descent update:

x_{k+1} = x_k - \alpha \nabla J(x_k) = x_k - \alpha(Qx_k + c)
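
A minimal gradient-descent sketch for this quadratic cost (the matrices and step size below are chosen only for illustration; convergence requires Q positive definite and a sufficiently small \alpha):

```python
import numpy as np

Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])      # symmetric positive definite
c = np.array([-1.0, 0.5])
alpha = 0.1                     # step size

x = np.zeros(2)
for _ in range(200):
    x = x - alpha * (Q @ x + c)  # x_{k+1} = x_k - alpha * (Q x_k + c)

print(x)                         # iterate after 200 steps
print(np.linalg.solve(Q, -c))    # exact minimizer: Q x* + c = 0  ->  x* = -Q^{-1} c
```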

LQR Cost Function

For the quadratic cost:

J = x_N^T S x_N + \sum_{k=0}^{N-1} \left(x_k^T Q x_k + u_k^T R u_k\right)

With Q, R, and S symmetric (as is standard in LQR), the gradients are:

  • \frac{\partial J}{\partial x_k} = 2Qx_k (for k < N)
  • \frac{\partial J}{\partial x_N} = 2Sx_N
  • \frac{\partial J}{\partial u_k} = 2Ru_k

Key Takeaways
  1. Denominator layout is standard in control theory
  2. Chain rule order matters: \frac{\partial J}{\partial u} = \frac{\partial y}{\partial u} \frac{\partial J}{\partial y}
  3. Symmetric matrices simplify quadratic form derivatives
  4. These formulas are essential for gradient-based optimization in optimal control
