
Dynamic Programming for Continuous-Time Systems (HJB Equation)

This document derives the Hamilton-Jacobi-Bellman (HJB) equation, which is the continuous-time analog of the discrete-time Bellman equation. The HJB equation provides the foundation for solving optimal control problems in continuous time.

Continuous-Time System Formulation

System Dynamics

Consider a general continuous-time system:

\dot{x}(t) = f(x(t), u(t), t)

Where:

  • x(t) \in \mathbb{R}^n: system state at time t
  • u(t) \in \mathbb{R}^m: control input at time t
  • f(\cdot, \cdot, \cdot): system dynamics (potentially time-varying and nonlinear)

Performance Function

The general performance function from time t to terminal time t_f is:

J_{t \to t_f}(x(t), t, u(\cdot)) = h(x(t_f), t_f) + \int_t^{t_f} g(x(r), u(r), r) \, dr

Where:

  • h(x(t_f), t_f): terminal cost function
  • g(x(r), u(r), r): running cost function (integrand)
  • u(\cdot): control function over the interval [t, t_f]

Control Task

Find the optimal control function u^*(r) for r \in [t, t_f] that minimizes:

J^*_{t \to t_f}(x(t), t) = \min_{u(\cdot)} \left\{ h(x(t_f), t_f) + \int_t^{t_f} g(x(r), u(r), r) \, dr \right\}
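
As a concrete illustration of this setup (a hypothetical scalar example, not part of the general formulation), take a single integrator with quadratic terminal and running costs, where s, q, r > 0 are illustrative weights:

\dot{x}(t) = u(t), \qquad h(x(t_f), t_f) = \tfrac{1}{2}\, s\, x(t_f)^2, \qquad g(x, u, t) = \tfrac{1}{2}\left(q\, x^2 + r\, u^2\right)

The control task then trades off the terminal deviation x(t_f) against the accumulated state deviation and control effort over [t, t_f].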

Derivation of the HJB Equation

Step 1: Time Interval Decomposition

Following the principle of dynamic programming, divide the time interval [t, t_f] into two parts:

  • [t, t + \Delta t]: immediate interval
  • [t + \Delta t, t_f]: remaining interval

The performance function becomes:

\begin{aligned} J^*_{t \to t_f}(x(t), t) &= \min_{u(\cdot)} \left\{ h(x(t_f), t_f) + \int_{t + \Delta t}^{t_f} g(x(r), u(r), r) \, dr \right. \\ &\quad \left. + \int_t^{t + \Delta t} g(x(r), u(r), r) \, dr \right\} \end{aligned}

Step 2: Apply Bellman's Principle

The terminal cost together with the first integral is the cost-to-go from t + \Delta t to t_f: h(x(t_f), t_f) + \int_{t + \Delta t}^{t_f} g(x(r), u(r), r) \, dr = J_{t + \Delta t \to t_f}(x(t + \Delta t), t + \Delta t, u(\cdot))

By Bellman's optimality principle, this must be optimal:

\begin{aligned} J^*_{t \to t_f}(x(t), t) &= \min_{u(\cdot)} \left\{ J^*_{t + \Delta t \to t_f}(x(t + \Delta t), t + \Delta t) \right. \\ &\quad \left. + \int_t^{t + \Delta t} g(x(r), u(r), r) \, dr \right\} \end{aligned}

Step 3: Taylor Series Expansion

Assuming the value function is twice continuously differentiable, expand around (x(t), t):

\begin{aligned} &J^*_{t + \Delta t \to t_f}(x(t + \Delta t), t + \Delta t) \\ &= J^*_{t \to t_f}(x(t), t) + \frac{\partial J^*}{\partial t} \Delta t + \left(\frac{\partial J^*}{\partial x}\right)^T (x(t + \Delta t) - x(t)) + \text{H.O.T.} \end{aligned}

Where H.O.T. represents higher-order terms.

Step 4: Take Limits

For small \Delta t, use the following first-order approximations (the limit \Delta t \to 0 is taken in Step 5):

State change: x(t + \Delta t) - x(t) \approx \dot{x}(t)\, \Delta t = f(x(t), u(t), t)\, \Delta t

Integral approximation: \int_t^{t + \Delta t} g(x(r), u(r), r) \, dr \approx g(x(t), u(t), t)\, \Delta t

Notation simplification: J_t^*(x(t), t) \triangleq \frac{\partial J^*_{t \to t_f}(x(t), t)}{\partial t}, \qquad J_x^*(x(t), t) \triangleq \frac{\partial J^*_{t \to t_f}(x(t), t)}{\partial x}

Step 5: Substitute and Simplify

Substituting into the Bellman equation (as \Delta t \to 0, the minimization over the control function on [t, t + \Delta t] reduces to a minimization over the instantaneous value u(t)):

\begin{aligned} J^*_{t \to t_f}(x(t), t) &= \min_{u(t)} \left\{ J^*_{t \to t_f}(x(t), t) + J_t^*(x(t), t) \Delta t \right. \\ &\quad \left. + (J_x^*(x(t), t))^T f(x(t), u(t), t) \Delta t + g(x(t), u(t), t) \Delta t \right\} \end{aligned}

Canceling J^*_{t \to t_f}(x(t), t) from both sides and dividing by \Delta t:

0 = J_t^*(x(t), t) + \min_{u(t)} \left\{ (J_x^*(x(t), t))^T f(x(t), u(t), t) + g(x(t), u(t), t) \right\}

Hamilton-Jacobi-Bellman Equation

HJB Equation

The Hamilton-Jacobi-Bellman equation is:

\begin{cases} 0 = J_t^*(x(t), t) + \min_{u(t)} \left\{ (J_x^*(x(t), t))^T f(x(t), u(t), t) + g(x(t), u(t), t) \right\} \\ J^*(x(t_f), t_f) = h(x(t_f), t_f) \end{cases}

This is a partial differential equation (PDE) with a boundary condition at the terminal time.

Hamiltonian Function

Define the Hamiltonian: \mathcal{H}(x(t), u(t), J_x^*, t) \triangleq (J_x^*(x(t), t))^T f(x(t), u(t), t) + g(x(t), u(t), t)

The optimal control satisfies: u^*(x(t), t) = \arg\min_{u(t)} \mathcal{H}(x(t), u(t), J_x^*, t)

Compact HJB Form

The HJB equation can be written compactly as: 0 = J_t^*(x(t), t) + \mathcal{H}(x(t), u^*(x(t), t), J_x^*(x(t), t), t)

Where: \mathcal{H}(x(t), u^*, J_x^*, t) = \min_{u(t)} \mathcal{H}(x(t), u(t), J_x^*, t)

Interpretation and Properties

1. Optimality Conditions

For an unconstrained optimal control (interior minimum), the Hamiltonian is stationary: \frac{\partial \mathcal{H}}{\partial u}\bigg|_{u=u^*} = 0

This gives: \left(\frac{\partial f}{\partial u}\right)^T\bigg|_{u=u^*} J_x^* + \frac{\partial g}{\partial u}\bigg|_{u=u^*} = 0
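
As a concrete case (an assumed structure, not the general problem), suppose the dynamics are affine in the control, f(x, u, t) = a(x, t) + B(x, t)\, u, and the running cost is quadratic in the control, g(x, u, t) = \ell(x, t) + \tfrac{1}{2} u^T R u with R \succ 0. The stationarity condition then has a closed-form solution:

\frac{\partial \mathcal{H}}{\partial u} = B(x, t)^T J_x^* + R\, u = 0 \quad\Longrightarrow\quad u^*(x(t), t) = -R^{-1} B(x, t)^T J_x^*(x(t), t)

This is exactly the structure that appears in the LQR example below.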

2. Costate Interpretation

The gradient J_x^* can be interpreted as the costate or adjoint variable: \lambda(t) = J_x^*(x(t), t)

The costate represents the sensitivity of the optimal cost with respect to state perturbations.

3. Connection to Pontryagin's Maximum Principle

The HJB equation is closely related to Pontryagin's Maximum Principle (PMP):

  • HJB: Sufficient conditions for optimality (when solution exists)
  • PMP: Necessary conditions for optimality

Examples

Example 1: Linear Quadratic Regulator (LQR)

System: \dot{x} = Ax + Bu

Cost: J = \frac{1}{2} x(t_f)^T S x(t_f) + \frac{1}{2} \int_0^{t_f} (x^T Q x + u^T R u) \, dt

Hamiltonian: \mathcal{H} = \frac{1}{2}(x^T Q x + u^T R u) + \lambda^T (Ax + Bu)

Optimal control: u^* = -R^{-1} B^T \lambda = -R^{-1} B^T P(t)\, x

Value function: J^*(x, t) = \frac{1}{2} x^T P(t) x

Riccati equation: \dot{P} + PA + A^T P - PBR^{-1}B^T P + Q = 0, with terminal condition P(t_f) = S
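
A minimal numerical sketch of this example (assuming NumPy and SciPy are available; the matrices A, B, Q, R, S and the horizon tf are illustrative placeholders, not values from the text): integrate the Riccati equation backward in time from P(t_f) = S, then evaluate the state-feedback law u^* = -R^{-1} B^T P(t)\, x.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative problem data (assumptions, not taken from the text)
A = np.array([[0.0, 1.0], [0.0, 0.0]])    # double-integrator dynamics
B = np.array([[0.0], [1.0]])
Q = np.eye(2)                              # state weight
R = np.array([[1.0]])                      # control weight
S = np.eye(2)                              # terminal weight
tf = 5.0

def riccati_rhs(t, p_flat):
    """Right-hand side of Pdot = -(PA + A^T P - P B R^{-1} B^T P + Q)."""
    P = p_flat.reshape(2, 2)
    Pdot = -(P @ A + A.T @ P - P @ B @ np.linalg.solve(R, B.T) @ P + Q)
    return Pdot.ravel()

# Integrate backward from t_f to 0, starting at the terminal condition P(t_f) = S.
sol = solve_ivp(riccati_rhs, (tf, 0.0), S.ravel(), dense_output=True)

def optimal_feedback(t, x):
    """Optimal control u*(x, t) = -R^{-1} B^T P(t) x."""
    P = sol.sol(t).reshape(2, 2)
    return -np.linalg.solve(R, B.T @ P @ x)

x0 = np.array([1.0, 0.0])
print(optimal_feedback(0.0, x0))           # feedback at the initial state
```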

Example 2: Minimum Time Problem

System: \dot{x} = f(x, u)

Cost: J = \int_0^{t_f} 1 \, dt = t_f (the final time t_f is free)

Running cost: g(x, u, t) = 1

Hamiltonian: \mathcal{H} = 1 + \lambda^T f(x, u)

HJB equation: 0 = J_t^* + \min_u \left\{ 1 + (J_x^*)^T f(x, u) \right\}
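
In minimum-time problems the minimization over u is typically taken over a bounded control set rather than being unconstrained. As a hedged illustration, assume control-affine dynamics f(x, u) = a(x) + B(x)\, u with bounds |u_i| \le 1; minimizing the Hamiltonian over this set gives the bang-bang law

u_i^*(x, t) = -\operatorname{sgn}\left( \big( B(x)^T J_x^*(x, t) \big)_i \right)

whenever the switching function (B(x)^T J_x^*)_i is nonzero; if it vanishes over an interval, the problem may contain singular arcs.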

Solution Methods

1. Analytical Solutions

For certain special cases (e.g., LQR), analytical solutions exist:

  • Linear systems with quadratic costs
  • Certain nonlinear systems with specific structures

2. Numerical Methods

For general nonlinear problems:

  • Finite difference methods: Discretize the PDE on a state-time grid (see the sketch after this list)
  • Semi-Lagrangian methods: Follow characteristic curves
  • Level set methods: For optimal control with constraints
  • Deep learning approaches: Neural network approximations
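
A minimal sketch of the finite-difference approach mentioned above, under illustrative assumptions (scalar system \dot{x} = u, running cost x^2 + u^2, zero terminal cost, bounded state grid, discretized control set). It marches the HJB PDE backward in time with an upwind spatial difference and is meant only to show the structure of the method, not to be a production solver.

```python
import numpy as np

# Illustrative 1-D problem (assumptions): x_dot = u, g(x, u) = x^2 + u^2, h = 0.
x_grid = np.linspace(-2.0, 2.0, 81)       # state grid
u_grid = np.linspace(-1.0, 1.0, 21)       # discretized control set
dx = x_grid[1] - x_grid[0]
dt = 0.005                                 # time step (satisfies dt * max|u| / dx <= 1)
tf = 2.0

# Terminal condition: J*(x, t_f) = h(x, t_f) = 0 on the grid.
J = np.zeros_like(x_grid)

for _ in range(int(tf / dt)):
    # One-sided approximations of dJ*/dx (crude constant extrapolation at the boundary).
    Jx_fwd = np.empty_like(J)
    Jx_bwd = np.empty_like(J)
    Jx_fwd[:-1] = (J[1:] - J[:-1]) / dx
    Jx_fwd[-1] = Jx_fwd[-2]
    Jx_bwd[1:] = (J[1:] - J[:-1]) / dx
    Jx_bwd[0] = Jx_bwd[1]

    # Minimized Hamiltonian min_u { g(x, u) + J_x * f(x, u) } over the control grid,
    # using the upwind derivative in the direction the state moves.
    H_min = np.full_like(J, np.inf)
    for u in u_grid:
        Jx = Jx_fwd if u > 0 else Jx_bwd
        H_min = np.minimum(H_min, x_grid**2 + u**2 + Jx * u)

    # Backward time step of the HJB PDE: J_t = -min_u H  =>  J(t - dt) = J(t) + dt * min_u H.
    J = J + dt * H_min

# J now approximates the optimal cost-to-go J*(x, 0) on the grid.
print(J[np.abs(x_grid - 1.0).argmin()])    # approximate optimal cost from x = 1
```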

3. Approximate Solutions

  • Linearization: Around nominal trajectory
  • Perturbation methods: For small nonlinearities
  • Successive approximation: Iterative improvement

Advantages and Limitations

Advantages

  1. Global optimality: Provides a globally optimal solution (when one exists)
  2. Feedback control: Results in state-feedback controller
  3. Theoretical foundation: Fundamental principle of optimal control
  4. Handles constraints: Can incorporate state-dependent constraints

Limitations

  1. Curse of dimensionality: PDE complexity grows exponentially with state dimension
  2. Smoothness requirements: Requires differentiable value function
  3. Boundary conditions: Difficult to specify for general problems
  4. Computational complexity: Numerical solution is challenging

Connection to Other Methods

1. Discrete-Time Dynamic Programming

The HJB equation is the continuous-time limit of the discrete-time Bellman equation: J_k^*(x_k) = \min_{u_k} \left[ J_{k+1}^*(f(x_k, u_k)) + g(x_k, u_k) \right]
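
To make the limit statement concrete (restating the derivation above in discrete-time form), discretize the continuous-time problem with step \Delta t:

J^*(x, t) = \min_{u} \left\{ g(x, u, t)\, \Delta t + J^*\big(x + f(x, u, t)\, \Delta t,\; t + \Delta t\big) \right\} + o(\Delta t)

Expanding the second term to first order in \Delta t, canceling J^*(x, t), and dividing by \Delta t recovers the HJB equation.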

2. Calculus of Variations

For smooth, unconstrained problems, the characteristic curves of the HJB equation satisfy the Euler-Lagrange equations of the calculus of variations.

3. Model Predictive Control

MPC can be viewed as repeatedly solving finite-horizon optimal control problems in a receding-horizon fashion, thereby approximating online the feedback law that the HJB equation defines.
