Dynamic Programming for Continuous-Time Systems (HJB Equation)
This document derives the Hamilton-Jacobi-Bellman (HJB) equation, which is the continuous-time analog of the discrete-time Bellman equation. The HJB equation provides the foundation for solving optimal control problems in continuous time.
System Dynamics
Consider a general continuous-time system:
$$\dot{x}(t) = f(x(t), u(t), t)$$
Where:
$x(t) \in \mathbb{R}^n$: system state at time $t$
$u(t) \in \mathbb{R}^m$: control input at time $t$
$f(\cdot, \cdot, \cdot)$: system dynamics (potentially time-varying and nonlinear)
The general performance function from time $t$ to terminal time $t_f$ is:
$$J_{t \to t_f}(x(t), t, u(\cdot)) = h(x(t_f), t_f) + \int_t^{t_f} g(x(r), u(r), r) \, dr$$
Where:
$h(x(t_f), t_f)$: terminal cost function
$g(x(r), u(r), r)$: running cost function (integrand)
$u(\cdot)$: control function over the interval $[t, t_f]$
Control Task
Find the optimal control function $u^*(r)$ for $r \in [t, t_f]$ that minimizes:
$$J^*_{t \to t_f}(x(t), t) = \min_{u(\cdot)} \left\{ h(x(t_f), t_f) + \int_t^{t_f} g(x(r), u(r), r) \, dr \right\}$$
Derivation of the HJB Equation
Step 1: Time Interval Decomposition
Following the principle of dynamic programming, divide the time interval $[t, t_f]$ into two parts:
$[t, t + \Delta t]$: immediate interval
$[t + \Delta t, t_f]$: remaining interval
The performance function becomes:
$$\begin{aligned}
J^*_{t \to t_f}(x(t), t) = \min_{u(\cdot)} \bigg\{ h(x(t_f), t_f) &+ \int_{t + \Delta t}^{t_f} g(x(r), u(r), r) \, dr \\
&+ \int_t^{t + \Delta t} g(x(r), u(r), r) \, dr \bigg\}
\end{aligned}$$
Step 2: Apply Bellman's Principle
The terminal cost plus the first integral is the cost-to-go from $t + \Delta t$ to $t_f$:
$$h(x(t_f), t_f) + \int_{t + \Delta t}^{t_f} g(x(r), u(r), r) \, dr = J_{t + \Delta t \to t_f}(x(t + \Delta t), t + \Delta t, u(\cdot))$$
By Bellman's principle of optimality, the tail portion of an optimal trajectory must itself be optimal, so this cost-to-go can be replaced by its optimal value:
$$\begin{aligned}
J^*_{t \to t_f}(x(t), t) = \min_{u(\cdot)} \bigg\{ & J^*_{t + \Delta t \to t_f}(x(t + \Delta t), t + \Delta t) \\
& + \int_t^{t + \Delta t} g(x(r), u(r), r) \, dr \bigg\}
\end{aligned}$$
Step 3: Taylor Series Expansion
Assuming the value function is twice continuously differentiable, expand it around $(x(t), t)$:
$$\begin{aligned}
&J^*_{t + \Delta t \to t_f}(x(t + \Delta t), t + \Delta t) \\
&= J^*_{t \to t_f}(x(t), t) + \frac{\partial J^*}{\partial t} \Delta t + \left(\frac{\partial J^*}{\partial x}\right)^T \big(x(t + \Delta t) - x(t)\big) + \text{H.O.T.}
\end{aligned}$$
Where H.O.T. represents higher-order terms.
Step 4: Take Limits
As $\Delta t \to 0$:
State change:
$$x(t + \Delta t) - x(t) = \dot{x}(t)\,\Delta t + o(\Delta t) = f(x(t), u(t), t)\,\Delta t + o(\Delta t)$$
Integral approximation:
$$\int_t^{t + \Delta t} g(x(r), u(r), r) \, dr = g(x(t), u(t), t)\,\Delta t + o(\Delta t)$$
Notation simplification:
$$\frac{\partial J^*_{t \to t_f}(x(t), t)}{\partial t} \triangleq J_t^*(x(t), t), \qquad \frac{\partial J^*_{t \to t_f}(x(t), t)}{\partial x} \triangleq J_x^*(x(t), t)$$
Step 5: Substitute and Simplify
Substituting into the Bellman equation:
$$\begin{aligned}
J^*_{t \to t_f}(x(t), t) = \min_{u(t)} \Big\{ & J^*_{t \to t_f}(x(t), t) + J_t^*(x(t), t)\, \Delta t \\
& + (J_x^*(x(t), t))^T f(x(t), u(t), t)\, \Delta t + g(x(t), u(t), t)\, \Delta t \Big\}
\end{aligned}$$
Canceling $J^*_{t \to t_f}(x(t), t)$ from both sides (it does not depend on $u$), dividing by $\Delta t$, and letting $\Delta t \to 0$ so that the higher-order terms vanish:
$$0 = J_t^*(x(t), t) + \min_{u(t)} \left\{ (J_x^*(x(t), t))^T f(x(t), u(t), t) + g(x(t), u(t), t) \right\}$$
Hamilton-Jacobi-Bellman Equation
HJB Equation
The Hamilton-Jacobi-Bellman equation is:
$$\begin{cases}
0 = J_t^*(x(t), t) + \min_{u(t)} \left\{ (J_x^*(x(t), t))^T f(x(t), u(t), t) + g(x(t), u(t), t) \right\} \\
J^*(x(t_f), t_f) = h(x(t_f), t_f)
\end{cases}$$
This is a partial differential equation (PDE) for the optimal cost $J^*$, with a boundary condition at the terminal time; it is therefore solved backward from $t_f$.
Hamiltonian Function
Define the Hamiltonian:
$$\mathcal{H}(x(t), u(t), J_x^*, t) \triangleq (J_x^*(x(t), t))^T f(x(t), u(t), t) + g(x(t), u(t), t)$$
The optimal control satisfies:
$$u^*(x(t), t) = \arg\min_{u(t)} \mathcal{H}(x(t), u(t), J_x^*, t)$$
The HJB equation can be written compactly as:
$$0 = J_t^*(x(t), t) + \mathcal{H}(x(t), u^*(x(t), t), J_x^*(x(t), t), t)$$
Where:
$$\mathcal{H}(x(t), u^*, J_x^*, t) = \min_{u(t)} \mathcal{H}(x(t), u(t), J_x^*, t)$$
Interpretation and Properties
1. Optimality Conditions
When the minimizing control lies in the interior of the admissible set and $\mathcal{H}$ is differentiable in $u$, the optimal control satisfies the stationarity condition:
$$\frac{\partial \mathcal{H}}{\partial u}\bigg|_{u=u^*} = 0$$
This gives:
$$\left(\frac{\partial f}{\partial u}\right)^{\!T}\Bigg|_{u=u^*} J_x^* + \frac{\partial g}{\partial u}\Bigg|_{u=u^*} = 0$$
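As a quick scalar illustration (an example added here, not part of the original derivation), assume dynamics $\dot{x} = ax + bu$ and running cost $g = \tfrac{1}{2}(qx^2 + ru^2)$ with $r > 0$. The stationarity condition then gives the control explicitly in terms of the value-function gradient:

$$\frac{\partial \mathcal{H}}{\partial u} = r\,u + b\, J_x^* = 0 \quad\Longrightarrow\quad u^*(x, t) = -\frac{b}{r}\, J_x^*(x, t)$$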
2. Costate Interpretation
The gradient $J_x^*$ can be interpreted as the costate or adjoint variable:
$$\lambda(t) = J_x^*(x(t), t)$$
The costate represents the sensitivity of the optimal cost with respect to state perturbations.
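Concretely (a restatement of this sensitivity property, assuming a differentiable value function), a small state perturbation $\delta x$ at time $t$ changes the optimal cost by approximately:

$$\delta J^* \approx \lambda(t)^T \, \delta x(t)$$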
3. Connection to Pontryagin's Maximum Principle
The HJB equation is closely related to Pontryagin's Maximum Principle (PMP):
HJB: Sufficient conditions for optimality (when a solution exists)
PMP: Necessary conditions for optimality
Examples
Example 1: Linear Quadratic Regulator (LQR)
System: $\dot{x} = Ax + Bu$
Cost: $J = \frac{1}{2}x(t_f)^T S x(t_f) + \frac{1}{2}\int_0^{t_f} (x^T Q x + u^T R u) \, dt$
Hamiltonian: $\mathcal{H} = \frac{1}{2}(x^T Q x + u^T R u) + \lambda^T (Ax + Bu)$
Optimal control (from $\partial \mathcal{H}/\partial u = Ru + B^T \lambda = 0$): $u^* = -R^{-1} B^T \lambda$
Value function (quadratic ansatz): $J^*(x, t) = \frac{1}{2} x^T P(t) x$, so that $\lambda = J_x^* = P(t)x$ and $u^* = -R^{-1} B^T P(t) x$
Riccati equation (from substituting the ansatz into the HJB equation, with terminal condition $P(t_f) = S$): $\dot{P} + PA + A^T P - PBR^{-1}B^T P + Q = 0$
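A minimal numerical sketch of this example, under assumed problem data (a double integrator with identity weights, chosen only for illustration and not taken from the text): integrate the Riccati differential equation backward from $P(t_f) = S$ and form the time-varying feedback gain $K(t) = R^{-1} B^T P(t)$.

```python
# Sketch: backward integration of the Riccati differential equation for LQR.
# All problem data below are illustrative assumptions.
import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])   # assumed double-integrator dynamics
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)                # state weight
R = np.array([[1.0]])        # control weight
S = np.eye(2)                # terminal weight, P(t_f) = S
tf = 5.0                     # horizon

def riccati_rhs(t, p_flat):
    """dP/dt = -(P A + A^T P - P B R^{-1} B^T P + Q)."""
    P = p_flat.reshape(2, 2)
    dP = -(P @ A + A.T @ P - P @ B @ np.linalg.solve(R, B.T) @ P + Q)
    return dP.ravel()

# Integrate backward in time from t_f to 0 (solve_ivp accepts a decreasing span).
sol = solve_ivp(riccati_rhs, (tf, 0.0), S.ravel(), dense_output=True, rtol=1e-8)

def gain(t):
    """Feedback gain K(t) such that u*(t) = -K(t) x(t)."""
    P = sol.sol(t).reshape(2, 2)
    return np.linalg.solve(R, B.T @ P)

print("K(0) =", gain(0.0))
```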
Example 2: Minimum Time Problem
System: $\dot{x} = f(x, u)$
Cost: $J = \int_0^{t_f} 1 \, dt = t_f$
Running cost: $g(x, u, t) = 1$
Hamiltonian: $\mathcal{H} = 1 + \lambda^T f(x, u)$
HJB equation: $0 = J_t^* + \min_u \{1 + (J_x^*)^T f(x, u)\}$
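For a concrete special case (an illustration added here, using the single-integrator dynamics $\dot{x} = u$ with $\|u\| \le 1$ and a fixed target set on which $J^* = 0$): the problem is autonomous, so $J^*$ depends only on $x$ and $J_t^* = 0$. Minimizing $(J_x^*)^T u$ over $\|u\| \le 1$ gives $-\|J_x^*\|$, and the HJB equation reduces to the Eikonal equation:

$$\|\nabla J^*(x)\| = 1, \qquad u^*(x) = -\frac{\nabla J^*(x)}{\|\nabla J^*(x)\|}$$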
Solution Methods
1. Analytical Solutions
For certain special cases (e.g., LQR), analytical solutions exist:
Linear systems with quadratic costs
Certain nonlinear systems with specific structures
2. Numerical Methods
For general nonlinear problems:
Finite difference methods: Discretize the PDE
Semi-Lagrangian methods: Follow characteristic curves (a small grid-based sketch follows this list)
Level set methods: For optimal control with constraints
Deep learning approaches: Neural network approximations
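As a minimal sketch of the grid-based idea (all problem data here are assumptions chosen for illustration, not taken from the text): a semi-Lagrangian backward sweep for the scalar problem $\dot{x} = u$, $|u| \le 1$, running cost $g = x^2 + u^2$, and terminal cost $h = x^2$ over a fixed horizon.

```python
# Sketch: semi-Lagrangian backward sweep for a 1D finite-horizon HJB problem.
import numpy as np

x = np.linspace(-2.0, 2.0, 201)   # state grid
u = np.linspace(-1.0, 1.0, 21)    # control grid
dt = 0.01
N = 200                           # horizon t_f = N * dt

J = x**2                          # terminal condition J(x, t_f) = h(x)
for _ in range(N):                # sweep backward in time
    candidates = []
    for ui in u:
        # Step the state forward under control ui and interpolate the cost-to-go.
        x_next = np.clip(x + dt * ui, x[0], x[-1])
        stage = dt * (x**2 + ui**2)
        candidates.append(stage + np.interp(x_next, x, J))
    # Bellman update: minimize over the control grid.
    J = np.min(np.stack(candidates), axis=0)

print("Approximate J*(x=0, t=0) =", J[np.argmin(np.abs(x))])
```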
3. Approximate Solutions
Linearization: Around a nominal trajectory
Perturbation methods: For small nonlinearities
Successive approximation: Iterative improvement
Advantages and Limitations
Advantages
Global optimality: Provides the globally optimal solution (when it exists)
Feedback control: Results in a state-feedback controller
Theoretical foundation: Fundamental principle of optimal control
Handles constraints: Can incorporate state-dependent constraints
Limitations
Curse of dimensionality: PDE complexity grows exponentially with state dimension
Smoothness requirements: Requires a differentiable value function
Boundary conditions: Difficult to specify for general problems
Computational complexity: Numerical solution is challenging
Connection to Other Methods
1. Discrete-Time Dynamic Programming
The HJB equation is the continuous-time limit of the discrete-time Bellman equation:
$$J_k^*(x_k) = \min_{u_k} \left[ J_{k+1}^*(f(x_k, u_k)) + g(x_k, u_k) \right]$$
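To make the limit explicit (a short sketch consistent with Steps 3–5 above, using the Euler discretization $x_{k+1} = x_k + f(x_k, u_k)\,\Delta t$ and stage cost $g(x_k, u_k)\,\Delta t$):

$$J^*(x, t) = \min_{u} \Big\{ g(x, u, t)\,\Delta t + J^*\big(x + f(x, u, t)\,\Delta t,\ t + \Delta t\big) \Big\}
\;\;\xrightarrow{\ \Delta t \to 0\ }\;\; 0 = J_t^* + \min_{u} \big\{ (J_x^*)^T f + g \big\}$$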
2. Calculus of Variations
For unconstrained problems, the optimality conditions obtained from the HJB equation recover the Euler-Lagrange equation of the calculus of variations.
3. Model Predictive Control
MPC can be viewed as repeatedly solving finite-horizon optimal control problems, of the kind the HJB equation characterizes, over a receding horizon.
References
Optimal Control by DR_CAN
Wang, T. (2023). 控制之美 (卷2) . Tsinghua University Press.
Bellman, R. (1957). Dynamic Programming . Princeton University Press.
Fleming, W. H., & Rishel, R. W. (1975). Deterministic and Stochastic Optimal Control . Springer.
Bertsekas, D. P. (2005). Dynamic Programming and Optimal Control (3rd ed.). Athena Scientific.
Lewis, F. L., Vrabie, D., & Syrmos, V. L. (2012). Optimal Control (3rd ed.). John Wiley & Sons.