Dynamic Programming for Continuous-Time Systems (HJB Equation)
This document derives the Hamilton-Jacobi-Bellman (HJB) equation, which is the continuous-time analog of the discrete-time Bellman equation. The HJB equation provides the foundation for solving optimal control problems in continuous time.
System Dynamics
Consider a general continuous-time system:
$$\dot{x}(t) = f(x(t), u(t), t)$$
Where:
$x(t) \in \mathbb{R}^n$: system state at time $t$
$u(t) \in \mathbb{R}^m$: control input at time $t$
$f(\cdot, \cdot, \cdot)$: system dynamics (potentially time-varying and nonlinear)
The general performance function from time $t$ to terminal time $t_f$ is:
$$J_{t \to t_f}(x(t), t, u(\cdot)) = h(x(t_f), t_f) + \int_t^{t_f} g(x(r), u(r), r) \, dr$$
Where:
$h(x(t_f), t_f)$: terminal cost function
$g(x(r), u(r), r)$: running cost function (integrand)
$u(\cdot)$: control function over the interval $[t, t_f]$
Control Task
Find the optimal control function $u^*(r)$ for $r \in [t, t_f]$ that minimizes:
$$J^*_{t \to t_f}(x(t), t) = \min_{u(\cdot)} \left\{ h(x(t_f), t_f) + \int_t^{t_f} g(x(r), u(r), r) \, dr \right\}$$
Derivation of the HJB Equation
Step 1: Time Interval Decomposition
Following the principle of dynamic programming, divide the time interval $[t, t_f]$ into two parts:
$[t, t + \Delta t]$: immediate interval
$[t + \Delta t, t_f]$: remaining interval
The performance function becomes:
$$\begin{aligned}
J^*_{t \to t_f}(x(t), t) = \min_{u(\cdot)} \bigg\{ h(x(t_f), t_f) &+ \int_{t + \Delta t}^{t_f} g(x(r), u(r), r) \, dr \\
&+ \int_t^{t + \Delta t} g(x(r), u(r), r) \, dr \bigg\}
\end{aligned}$$
Step 2: Apply Bellman's Principle
The terminal cost plus the first integral is the cost-to-go from $t + \Delta t$ to $t_f$:
$$h(x(t_f), t_f) + \int_{t + \Delta t}^{t_f} g(x(r), u(r), r) \, dr = J_{t + \Delta t \to t_f}(x(t + \Delta t), t + \Delta t, u(\cdot))$$
By Bellman's principle of optimality, the tail portion of an optimal trajectory must itself be optimal, so this cost-to-go can be replaced by its optimal value:
$$\begin{aligned}
J^*_{t \to t_f}(x(t), t) = \min_{u(\cdot)} \bigg\{ & J^*_{t + \Delta t \to t_f}(x(t + \Delta t), t + \Delta t) \\
& + \int_t^{t + \Delta t} g(x(r), u(r), r) \, dr \bigg\}
\end{aligned}$$
Step 3: Taylor Series Expansion
Assuming the value function is twice continuously differentiable, expand it around $(x(t), t)$:
$$\begin{aligned}
&J^*_{t + \Delta t \to t_f}(x(t + \Delta t), t + \Delta t) \\
&= J^*_{t \to t_f}(x(t), t) + \frac{\partial J^*}{\partial t} \Delta t + \left(\frac{\partial J^*}{\partial x}\right)^T \big(x(t + \Delta t) - x(t)\big) + \text{H.O.T.}
\end{aligned}$$
Where H.O.T. represents higher-order terms.
Step 4: Take Limits
As $\Delta t \to 0$:
State change:
$$x(t + \Delta t) - x(t) = \dot{x}(t)\,\Delta t + o(\Delta t) = f(x(t), u(t), t)\,\Delta t + o(\Delta t)$$
Integral approximation:
$$\int_t^{t + \Delta t} g(x(r), u(r), r) \, dr = g(x(t), u(t), t)\,\Delta t + o(\Delta t)$$
Notation simplification:
$$\frac{\partial J^*_{t \to t_f}(x(t), t)}{\partial t} \triangleq J_t^*(x(t), t), \qquad \frac{\partial J^*_{t \to t_f}(x(t), t)}{\partial x} \triangleq J_x^*(x(t), t)$$
Step 5: Substitute and Simplify
Substituting into the Bellman equation:
$$\begin{aligned}
J^*_{t \to t_f}(x(t), t) = \min_{u(t)} \Big\{ & J^*_{t \to t_f}(x(t), t) + J_t^*(x(t), t)\, \Delta t \\
& + (J_x^*(x(t), t))^T f(x(t), u(t), t)\, \Delta t + g(x(t), u(t), t)\, \Delta t \Big\}
\end{aligned}$$
Canceling $J^*_{t \to t_f}(x(t), t)$ from both sides (it does not depend on $u$), dividing by $\Delta t$, and letting $\Delta t \to 0$ so that the higher-order terms vanish:
$$0 = J_t^*(x(t), t) + \min_{u(t)} \left\{ (J_x^*(x(t), t))^T f(x(t), u(t), t) + g(x(t), u(t), t) \right\}$$
Hamilton-Jacobi-Bellman Equation
HJB Equation
The Hamilton-Jacobi-Bellman equation is:
$$\begin{cases}
0 = J_t^*(x(t), t) + \min_{u(t)} \left\{ (J_x^*(x(t), t))^T f(x(t), u(t), t) + g(x(t), u(t), t) \right\} \\
J^*(x(t_f), t_f) = h(x(t_f), t_f)
\end{cases}$$
This is a partial differential equation (PDE) for the optimal cost $J^*$, with a boundary condition at the terminal time; it is therefore solved backward from $t_f$.
Hamiltonian Function
Define the Hamiltonian:
$$\mathcal{H}(x(t), u(t), J_x^*, t) \triangleq (J_x^*(x(t), t))^T f(x(t), u(t), t) + g(x(t), u(t), t)$$
The optimal control satisfies:
$$u^*(x(t), t) = \arg\min_{u(t)} \mathcal{H}(x(t), u(t), J_x^*, t)$$
The HJB equation can be written compactly as:
$$0 = J_t^*(x(t), t) + \mathcal{H}(x(t), u^*(x(t), t), J_x^*(x(t), t), t)$$
Where:
$$\mathcal{H}(x(t), u^*, J_x^*, t) = \min_{u(t)} \mathcal{H}(x(t), u(t), J_x^*, t)$$
Interpretation and Properties
1. Optimality Conditions
When the minimizing control lies in the interior of the admissible set and $\mathcal{H}$ is differentiable in $u$, the optimal control satisfies the stationarity condition:
$$\frac{\partial \mathcal{H}}{\partial u}\bigg|_{u=u^*} = 0$$
This gives:
$$\left(\frac{\partial f}{\partial u}\right)^{\!T}\Bigg|_{u=u^*} J_x^* + \frac{\partial g}{\partial u}\Bigg|_{u=u^*} = 0$$
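As a quick scalar illustration (an example added here, not part of the original derivation), assume dynamics $\dot{x} = ax + bu$ and running cost $g = \tfrac{1}{2}(qx^2 + ru^2)$ with $r > 0$. The stationarity condition then gives the control explicitly in terms of the value-function gradient:

$$\frac{\partial \mathcal{H}}{\partial u} = r\,u + b\, J_x^* = 0 \quad\Longrightarrow\quad u^*(x, t) = -\frac{b}{r}\, J_x^*(x, t)$$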
2. Costate Interpretation
The gradient $J_x^*$ can be interpreted as the costate or adjoint variable:
$$\lambda(t) = J_x^*(x(t), t)$$
The costate represents the sensitivity of the optimal cost with respect to state perturbations.
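Concretely (a restatement of this sensitivity property, assuming a differentiable value function), a small state perturbation $\delta x$ at time $t$ changes the optimal cost by approximately:

$$\delta J^* \approx \lambda(t)^T \, \delta x(t)$$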
3. Connection to Pontryagin's Maximum Principle
The HJB equation is closely related to Pontryagin's Maximum Principle (PMP):
HJB: Sufficient conditions for optimality (when a solution exists)
PMP: Necessary conditions for optimality
Examples
Example 1: Linear Quadratic Regulator (LQR)
System: $\dot{x} = Ax + Bu$
Cost: $J = \frac{1}{2}x(t_f)^T S x(t_f) + \frac{1}{2}\int_0^{t_f} (x^T Q x + u^T R u) \, dt$
Hamiltonian: $\mathcal{H} = \frac{1}{2}(x^T Q x + u^T R u) + \lambda^T (Ax + Bu)$
Optimal control (from $\partial \mathcal{H}/\partial u = Ru + B^T \lambda = 0$): $u^* = -R^{-1} B^T \lambda$
Value function (quadratic ansatz): $J^*(x, t) = \frac{1}{2} x^T P(t) x$, so that $\lambda = J_x^* = P(t)x$ and $u^* = -R^{-1} B^T P(t) x$
Riccati equation (from substituting the ansatz into the HJB equation, with terminal condition $P(t_f) = S$): $\dot{P} + PA + A^T P - PBR^{-1}B^T P + Q = 0$
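A minimal numerical sketch of this example, under assumed problem data (a double integrator with identity weights, chosen only for illustration and not taken from the text): integrate the Riccati differential equation backward from $P(t_f) = S$ and form the time-varying feedback gain $K(t) = R^{-1} B^T P(t)$.

```python
# Sketch: backward integration of the Riccati differential equation for LQR.
# All problem data below are illustrative assumptions.
import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])   # assumed double-integrator dynamics
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)                # state weight
R = np.array([[1.0]])        # control weight
S = np.eye(2)                # terminal weight, P(t_f) = S
tf = 5.0                     # horizon

def riccati_rhs(t, p_flat):
    """dP/dt = -(P A + A^T P - P B R^{-1} B^T P + Q)."""
    P = p_flat.reshape(2, 2)
    dP = -(P @ A + A.T @ P - P @ B @ np.linalg.solve(R, B.T) @ P + Q)
    return dP.ravel()

# Integrate backward in time from t_f to 0 (solve_ivp accepts a decreasing span).
sol = solve_ivp(riccati_rhs, (tf, 0.0), S.ravel(), dense_output=True, rtol=1e-8)

def gain(t):
    """Feedback gain K(t) such that u*(t) = -K(t) x(t)."""
    P = sol.sol(t).reshape(2, 2)
    return np.linalg.solve(R, B.T @ P)

print("K(0) =", gain(0.0))
```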
Example 2: Minimum Time Problem
System: $\dot{x} = f(x, u)$
Cost: $J = \int_0^{t_f} 1 \, dt = t_f$
Running cost: $g(x, u, t) = 1$
Hamiltonian: $\mathcal{H} = 1 + \lambda^T f(x, u)$
HJB equation: $0 = J_t^* + \min_u \{1 + (J_x^*)^T f(x, u)\}$
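For a concrete special case (an illustration added here, using the single-integrator dynamics $\dot{x} = u$ with $\|u\| \le 1$ and a fixed target set on which $J^* = 0$): the problem is autonomous, so $J^*$ depends only on $x$ and $J_t^* = 0$. Minimizing $(J_x^*)^T u$ over $\|u\| \le 1$ gives $-\|J_x^*\|$, and the HJB equation reduces to the Eikonal equation:

$$\|\nabla J^*(x)\| = 1, \qquad u^*(x) = -\frac{\nabla J^*(x)}{\|\nabla J^*(x)\|}$$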
Solution Methods
1. Analytical Solutions
For certain special cases (e.g., LQR), analytical solutions exist:
Linear systems with quadratic costs
Certain nonlinear systems with specific structures
2. Numerical Methods
For general nonlinear problems:
Finite difference methods: Discretize the PDE
Semi-Lagrangian methods: Follow characteristic curves (a small grid-based sketch follows this list)
Level set methods: For optimal control with constraints
Deep learning approaches: Neural network approximations
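As a minimal sketch of the grid-based idea (all problem data here are assumptions chosen for illustration, not taken from the text): a semi-Lagrangian backward sweep for the scalar problem $\dot{x} = u$, $|u| \le 1$, running cost $g = x^2 + u^2$, and terminal cost $h = x^2$ over a fixed horizon.

```python
# Sketch: semi-Lagrangian backward sweep for a 1D finite-horizon HJB problem.
import numpy as np

x = np.linspace(-2.0, 2.0, 201)   # state grid
u = np.linspace(-1.0, 1.0, 21)    # control grid
dt = 0.01
N = 200                           # horizon t_f = N * dt

J = x**2                          # terminal condition J(x, t_f) = h(x)
for _ in range(N):                # sweep backward in time
    candidates = []
    for ui in u:
        # Step the state forward under control ui and interpolate the cost-to-go.
        x_next = np.clip(x + dt * ui, x[0], x[-1])
        stage = dt * (x**2 + ui**2)
        candidates.append(stage + np.interp(x_next, x, J))
    # Bellman update: minimize over the control grid.
    J = np.min(np.stack(candidates), axis=0)

print("Approximate J*(x=0, t=0) =", J[np.argmin(np.abs(x))])
```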
3. Approximate Solutions
Linearization: Around a nominal trajectory
Perturbation methods: For small nonlinearities
Successive approximation: Iterative improvement
Advantages and Limitations
Advantages
Global optimality: Provides the globally optimal solution (when it exists)
Feedback control: Results in a state-feedback controller
Theoretical foundation: Fundamental principle of optimal control
Handles constraints: Can incorporate state-dependent constraints
Limitations
Curse of dimensionality: PDE complexity grows exponentially with state dimension
Smoothness requirements: Requires a differentiable value function
Boundary conditions: Difficult to specify for general problems
Computational complexity: Numerical solution is challenging
Connection to Other Methods
1. Discrete-Time Dynamic Programming
The HJB equation is the continuous-time limit of the discrete-time Bellman equation:
$$J_k^*(x_k) = \min_{u_k} \left[ J_{k+1}^*(f(x_k, u_k)) + g(x_k, u_k) \right]$$
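To make the limit explicit (a short sketch consistent with Steps 3–5 above, using the Euler discretization $x_{k+1} = x_k + f(x_k, u_k)\,\Delta t$ and stage cost $g(x_k, u_k)\,\Delta t$):

$$J^*(x, t) = \min_{u} \Big\{ g(x, u, t)\,\Delta t + J^*\big(x + f(x, u, t)\,\Delta t,\ t + \Delta t\big) \Big\}
\;\;\xrightarrow{\ \Delta t \to 0\ }\;\; 0 = J_t^* + \min_{u} \big\{ (J_x^*)^T f + g \big\}$$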
2. Calculus of Variations
For unconstrained problems, the optimality conditions obtained from the HJB equation recover the Euler-Lagrange equation of the calculus of variations.
3. Model Predictive Control
MPC can be viewed as repeatedly solving finite-horizon optimal control problems, of the kind the HJB equation characterizes, over a receding horizon.
References
Optimal Control by DR_CAN
Wang, T. (2023). 控制之美 (卷2) . Tsinghua University Press.
Bellman, R. (1957). Dynamic Programming . Princeton University Press.
Fleming, W. H., & Rishel, R. W. (1975). Deterministic and Stochastic Optimal Control . Springer.
Bertsekas, D. P. (2005). Dynamic Programming and Optimal Control (3rd ed.). Athena Scientific.
Lewis, F. L., Vrabie, D., & Syrmos, V. L. (2012). Optimal Control (3rd ed.). John Wiley & Sons.