Dynamic Programming for Discrete-Time Systems
This document presents the mathematical framework of dynamic programming for discrete-time optimal control problems. We derive the recursive Bellman equations that form the foundation of optimal control theory.
Discrete-Time System Formulation
System Dynamics
Consider a general discrete-time system:

$$x_{k+1} = f(x_k, u_k), \qquad k = 0, 1, \ldots, N-1$$

Where:
- $x_k$: system state at time $k$
- $u_k$: control input at time $k$
- $f(\cdot, \cdot)$: system dynamics (linear or nonlinear)
Performance Function
The general optimal control objective is to minimize:

$$J = h(x_N) + \sum_{k=0}^{N-1} g(x_k, u_k)$$

Where:
- $h(x_N)$: terminal cost function
- $g(x_k, u_k)$: stage cost function
- $x_0$: initial state (given)
- $x_N$: terminal state
- $N$: finite time horizon
Control Task
Find the optimal control sequence:

$$u_0^*, u_1^*, \ldots, u_{N-1}^*$$

that minimizes the performance function subject to the system dynamics and constraints.
Backward Multi-Stage Dynamic Programming
The key insight of dynamic programming is to solve the problem backwards in time, starting from the final stage and working towards the initial time.
Stage N: Terminal Stage ($k = N$)
At the terminal stage, there are no more control decisions to make:

$$J_{N \to N}(x_N) = h(x_N)$$

Since this is the final stage:

$$J_{N \to N}^*(x_N) = h(x_N)$$

The optimal cost-to-go from the terminal stage is simply the terminal cost.
Stage N-1: One Step to Go ($k = N-1$)
For the second-to-last stage, we consider both the immediate cost and the terminal cost:

$$J_{N-1 \to N}(x_{N-1}, u_{N-1}) = g(x_{N-1}, u_{N-1}) + h(x_N)$$

Since $x_N = f(x_{N-1}, u_{N-1})$, we can express the cost in terms of the decision variables:

$$J_{N-1 \to N}(x_{N-1}, u_{N-1}) = g(x_{N-1}, u_{N-1}) + h\big(f(x_{N-1}, u_{N-1})\big)$$

Optimal cost-to-go:

$$J_{N-1 \to N}^*(x_{N-1}) = \min_{u_{N-1}} \Big[ g(x_{N-1}, u_{N-1}) + h\big(f(x_{N-1}, u_{N-1})\big) \Big]$$

Optimal control: The optimal control satisfies:

$$u_{N-1}^* = \arg\min_{u_{N-1}} \Big[ g(x_{N-1}, u_{N-1}) + h\big(f(x_{N-1}, u_{N-1})\big) \Big]$$
Stage N-2: Two Steps to Go ($k = N-2$)
For the third-to-last stage:

$$J_{N-2 \to N}(x_{N-2}, u_{N-2}, u_{N-1}) = g(x_{N-2}, u_{N-2}) + g(x_{N-1}, u_{N-1}) + h(x_N)$$

This can be rewritten as:

$$J_{N-2 \to N}^*(x_{N-2}) = \min_{u_{N-2}} \Big[ g(x_{N-2}, u_{N-2}) + \min_{u_{N-1}} \big[ g(x_{N-1}, u_{N-1}) + h(x_N) \big] \Big] = \min_{u_{N-2}} \Big[ g(x_{N-2}, u_{N-2}) + J_{N-1 \to N}^*(x_{N-1}) \Big]$$

Key insight from Bellman's principle: Whatever the initial states and controls before stage $N-1$, the remaining control sequence from $N-1$ to $N$ must be optimal. Therefore, $J_{N-1 \to N}^*(x_{N-1})$ is already determined from the previous stage.
Optimal control:

$$u_{N-2}^* = \arg\min_{u_{N-2}} \Big[ g(x_{N-2}, u_{N-2}) + J_{N-1 \to N}^*\big(f(x_{N-2}, u_{N-2})\big) \Big]$$
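To make these backward steps concrete, here is a toy numerical sketch (not from the original notes): scalar dynamics $x_{k+1} = x_k + u_k$, stage cost $g = x_k^2 + u_k^2$, terminal cost $h = x_N^2$, with each minimization done by brute force over a discretized control set. It computes the cost-to-go with two steps to go by reusing the one-step cost-to-go, exactly as in the derivation above.

```python
import numpy as np

# Assumed toy problem: x_{k+1} = x_k + u_k, g = x^2 + u^2, h = x^2.
f = lambda x, u: x + u
g = lambda x, u: x**2 + u**2
h = lambda x: x**2

u_grid = np.linspace(-2.0, 2.0, 401)   # candidate controls for brute-force minimization
x0 = 1.0

# Stage N-1 (one step to go): J*_{N-1->N}(x) = min_u [ g(x,u) + h(f(x,u)) ]
def J_star_one_step(x):
    return np.min(g(x, u_grid) + h(f(x, u_grid)))

# Stage N-2 (two steps to go): reuse J*_{N-1->N} instead of optimizing over both controls jointly.
costs = g(x0, u_grid) + np.array([J_star_one_step(f(x0, u)) for u in u_grid])
u_star = u_grid[np.argmin(costs)]
print(f"u* at two steps to go from x={x0}: {u_star:.3f}, cost-to-go: {costs.min():.3f}")
```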
General Bellman Recursion
Bellman Equation
For any stage $k$ (where $k = N-1, N-2, \ldots, 0$):

$$J_{k \to N}^*(x_k) = \min_{u_k} \Big[ g(x_k, u_k) + J_{k+1 \to N}^*\big(f(x_k, u_k)\big) \Big]$$

With boundary condition:

$$J_{N \to N}^*(x_N) = h(x_N)$$
Optimal Control Policy
The optimal control at each stage is:

$$u_k^*(x_k) = \arg\min_{u_k} \Big[ g(x_k, u_k) + J_{k+1 \to N}^*\big(f(x_k, u_k)\big) \Big]$$
Value Function Interpretation
- $J_{k \to N}^*(x_k)$: Value function - minimum cost-to-go from state $x_k$ at time $k$
- $u_k^* = \pi_k(x_k)$: Policy function - optimal control as a function of the current state
Properties of Dynamic Programming
1. Principle of Optimality
The Bellman equation embodies the principle of optimality:
"An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision." (Bellman, 1957)
2. Backward Induction
The solution proceeds backward in time:
- Start with the terminal condition at $k = N$
- Solve for $k = N-1, N-2, \ldots, 1, 0$
- At each stage, use the previously computed value function $J_{k+1 \to N}^*$
3. Markov Property
The optimal control depends only on the current state, not on the path taken to reach that state:

$$u_k^* = \pi_k(x_k)$$
Algorithm Summary
Step 1: Initialize the terminal condition

$$J_{N \to N}^*(x_N) = h(x_N)$$

Step 2: For $k = N-1, N-2, \ldots, 0$, compute:

$$J_{k \to N}^*(x_k) = \min_{u_k} \Big[ g(x_k, u_k) + J_{k+1 \to N}^*\big(f(x_k, u_k)\big) \Big], \qquad u_k^*(x_k) = \arg\min_{u_k} \Big[ g(x_k, u_k) + J_{k+1 \to N}^*\big(f(x_k, u_k)\big) \Big]$$

Step 3: Forward simulation using the optimal policy. Starting from $x_0$, apply:

$$u_k = u_k^*(x_k), \qquad x_{k+1} = f(x_k, u_k), \qquad k = 0, 1, \ldots, N-1$$
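A minimal grid-based sketch of these three steps follows. The dynamics, costs, horizon, and grid resolutions are illustrative assumptions, not part of the original notes; the structure (terminal condition, backward sweep, forward simulation) is the algorithm above.

```python
import numpy as np

# Assumed illustrative problem: x_{k+1} = 0.9*x_k + u_k, g = x^2 + u^2, h = 5*x^2, N = 10.
f = lambda x, u: 0.9 * x + u
g = lambda x, u: x**2 + u**2
h = lambda x: 5.0 * x**2
N = 10

x_grid = np.linspace(-3.0, 3.0, 121)   # discretized state space
u_grid = np.linspace(-1.0, 1.0, 41)    # discretized control set

# Step 1: terminal condition J*_{N->N}(x) = h(x)
J = np.zeros((N + 1, x_grid.size))
policy = np.zeros((N, x_grid.size))
J[N] = h(x_grid)

# Step 2: backward recursion (Bellman equation evaluated on the grid)
for k in range(N - 1, -1, -1):
    for i, x in enumerate(x_grid):
        x_next = f(x, u_grid)                                        # next state for every candidate control
        costs = g(x, u_grid) + np.interp(x_next, x_grid, J[k + 1])   # stage cost + interpolated cost-to-go
        j_min = np.argmin(costs)
        J[k, i] = costs[j_min]
        policy[k, i] = u_grid[j_min]

# Step 3: forward simulation using the stored optimal policy
x = 2.0
for k in range(N):
    u = np.interp(x, x_grid, policy[k])
    x = f(x, u)
print("final state:", x, " optimal cost from x0=2.0:", np.interp(2.0, x_grid, J[0]))
```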
Examples of Cost Functions
1. Quadratic Cost (LQR)
System:

$$x_{k+1} = A x_k + B u_k$$

Cost:

$$J = \frac{1}{2} x_N^\top S x_N + \frac{1}{2} \sum_{k=0}^{N-1} \left( x_k^\top Q x_k + u_k^\top R u_k \right)$$

Bellman equation:

$$J_{k \to N}^*(x_k) = \min_{u_k} \left[ \frac{1}{2} x_k^\top Q x_k + \frac{1}{2} u_k^\top R u_k + J_{k+1 \to N}^*\big(A x_k + B u_k\big) \right]$$
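For this case the minimization can be done analytically: the value function stays quadratic, $J_{k \to N}^*(x) = \tfrac{1}{2} x^\top P_k x$, which yields the backward Riccati recursion and a linear state feedback $u_k^* = -K_k x_k$. A sketch with assumed numerical values for $A$, $B$, $Q$, $R$, $S$:

```python
import numpy as np

# Assumed example data; any (A, B) with Q, R, S positive (semi)definite works the same way.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[0.1]])
S = 10.0 * np.eye(2)   # terminal weight
N = 50

# Backward Riccati recursion: P_N = S,
#   K_k = (R + B' P_{k+1} B)^{-1} B' P_{k+1} A
#   P_k = Q + A' P_{k+1} (A - B K_k)
P = S
gains = []
for k in range(N - 1, -1, -1):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)
    gains.append(K)
gains.reverse()   # gains[k] is the feedback gain at stage k: u_k = -K_k x_k

# Forward simulation with the time-varying optimal feedback
x = np.array([[1.0], [0.0]])
for k in range(N):
    x = A @ x + B @ (-gains[k] @ x)
print("state after N steps:", x.ravel())
```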
2. Minimum Time
Cost: $J = N$ (number of steps)
Stage cost: $g(x_k, u_k) = 1$
Terminal cost: $h(x_N) = 0$
3. Minimum Energy
Cost: $J = \sum_{k=0}^{N-1} u_k^\top R u_k$
Stage cost: $g(x_k, u_k) = u_k^\top R u_k$
Terminal cost: $h(x_N) = 0$ (the target is typically enforced as a terminal constraint, e.g. $x_N = x_f$)
Computational Considerations
Advantages
- Global optimality: Guaranteed to find global optimum for the discretized problem
- Systematic approach: Provides complete optimal policy
- Handles nonlinearity: Works for nonlinear systems
- Constraint handling: Can incorporate state and input constraints
Limitations
- Curse of dimensionality: Computational complexity grows exponentially with state dimension
- Discretization: Continuous problems must be discretized
- Storage requirements: Must store value function over entire state space
- Offline computation: Value function must be computed before implementation
Extensions
1. Stochastic Dynamic Programming
For systems with uncertainty:

$$x_{k+1} = f(x_k, u_k, w_k)$$

where $w_k$ is a random disturbance. The Bellman equation becomes:

$$J_{k \to N}^*(x_k) = \min_{u_k} \, \mathbb{E}_{w_k}\Big[ g(x_k, u_k) + J_{k+1 \to N}^*\big(f(x_k, u_k, w_k)\big) \Big]$$
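A minimal sketch of one backward step of this stochastic recursion, with the expectation taken over an assumed discrete disturbance distribution (all numerical values below are illustrative assumptions):

```python
import numpy as np

# Assumed toy setup: x_{k+1} = x_k + u_k + w_k with a discrete disturbance w_k.
f = lambda x, u, w: x + u + w
g = lambda x, u: x**2 + u**2
w_vals = np.array([-0.2, 0.0, 0.2])      # disturbance support
w_prob = np.array([0.25, 0.5, 0.25])     # disturbance probabilities

x_grid = np.linspace(-3.0, 3.0, 121)
u_grid = np.linspace(-1.0, 1.0, 41)
J_next = 5.0 * x_grid**2                  # stand-in for J*_{k+1 -> N} on the grid

# One backward step: J*_k(x) = min_u  E_w[ g(x,u) + J*_{k+1}(f(x,u,w)) ]
J_k = np.empty_like(x_grid)
for i, x in enumerate(x_grid):
    expected = np.zeros_like(u_grid)
    for w, p in zip(w_vals, w_prob):
        expected += p * np.interp(f(x, u_grid, w), x_grid, J_next)
    J_k[i] = np.min(g(x, u_grid) + expected)
```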
2. Infinite Horizon
For $N \to \infty$ (with time-invariant dynamics and stage cost), the value function becomes stationary:

$$J^*(x) = \min_{u} \Big[ g(x, u) + J^*\big(f(x, u)\big) \Big]$$
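This fixed-point equation can be solved by value iteration, repeatedly applying the Bellman operator until the value function stops changing. A sketch on an assumed scalar example; a discount factor $\gamma < 1$ is added here so the iteration has a guaranteed fixed point (the undiscounted case needs additional structure):

```python
import numpy as np

# Assumed example: x_{k+1} = 0.9*x + u, g = x^2 + u^2, discounted with gamma < 1.
f = lambda x, u: 0.9 * x + u
g = lambda x, u: x**2 + u**2
gamma = 0.95

x_grid = np.linspace(-3.0, 3.0, 121)
u_grid = np.linspace(-1.0, 1.0, 41)
J = np.zeros_like(x_grid)

# Value iteration: apply the Bellman operator until convergence.
for _ in range(500):
    J_new = np.array([
        np.min(g(x, u_grid) + gamma * np.interp(f(x, u_grid), x_grid, J))
        for x in x_grid
    ])
    if np.max(np.abs(J_new - J)) < 1e-8:
        break
    J = J_new
```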
3. Approximate Dynamic Programming
Use function approximation to handle high-dimensional problems:

$$J^*(x) \approx \hat{J}(x; \theta)$$

where $\theta$ are the parameters to be learned.
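One common variant is fitted value iteration: sample states, compute one-step Bellman targets, and fit the parametric approximator to those targets. A minimal sketch with an assumed linear-in-features model and discount factor (both are illustrative choices, not from the original notes):

```python
import numpy as np

# Assumed setup: J*(x) ≈ theta^T phi(x) with polynomial features, fitted by least squares
# to one-step Bellman targets (fitted value iteration).
f = lambda x, u: 0.9 * x + u
g = lambda x, u: x**2 + u**2
gamma = 0.95
phi = lambda x: np.stack([np.ones_like(x), x, x**2], axis=-1)   # feature map

u_grid = np.linspace(-1.0, 1.0, 41)
x_samples = np.random.uniform(-3.0, 3.0, 200)
theta = np.zeros(3)

for _ in range(100):
    # Bellman targets under the current approximation
    targets = np.array([
        np.min(g(x, u_grid) + gamma * phi(f(x, u_grid)) @ theta)
        for x in x_samples
    ])
    # Least-squares fit of theta to the targets
    theta, *_ = np.linalg.lstsq(phi(x_samples), targets, rcond=None)
```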
Connection to Other Methods
1. Model Predictive Control (MPC)
MPC repeatedly solves a finite-horizon optimal control problem over a receding horizon, applying only the first control of each solution before re-solving at the next step.
2. Reinforcement Learning
RL algorithms such as Q-learning implement the same Bellman recursion when the model is unknown, estimating value functions from sampled transitions.
3. Pontryagin's Maximum Principle
For continuous-time problems, PMP provides necessary conditions that are related to the Bellman equation.
References
- Optimal Control by DR_CAN
- Wang, T. (2023). 控制之美 (卷2) [The Beauty of Control, Vol. 2]. Tsinghua University Press.
- Bellman, R. (1957). Dynamic Programming. Princeton University Press.
- Bertsekas, D. P. (2017). Dynamic Programming and Optimal Control (4th ed.). Athena Scientific.
- Rakovic, S. V., & Levine, W. S. (2018). Handbook of Model Predictive Control.