
Dynamic Programming for Discrete-Time Systems

This document presents the mathematical framework of dynamic programming for discrete-time optimal control problems. We derive the recursive Bellman equations that form the foundation of optimal control theory.

Discrete-Time System Formulation

System Dynamics

Consider a general discrete-time system:

x_{k+1} = f(x_k, u_k)

Where:

  • x_k \in \mathbb{R}^n: system state at time k
  • u_k \in \mathbb{R}^m: control input at time k
  • f(\cdot, \cdot): system dynamics (linear or nonlinear)

Performance Function

The general optimal control objective is to minimize:

J = h(x_N) + \sum_{k=0}^{N-1} g(x_k, u_k)

Where:

  • h(x_N): terminal cost function
  • g(x_k, u_k): stage cost function
  • x_0: initial state (given)
  • x_N: terminal state
  • N: finite time horizon

Control Task

Find the optimal control sequence:

u_0^*, u_1^*, \ldots, u_{N-1}^*

that minimizes the performance function J subject to the system dynamics and constraints.
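To make the formulation concrete, the sketch below encodes a hypothetical toy instance of this problem class in Python. The scalar dynamics, quadratic costs, and horizon are illustrative assumptions, not part of the formulation above; they only make f, g, and h explicit.

```python
import numpy as np

# Hypothetical toy instance: scalar linear dynamics with quadratic costs.
N = 5          # finite time horizon
x0 = 2.0       # given initial state

def f(x, u):
    """System dynamics x_{k+1} = f(x_k, u_k)."""
    return x + u

def g(x, u):
    """Stage cost g(x_k, u_k)."""
    return x**2 + u**2

def h(x):
    """Terminal cost h(x_N)."""
    return 5.0 * x**2

def total_cost(x0, u_seq):
    """Evaluate J = h(x_N) + sum_k g(x_k, u_k) for a candidate control sequence."""
    x, J = x0, 0.0
    for u in u_seq:
        J += g(x, u)
        x = f(x, u)
    return J + h(x)

print(total_cost(x0, np.zeros(N)))   # cost of the do-nothing sequence
```

Dynamic programming avoids searching over all control sequences directly; the backward recursion derived next reuses optimal costs-to-go instead.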

Backward Multi-Stage Dynamic Programming

The key insight of dynamic programming is to solve the problem backwards in time, starting from the final stage and working towards the initial time.

Stage N: Terminal Stage (k = N)

At the terminal stage, there are no more control decisions to make:

J_{N \to N}(x_N) = h(x_N)

Since this is the final stage:

J_{N \to N}^*(x_N) = h(x_N)

The optimal cost-to-go from the terminal stage is simply the terminal cost.

Stage N-1: One Step to Go (k = N-1 \to N)

For the second-to-last stage, we consider both the immediate cost and the terminal cost:

\begin{aligned} J_{N-1 \to N}(x_N, x_{N-1}, u_{N-1}) &= h(x_N) + \sum_{k=N-1}^{N-1} g(x_k, u_k) \\ &= J_{N \to N}(x_N) + g(x_{N-1}, u_{N-1}) \end{aligned}

Since x_N = f(x_{N-1}, u_{N-1}), we can express the cost in terms of the decision variables:

J_{N-1 \to N}(x_{N-1}, u_{N-1}) = J_{N \to N}^*(f(x_{N-1}, u_{N-1})) + g(x_{N-1}, u_{N-1})

Optimal cost-to-go:

J_{N-1 \to N}^*(x_{N-1}) = \min_{u_{N-1}} \left[ J_{N \to N}^*(f(x_{N-1}, u_{N-1})) + g(x_{N-1}, u_{N-1}) \right]

Optimal control: When the cost is differentiable and the minimum is attained at an interior (unconstrained) point, the optimal control u_{N-1}^* satisfies the first-order condition:

\frac{\partial J_{N-1 \to N}(x_{N-1}, u_{N-1})}{\partial u_{N-1}} = 0
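As a hedged scalar illustration (assumed dynamics and weights, not part of the original derivation), take x_{k+1} = a x_k + b u_k, g(x, u) = q x^2 + r u^2, and h(x) = s x^2 with r, s > 0. The first-order condition then reads

\frac{\partial}{\partial u_{N-1}} \left[ s (a x_{N-1} + b u_{N-1})^2 + q x_{N-1}^2 + r u_{N-1}^2 \right] = 2 s b (a x_{N-1} + b u_{N-1}) + 2 r u_{N-1} = 0

u_{N-1}^* = -\frac{s a b}{r + s b^2} \, x_{N-1}

so the optimal one-step-to-go control is linear in the state, foreshadowing the LQR example later in this document.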

Stage N-2: Two Steps to Go (k = N-2 \to N)

For the third-to-last stage:

\begin{aligned} &J_{N-2 \to N}(x_N, x_{N-1}, x_{N-2}, u_{N-1}, u_{N-2}) \\ &= h(x_N) + g(x_{N-1}, u_{N-1}) + g(x_{N-2}, u_{N-2}) \end{aligned}

This can be rewritten as:

\begin{aligned} &J_{N-2 \to N}(x_{N-2}, u_{N-1}, u_{N-2}) \\ &= J_{N-1 \to N}(x_{N-1}, u_{N-1}) + g(x_{N-2}, u_{N-2}) \end{aligned}

Key insight from Bellman's principle: Whatever the states and controls were before stage N-2, the remaining control sequence from stage N-2 to N must be optimal. Therefore, u_{N-1}^* is already determined (as a function of x_{N-1}) from the previous stage, and only u_{N-2} remains to be chosen.

\begin{aligned} &J_{N-2 \to N}^*(x_{N-2}) \\ &= \min_{u_{N-2}} \left[ J_{N-1 \to N}^*(f(x_{N-2}, u_{N-2})) + g(x_{N-2}, u_{N-2}) \right] \end{aligned}

Optimal control:

\frac{\partial J_{N-2 \to N}(x_{N-2}, u_{N-2})}{\partial u_{N-2}} = 0 \Rightarrow u_{N-2}^*

General Bellman Recursion

Bellman Equation

For any stage k = N-j (with j = 1, 2, \ldots, N, i.e., k = N-1, N-2, \ldots, 0):

J_k^*(x_k) = \min_{u_k} \left[ J_{k+1}^*(f(x_k, u_k)) + g(x_k, u_k) \right]

With boundary condition:

J_N^*(x_N) = h(x_N)

Optimal Control Policy

The optimal control at each stage is:

u_k^*(x_k) = \arg\min_{u_k} \left[ J_{k+1}^*(f(x_k, u_k)) + g(x_k, u_k) \right]

Value Function Interpretation

  • J_k^*(x_k): Value function - minimum cost-to-go from state x_k at time k
  • u_k^*(x_k): Policy function - optimal control as a function of the current state

Properties of Dynamic Programming

1. Principle of Optimality

The Bellman equation embodies the principle of optimality:

An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.

2. Backward Induction

The solution proceeds backward in time:

  1. Start with the terminal condition at k = N
  2. Solve backward for k = N-1, N-2, \ldots, 1, 0
  3. At each stage, use the previously computed value function J_{k+1}^*

3. Markov Property

The optimal control depends only on the current state, not on the path taken to reach that state:

u_k^* = u_k^*(x_k)

Algorithm Summary

Step 1: Initialize the terminal condition:

J_N^*(x_N) = h(x_N)

Step 2: For k = N-1, N-2, \ldots, 0, compute:

J_k^*(x_k) = \min_{u_k} \left[ J_{k+1}^*(f(x_k, u_k)) + g(x_k, u_k) \right]

u_k^*(x_k) = \arg\min_{u_k} \left[ J_{k+1}^*(f(x_k, u_k)) + g(x_k, u_k) \right]

Step 3: Forward simulation using the optimal policy. Starting from x_0, apply:

u_k = u_k^*(x_k), \quad x_{k+1} = f(x_k, u_k^*(x_k))
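A minimal numerical sketch of these three steps, assuming a discretized scalar toy problem (the grids, dynamics, and costs are illustrative, and the value function is interpolated crudely by the nearest grid point):

```python
import numpy as np

# Illustrative discretization (all numerical choices are assumptions).
N = 5
x_grid = np.linspace(-3.0, 3.0, 61)     # discretized state space
u_grid = np.linspace(-1.0, 1.0, 21)     # discretized control space

f = lambda x, u: np.clip(x + u, x_grid[0], x_grid[-1])   # dynamics (kept on the grid)
g = lambda x, u: x**2 + u**2                             # stage cost
h = lambda x: 5.0 * x**2                                 # terminal cost
nearest = lambda x: np.abs(x_grid - x).argmin()          # crude interpolation

# Step 1: terminal condition J_N^*(x) = h(x)
J = np.zeros((N + 1, len(x_grid)))
J[N] = h(x_grid)
policy = np.zeros((N, len(x_grid)), dtype=int)           # index into u_grid

# Step 2: backward recursion J_k^*(x) = min_u [ J_{k+1}^*(f(x, u)) + g(x, u) ]
for k in range(N - 1, -1, -1):
    for i, x in enumerate(x_grid):
        costs = [J[k + 1, nearest(f(x, u))] + g(x, u) for u in u_grid]
        policy[k, i] = int(np.argmin(costs))
        J[k, i] = costs[policy[k, i]]

# Step 3: forward simulation from x_0 using the tabulated optimal policy
x = 2.0
for k in range(N):
    u = u_grid[policy[k, nearest(x)]]
    print(f"k={k}  x={x:+.3f}  u={u:+.3f}")
    x = f(x, u)
print(f"k={N}  x={x:+.3f}  (terminal)")
```

The nested loops over the state and control grids are where the curse of dimensionality discussed below appears: the work per stage grows with the product of the grid sizes.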

Examples of Cost Functions

1. Quadratic Cost (LQR)

System: x_{k+1} = A x_k + B u_k

Cost:

J = x_N^T S x_N + \sum_{k=0}^{N-1} (x_k^T Q x_k + u_k^T R u_k)

Bellman equation:

J_k^*(x_k) = \min_{u_k} \left[ J_{k+1}^*(A x_k + B u_k) + x_k^T Q x_k + u_k^T R u_k \right]
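Substituting the quadratic ansatz J_k^*(x) = x^T P_k x into this Bellman equation gives the standard finite-horizon Riccati recursion and a linear feedback law u_k^* = -K_k x_k. The sketch below implements that recursion; the double-integrator matrices and weights are hypothetical values chosen for illustration.

```python
import numpy as np

def finite_horizon_lqr(A, B, Q, R, S, N):
    """Backward Riccati recursion under the ansatz J_k^*(x) = x^T P_k x."""
    P = [None] * (N + 1)
    K = [None] * N
    P[N] = S                                           # terminal condition P_N = S
    for k in range(N - 1, -1, -1):
        BtP = B.T @ P[k + 1]
        K[k] = np.linalg.solve(R + BtP @ B, BtP @ A)   # feedback gain K_k
        P[k] = Q + A.T @ P[k + 1] @ (A - B @ K[k])     # cost-to-go matrix P_k
    return P, K

# Hypothetical discretized double integrator (assumed values).
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q, R, S = np.eye(2), np.array([[0.1]]), 10.0 * np.eye(2)

P, K = finite_horizon_lqr(A, B, Q, R, S, N=20)
x = np.array([[1.0], [0.0]])
for k in range(20):
    u = -K[k] @ x                   # u_k^* = -K_k x_k
    x = A @ x + B @ u
print("final state:", x.ravel())
```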

2. Minimum Time

Cost: J = N (number of steps)

Stage cost: g(x_k, u_k) = 1

Terminal cost: h(x_N) = 0

3. Minimum Energy

Cost:

J = \sum_{k=0}^{N-1} u_k^T R u_k

Stage cost: g(x_k, u_k) = u_k^T R u_k

Terminal cost: h(x_N) = 0

Computational Considerations

Advantages

  1. Global optimality: Guaranteed to find global optimum for the discretized problem
  2. Systematic approach: Provides complete optimal policy
  3. Handles nonlinearity: Works for nonlinear systems
  4. Constraint handling: Can incorporate state and input constraints

Limitations

  1. Curse of dimensionality: Computational complexity grows exponentially with state dimension
  2. Discretization: Continuous problems must be discretized
  3. Storage requirements: Must store value function over entire state space
  4. Offline computation: Value function must be computed before implementation

Extensions

1. Stochastic Dynamic Programming

For systems affected by a random disturbance w_k:

x_{k+1} = f(x_k, u_k, w_k)

The Bellman equation becomes:

J_k^*(x_k) = \min_{u_k} \left[ \mathbb{E}_{w_k}\left[ J_{k+1}^*(f(x_k, u_k, w_k)) \right] + g(x_k, u_k) \right]
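A brief sketch of the only change this makes to the backward recursion, assuming a simple discrete disturbance distribution (the values and probabilities are illustrative):

```python
import numpy as np

# Assumed discrete disturbance model: w_k in w_vals with probabilities w_prob.
w_vals = np.array([-0.2, 0.0, 0.2])
w_prob = np.array([0.25, 0.5, 0.25])

N = 5
x_grid = np.linspace(-3.0, 3.0, 61)
u_grid = np.linspace(-1.0, 1.0, 21)
f = lambda x, u, w: np.clip(x + u + w, x_grid[0], x_grid[-1])
g = lambda x, u: x**2 + u**2
h = lambda x: 5.0 * x**2
nearest = lambda x: np.abs(x_grid - x).argmin()

J = np.zeros((N + 1, len(x_grid)))
J[N] = h(x_grid)
for k in range(N - 1, -1, -1):
    for i, x in enumerate(x_grid):
        # Expected cost-to-go E_w[J_{k+1}^*(f(x, u, w))] replaces the deterministic term.
        costs = [
            sum(p * J[k + 1, nearest(f(x, u, w))] for w, p in zip(w_vals, w_prob)) + g(x, u)
            for u in u_grid
        ]
        J[k, i] = min(costs)
```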

2. Infinite Horizon

For N \to \infty, the value function becomes stationary:

J^*(x) = \min_u \left[ J^*(f(x, u)) + g(x, u) \right]
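Iterating this stationary equation as a fixed-point update is value iteration. A minimal sketch on an assumed toy grid follows; a discount factor gamma < 1 is added so the iteration converges, which is a modification of the undiscounted equation above:

```python
import numpy as np

# Assumed toy problem and grids; gamma < 1 added for convergence.
gamma = 0.95
x_grid = np.linspace(-3.0, 3.0, 61)
u_grid = np.linspace(-1.0, 1.0, 21)
f = lambda x, u: np.clip(x + u, x_grid[0], x_grid[-1])
g = lambda x, u: x**2 + u**2
nearest = lambda x: np.abs(x_grid - x).argmin()

J = np.zeros(len(x_grid))
for it in range(1000):
    # One Bellman update: J(x) <- min_u [ gamma * J(f(x, u)) + g(x, u) ]
    J_new = np.array([
        min(gamma * J[nearest(f(x, u))] + g(x, u) for u in u_grid)
        for x in x_grid
    ])
    if np.max(np.abs(J_new - J)) < 1e-6:    # value function has become stationary
        J = J_new
        break
    J = J_new
```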

3. Approximate Dynamic Programming

Use function approximation to handle high-dimensional problems:

J_k^*(x_k) \approx \hat{J}_k(x_k, \theta_k)

where \theta_k are parameters to be learned.
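One common realization is fitted value iteration: perform Bellman backups at sampled states and fit the parameters by least squares. The sketch below uses a stationary, discounted variant with quadratic features; the features, discount factor, and sampling range are all illustrative assumptions, not a method prescribed here.

```python
import numpy as np

rng = np.random.default_rng(0)
u_grid = np.linspace(-1.0, 1.0, 21)
f = lambda x, u: x + u
g = lambda x, u: x**2 + u**2

phi = lambda x: np.array([1.0, x, x**2])    # features: J_hat(x, theta) = phi(x) @ theta
theta = np.zeros(3)

for it in range(50):
    xs = rng.uniform(-3.0, 3.0, size=200)   # sampled states
    targets = np.array([                    # Bellman backups evaluated with J_hat
        min(g(x, u) + 0.95 * phi(f(x, u)) @ theta for u in u_grid)
        for x in xs
    ])
    Phi = np.stack([phi(x) for x in xs])
    theta, *_ = np.linalg.lstsq(Phi, targets, rcond=None)   # least-squares fit of theta
```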

Connection to Other Methods

1. Model Predictive Control (MPC)

MPC repeatedly solves a finite-horizon optimal control problem of this form over a receding horizon, applying only the first control of each solution.

2. Reinforcement Learning

RL algorithms such as Q-learning approximate the dynamic programming recursion when the model f is unknown, estimating value functions from sampled transitions.

3. Pontryagin's Maximum Principle

For continuous-time problems, PMP provides necessary conditions for optimality that are closely related to the Hamilton-Jacobi-Bellman equation, the continuous-time counterpart of the Bellman recursion.

References

  1. DR_CAN. Optimal Control.
  2. Wang, T. (2023). 控制之美 (Vol. 2). Tsinghua University Press.
  3. Bellman, R. (1957). Dynamic Programming. Princeton University Press.
  4. Bertsekas, D. P. (2017). Dynamic Programming and Optimal Control (4th ed.). Athena Scientific.
  5. Rakovic, S. V., & Levine, W. S. (2018). Handbook of Model Predictive Control.