
Linear Quadratic Regulator for Discrete-Time Systems

The Linear Quadratic Regulator (LQR) is one of the most important results in optimal control theory. This document presents the complete derivation and implementation of LQR for discrete-time linear systems using dynamic programming.

[Figure: LQR control system]

Problem Formulation

System Dynamics

Consider a discrete-time linear time-varying system: $x_{k+1} = A_k x_k + B_k u_k$

Where:

  • $x_k \in \mathbb{R}^n$: state vector at time $k$
  • $u_k \in \mathbb{R}^p$: control input at time $k$
  • $A_k \in \mathbb{R}^{n \times n}$: state transition matrix
  • $B_k \in \mathbb{R}^{n \times p}$: input matrix
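As a concrete running example (not part of the original text), the following sketch instantiates these matrices for a small time-invariant system, a double integrator discretized with step size dt; all numerical values are illustrative assumptions.

import numpy as np

# Illustrative example (assumed values): double integrator with state
# x = [position, velocity], discretized with sampling time dt.
dt = 0.1
A = np.array([[1.0, dt],
              [0.0, 1.0]])      # state transition matrix (n = 2)
B = np.array([[0.5 * dt**2],
              [dt]])            # input matrix (p = 1)

n = A.shape[0]  # state dimension
p = B.shape[1]  # input dimension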

Quadratic Performance Function

The objective is to minimize the quadratic cost function: $J = h(x_N) + \sum_{k=0}^{N-1} g(x_k, u_k)$

Where: $h(x_N) = \frac{1}{2} x_N^T S x_N$ and $g(x_k, u_k) = \frac{1}{2} \left[ x_k^T Q_k x_k + u_k^T R_k u_k \right]$
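The cost can be evaluated directly for any candidate trajectory, which is useful as a sanity check when testing a controller. The sketch below is illustrative and assumes constant weight matrices and state/input sequences stored as lists of NumPy vectors.

import numpy as np

def quadratic_cost(xs, us, Q, R, S):
    """Evaluate J = 1/2 x_N^T S x_N + sum_k 1/2 (x_k^T Q x_k + u_k^T R u_k).

    xs: list of N+1 state vectors; us: list of N input vectors.
    """
    J = 0.5 * xs[-1] @ S @ xs[-1]            # terminal cost h(x_N)
    for x, u in zip(xs[:-1], us):
        J += 0.5 * (x @ Q @ x + u @ R @ u)   # stage cost g(x_k, u_k)
    return J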

Coefficient Convention

The factor $\frac{1}{2}$ is included to simplify derivative calculations in the optimization process.

Weight Matrices

The weight matrices must satisfy certain definiteness conditions:

Terminal weight matrix: $S = \begin{bmatrix} s_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & s_n \end{bmatrix}, \quad s_1, s_2, \ldots, s_n \geq 0$

State weight matrix: $Q_k = \begin{bmatrix} q_{1k} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & q_{nk} \end{bmatrix}, \quad q_{1k}, q_{2k}, \ldots, q_{nk} \geq 0$

Input weight matrix: $R_k = \begin{bmatrix} r_{1k} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & r_{pk} \end{bmatrix}, \quad r_{1k}, r_{2k}, \ldots, r_{pk} > 0$

Matrix Requirements
  • $S, Q_k$: positive semi-definite (zero eigenvalues allowed)
  • $R_k$: positive definite (all eigenvalues strictly positive)

These conditions ensure the optimization problem is well-posed and has a unique solution.
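For symmetric weight matrices, these conditions can be checked numerically from the eigenvalues. The following is a minimal sketch (helper names and the example weights are assumptions, not from the original text).

import numpy as np

def is_psd(M, tol=1e-9):
    """Positive semi-definite: all eigenvalues >= -tol (M assumed symmetric)."""
    return bool(np.all(np.linalg.eigvalsh(M) >= -tol))

def is_pd(M, tol=1e-9):
    """Positive definite: all eigenvalues > tol (M assumed symmetric)."""
    return bool(np.all(np.linalg.eigvalsh(M) > tol))

# Example usage with diagonal weights:
Q = np.diag([1.0, 0.0])   # PSD: a zero diagonal entry is allowed
R = np.diag([0.1])        # PD: strictly positive
assert is_psd(Q) and is_pd(R)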

Dynamic Programming Solution

Stage N: Terminal Stage ($k = N$)

At the terminal stage, the cost-to-go is: $J_{N \to N}(x_N) = \frac{1}{2} x_N^T S x_N$

The optimal cost-to-go has the quadratic form: $J_{N \to N}^*(x_N) = \frac{1}{2} x_N^T P_0 x_N$

Where we define $P_0 \triangleq S$.

Stage N-1: One Step to Go ($k = N-1 \to N$)

The cost-to-go from stage $N-1$ to $N$ is:

$$\begin{aligned} J_{N-1 \to N}(x_{N-1}, u_{N-1}) &= \frac{1}{2} x_N^T S x_N \\ &\quad + \frac{1}{2}\left(x_{N-1}^T Q_{N-1} x_{N-1} + u_{N-1}^T R_{N-1} u_{N-1}\right) \end{aligned}$$

Substituting the system dynamics $x_N = A_{N-1} x_{N-1} + B_{N-1} u_{N-1}$:

$$\begin{aligned} J_{N-1 \to N}(x_{N-1}, u_{N-1}) &= \frac{1}{2}\left[A_{N-1} x_{N-1} + B_{N-1} u_{N-1}\right]^T P_0 \left[A_{N-1} x_{N-1} + B_{N-1} u_{N-1}\right] \\ &\quad + \frac{1}{2}\left(x_{N-1}^T Q_{N-1} x_{N-1} + u_{N-1}^T R_{N-1} u_{N-1}\right) \end{aligned}$$

Optimization: Finding the Optimal Control

To find the optimal control, we take the derivative with respect to $u_{N-1}$ and set it to zero: $\frac{\partial J_{N-1 \to N}(x_{N-1}, u_{N-1})}{\partial u_{N-1}} = 0$

Derivative calculation:

For the first term, let $y(u_{N-1}) = A_{N-1} x_{N-1} + B_{N-1} u_{N-1}$: $\frac{\partial}{\partial u_{N-1}} \left[\frac{1}{2} y^T P_0 y\right] = \frac{\partial y}{\partial u_{N-1}} \frac{\partial}{\partial y}\left[\frac{1}{2} y^T P_0 y\right] = B_{N-1}^T P_0 y$

For the second term: $\frac{\partial}{\partial u_{N-1}} \left[\frac{1}{2} u_{N-1}^T R_{N-1} u_{N-1}\right] = R_{N-1} u_{N-1}$

Setting the derivative to zero: $B_{N-1}^T P_0 \left[A_{N-1} x_{N-1} + B_{N-1} u_{N-1}\right] + R_{N-1} u_{N-1} = 0$

Solving for the optimal control: $u_{N-1}^* = -(B_{N-1}^T P_0 B_{N-1} + R_{N-1})^{-1} B_{N-1}^T P_0 A_{N-1} x_{N-1}$

This gives us the feedback control law: $u_{N-1}^* = -F_{N-1} x_{N-1}$

Where the feedback gain is: $F_{N-1} = (B_{N-1}^T P_0 B_{N-1} + R_{N-1})^{-1} B_{N-1}^T P_0 A_{N-1}$
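Numerically, this one-step gain is a single linear solve. The sketch below reuses the illustrative double-integrator matrices introduced earlier; the weight values are assumptions chosen only for demonstration.

import numpy as np

# Illustrative values (assumed): double integrator, identity terminal weight.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
S = np.eye(2)          # terminal weight, so P_0 = S
R = 0.1 * np.eye(1)    # input weight at stage N-1

P0 = S
F_Nm1 = np.linalg.solve(B.T @ P0 @ B + R, B.T @ P0 @ A)   # F_{N-1}
x_Nm1 = np.array([1.0, 0.0])
u_Nm1 = -F_Nm1 @ x_Nm1                                    # u*_{N-1} = -F_{N-1} x_{N-1}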

Verification of Minimum

The second derivative confirms this is a minimum: $\frac{\partial^2 J_{N-1 \to N}}{\partial u_{N-1}^2} = B_{N-1}^T P_0 B_{N-1} + R_{N-1} > 0$

Since $P_0 = S \geq 0$ and $R_{N-1} > 0$, the Hessian is positive definite.

Cost-to-Go Matrix Update

Substituting the optimal control back into the cost function: $J_{N-1 \to N}^*(x_{N-1}) = \frac{1}{2} x_{N-1}^T P_1 x_{N-1}$

Where: $P_1 = (A_{N-1} - B_{N-1} F_{N-1})^T P_0 (A_{N-1} - B_{N-1} F_{N-1}) + F_{N-1}^T R_{N-1} F_{N-1} + Q_{N-1}$

This is the discrete-time Riccati equation in recursive form (the Riccati difference equation); its steady-state version is the algebraic Riccati equation discussed below.

General Recursive Solution

For Stage $k = N-j$ to $N$

Optimal control: $u_{N-j}^* = -F_{N-j} x_{N-j}$

Feedback gain: $F_{N-j} = (B_{N-j}^T P_{j-1} B_{N-j} + R_{N-j})^{-1} B_{N-j}^T P_{j-1} A_{N-j}$

Cost-to-go: $J_{N-j \to N}^*(x_{N-j}) = \frac{1}{2} x_{N-j}^T P_j x_{N-j}$

Riccati recursion: $P_j = (A_{N-j} - B_{N-j} F_{N-j})^T P_{j-1} (A_{N-j} - B_{N-j} F_{N-j}) + F_{N-j}^T R_{N-j} F_{N-j} + Q_{N-j}$

Alternative Riccati Form

The Riccati equation can also be written as: $P_j = A_{N-j}^T P_{j-1} A_{N-j} + Q_{N-j} - A_{N-j}^T P_{j-1} B_{N-j} (B_{N-j}^T P_{j-1} B_{N-j} + R_{N-j})^{-1} B_{N-j}^T P_{j-1} A_{N-j}$
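The two forms are algebraically equivalent once the optimal gain is substituted. As a quick numerical check (a sketch; the matrices are the same illustrative values used above), one Riccati step computed both ways gives the same result:

import numpy as np

def riccati_step_closed_loop(P, A, B, Q, R):
    """One Riccati step using the closed-loop form with the optimal gain."""
    F = np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)
    Acl = A - B @ F
    return Acl.T @ P @ Acl + F.T @ R @ F + Q

def riccati_step_compact(P, A, B, Q, R):
    """One Riccati step using the compact (alternative) form."""
    correction = A.T @ P @ B @ np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)
    return A.T @ P @ A + Q - correction

# Illustrative matrices (assumed values):
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.eye(2); R = 0.1 * np.eye(1); P = np.eye(2)

assert np.allclose(riccati_step_closed_loop(P, A, B, Q, R),
                   riccati_step_compact(P, A, B, Q, R))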

Algorithm Summary

Backward Pass (Offline Computation)

Initialize: $P_0 = S$

For $j = 1, 2, \ldots, N$:

  1. Compute feedback gain: $F_{N-j} = (B_{N-j}^T P_{j-1} B_{N-j} + R_{N-j})^{-1} B_{N-j}^T P_{j-1} A_{N-j}$
  2. Update Riccati matrix: $P_j = (A_{N-j} - B_{N-j} F_{N-j})^T P_{j-1} (A_{N-j} - B_{N-j} F_{N-j}) + F_{N-j}^T R_{N-j} F_{N-j} + Q_{N-j}$

Forward Pass (Online Implementation)

Given: Initial state $x_0$

For $k = 0, 1, \ldots, N-1$:

  1. Apply control: $u_k = -F_k x_k$
  2. Update state: $x_{k+1} = A_k x_k + B_k u_k$
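Putting the backward and forward passes together gives a complete finite-horizon LQR routine. The sketch below is for the time-invariant case; the matrices, horizon, and initial state are illustrative assumptions, and the gains are stored so that F_list[k] is the gain applied at time k.

import numpy as np

def finite_horizon_lqr(A, B, Q, R, S, N):
    """Backward pass: return the gains F_0, ..., F_{N-1}."""
    P = S                                        # P_0 = S
    F_list = [None] * N
    for k in range(N - 1, -1, -1):               # k = N-1, ..., 0
        F = np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)
        Acl = A - B @ F
        P = Acl.T @ P @ Acl + F.T @ R @ F + Q    # Riccati recursion
        F_list[k] = F
    return F_list

def simulate(A, B, F_list, x0):
    """Forward pass: apply u_k = -F_k x_k from the initial state x0."""
    x = x0
    xs = [x0]
    for F in F_list:
        u = -F @ x
        x = A @ x + B @ u
        xs.append(x)
    return xs

# Illustrative usage (assumed values):
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.eye(2); R = 0.1 * np.eye(1); S = np.eye(2)
gains = finite_horizon_lqr(A, B, Q, R, S, N=50)
trajectory = simulate(A, B, gains, x0=np.array([1.0, 0.0]))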

Time-Invariant Case

For Linear Time-Invariant (LTI) systems where $A_k = A$ and $B_k = B$ are constant:

Steady-State Solution

As $N \to \infty$, the feedback gain converges to a constant: $\lim_{N \to \infty} F_{N-k} = F_{\infty}$

The steady-state gain satisfies: $F_{\infty} = (B^T P_{\infty} B + R)^{-1} B^T P_{\infty} A$

Where $P_{\infty}$ is the solution to the discrete-time algebraic Riccati equation (DARE): $P_{\infty} = A^T P_{\infty} A + Q - A^T P_{\infty} B (B^T P_{\infty} B + R)^{-1} B^T P_{\infty} A$
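In practice, $P_{\infty}$ is usually obtained from a dedicated DARE solver rather than by iterating the recursion to convergence. The sketch below assumes SciPy is available and uses the same illustrative matrices as before.

import numpy as np
from scipy.linalg import solve_discrete_are

# Illustrative matrices (assumed values):
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.eye(2)
R = 0.1 * np.eye(1)

P_inf = solve_discrete_are(A, B, Q, R)                          # solves the DARE
F_inf = np.linalg.solve(B.T @ P_inf @ B + R, B.T @ P_inf @ A)   # steady-state gain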

Advantages of Steady-State Solution

  1. Computational efficiency: No need to solve Riccati recursion online
  2. Memory savings: Store only one gain matrix instead of sequence
  3. Stability guarantees: the closed-loop system is asymptotically stable if $(A, B)$ is controllable and $(A, Q^{1/2})$ is observable

Implementation

MATLAB Implementation

function [F] = lqr_dp(A, B, Q, R, S)
% LQR solution using dynamic programming
% Inputs: A, B (system matrices), Q, R, S (weight matrices)
% Output: F (feedback gain)

P0 = S;            % Initialize P_0 = S
max_iter = 200;    % Maximum iterations
tolerance = 1e-3;  % Convergence tolerance

P_k_minus_1 = P0;
F_prev = Inf;

for k = 1:max_iter
    % Compute feedback gain
    F_current = (R + B' * P_k_minus_1 * B) \ (B' * P_k_minus_1 * A);

    % Update Riccati matrix
    P_k = (A - B * F_current)' * P_k_minus_1 * (A - B * F_current) + ...
          F_current' * R * F_current + Q;

    % Check convergence
    if max(abs(F_current - F_prev), [], 'all') < tolerance
        fprintf('Converged in %d iterations\n', k);
        F = F_current;
        return;
    end

    P_k_minus_1 = P_k;
    F_prev = F_current;
end

error('Maximum iterations exceeded');
end

Python Implementation

import numpy as np

def lqr_dp(A, B, Q, R, S):
    """
    LQR solution using dynamic programming
    """
    P0 = S            # Initialize P_0 = S
    max_iter = 200    # Maximum iterations
    tolerance = 1e-3  # Convergence tolerance

    P_k_minus_1 = P0
    F_prev = np.inf

    for k in range(max_iter):
        # Compute feedback gain
        F_current = np.linalg.solve(
            R + B.T @ P_k_minus_1 @ B,
            B.T @ P_k_minus_1 @ A
        )

        # Update Riccati matrix
        A_cl = A - B @ F_current
        P_k = A_cl.T @ P_k_minus_1 @ A_cl + F_current.T @ R @ F_current + Q

        # Check convergence
        if np.max(np.abs(F_current - F_prev)) < tolerance:
            print(f'Converged in {k+1} iterations')
            return F_current

        P_k_minus_1 = P_k
        F_prev = F_current

    raise ValueError('Maximum iterations exceeded')
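A short usage example for the function above (the system and weight matrices are the same illustrative values used earlier, not from the original text):

# Example usage (illustrative values; requires the lqr_dp function above).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.eye(2)
R = 0.1 * np.eye(1)
S = np.eye(2)

F = lqr_dp(A, B, Q, R, S)
print('Steady-state feedback gain F =', F)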

Properties and Characteristics

1. Optimality

The LQR controller minimizes the quadratic cost function subject to the linear system dynamics.

2. Stability

For the infinite-horizon case, if $(A, B)$ is controllable and $(A, Q^{1/2})$ is observable, the closed-loop system $x_{k+1} = (A - BF_{\infty})x_k$ is asymptotically stable.
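In discrete time, asymptotic stability means every eigenvalue of $A - BF_{\infty}$ lies strictly inside the unit circle, which is easy to verify numerically. A minimal sketch, again with the illustrative matrices assumed above:

import numpy as np
from scipy.linalg import solve_discrete_are

# Illustrative matrices (assumed values):
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.eye(2)
R = 0.1 * np.eye(1)

P_inf = solve_discrete_are(A, B, Q, R)
F_inf = np.linalg.solve(B.T @ P_inf @ B + R, B.T @ P_inf @ A)

# Discrete-time stability check: spectral radius of the closed loop < 1.
eigenvalues = np.linalg.eigvals(A - B @ F_inf)
assert np.all(np.abs(eigenvalues) < 1.0)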

3. Robustness

LQR state-feedback controllers have excellent robustness properties; the classical guaranteed margins for the continuous-time LQR are:

  • Gain margin: $\geq 6$ dB
  • Phase margin: $\geq 60°$

4. Linear State Feedback

The optimal control law is a linear function of the current state, making it easy to implement.

Design Guidelines

Weight Matrix Selection

State weights ($Q$):

  • Larger values → tighter regulation of corresponding states
  • Zero diagonal elements → no penalty for deviations in those states

Input weights ($R$):

  • Larger values → more conservative control (smaller inputs)
  • Smaller values → more aggressive control (larger inputs)

Terminal weights ($S$):

  • Often chosen as solution to infinite-horizon DARE
  • Can be zero for finite-horizon problems

Tuning Process

  1. Start with identity matrices: $Q = I$, $R = I$
  2. Adjust relative weights based on performance requirements
  3. Use iterative design and simulation verification
  4. Consider physical constraints and actuator limitations
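The qualitative effect of the relative weighting is easy to observe numerically. The sketch below compares the steady-state gain for two choices of $R$ with $Q$ fixed; the values are illustrative assumptions, and the larger input weight produces the more conservative (smaller) gain.

import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.eye(2)

for r in (0.01, 1.0):                  # aggressive vs. conservative input weight
    R = r * np.eye(1)
    P = solve_discrete_are(A, B, Q, R)
    F = np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)
    print(f'R = {r}: feedback gain F = {F.ravel()}')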

Extensions and Variations

1. LQR with Integral Action

Add integral states to eliminate steady-state error for step references.

2. LQG (Linear Quadratic Gaussian)

Combine LQR with Kalman filter for systems with noise and partial state measurements.

3. Robust LQR

Design controllers that maintain performance under model uncertainties.

4. Constrained LQR

Handle input and state constraints using model predictive control (MPC).

References

  1. Optimal Control by DR_CAN
  2. Wang, T. (2023). The Beauty of Control, Vol. 2 (控制之美·卷2). Tsinghua University Press.
  3. Anderson, B. D. O., & Moore, J. B. (2007). Optimal Control: Linear Quadratic Methods. Dover Publications.
  4. Lewis, F. L., Vrabie, D., & Syrmos, V. L. (2012). Optimal Control (3rd ed.). John Wiley & Sons.
  5. Åström, K. J., & Murray, R. M. (2021). Feedback Systems: An Introduction for Scientists and Engineers (2nd ed.). Princeton University Press.