Linear Quadratic Regulator for Continuous-Time Systems

The continuous-time Linear Quadratic Regulator (LQR) extends optimal control theory to systems described by differential equations. This document presents the complete derivation using the Hamilton-Jacobi-Bellman (HJB) equation approach.

Problem Formulation

System Dynamics

Consider a continuous-time linear time-varying system:

\dot{x}(t) = f(x(t), u(t), t) = A(t)x(t) + B(t)u(t)

Where:

  • x(t) \in \mathbb{R}^n: state vector at time t
  • u(t) \in \mathbb{R}^p: control input at time t
  • A(t) \in \mathbb{R}^{n \times n}: system (dynamics) matrix
  • B(t) \in \mathbb{R}^{n \times p}: input matrix
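
As a concrete (time-invariant) instance of this form, here is a minimal NumPy sketch; the double-integrator matrices are illustrative, not taken from the text above:

import numpy as np

# Illustrative LTI instance: a unit-mass double integrator with
# position/velocity states and a force input.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])   # n = 2 states
B = np.array([[0.0],
              [1.0]])        # p = 1 input

x = np.array([1.0, -0.5])    # example state
u = np.array([0.2])          # example input
x_dot = A @ x + B @ u        # \dot{x} = Ax + Bu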

Quadratic Performance Function

The objective is to minimize the quadratic cost function:

J = h(x(t_f), t_f) + \int_0^{t_f} g(x(\tau), u(\tau), \tau) \, d\tau

Where:

h(x(t_f), t_f) = \frac{1}{2} x(t_f)^T S x(t_f)

g(x(t), u(t), t) = \frac{1}{2} [x(t)^T Q(t) x(t) + u(t)^T R(t) u(t)]

And t_f is the terminal (final) time.

Weight Matrices

Terminal weight matrix:

  • S \in \mathbb{R}^{n \times n}: symmetric positive semi-definite

State weight matrix:

  • Q(t) \in \mathbb{R}^{n \times n}: symmetric positive semi-definite

Input weight matrix:

  • R(t) \in \mathbb{R}^{p \times p}: symmetric positive definite

Matrix Requirements

  • S, Q(t): positive semi-definite (\geq 0)
  • R(t): positive definite (> 0)

These conditions ensure the optimization problem is well-posed and has a unique solution.
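
These definiteness conditions are easy to test numerically via eigenvalues; a small sketch (the helper names is_psd and is_pd are ours, not from the text):

import numpy as np

def is_psd(M, tol=1e-9):
    """Symmetric positive semi-definite: M = M^T and all eigenvalues >= 0."""
    return np.allclose(M, M.T) and np.all(np.linalg.eigvalsh(M) >= -tol)

def is_pd(M, tol=1e-9):
    """Symmetric positive definite: M = M^T and all eigenvalues > 0."""
    return np.allclose(M, M.T) and np.all(np.linalg.eigvalsh(M) > tol)

Q = np.eye(2)          # positive definite, hence also semi-definite
S = np.zeros((2, 2))   # semi-definite but not definite
R = np.array([[1.0]])  # must be strictly positive definite

assert is_psd(Q) and is_psd(S) and is_pd(R)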

Cost-to-Go Analysis

General Cost-to-Go Function

The cost-to-go from any time t to the final time t_f is:

J_{t \to t_f}(x(t), t) = h(x(t_f), t_f) + \int_t^{t_f} g(x(\tau), u(\tau), \tau) \, d\tau

Incremental Cost Analysis

For a small time increment \Delta t, the cost-to-go can be decomposed as:

J_{t \to t_f}(x(t)) = J_{t \to t+\Delta t} + J_{t+\Delta t \to t_f}(x(t+\Delta t))

Cost for the interval [t, t+\Delta t]:

J_{t \to t+\Delta t} = \frac{1}{2}\int_t^{t+\Delta t} [x(\tau)^T Q(\tau) x(\tau) + u(\tau)^T R(\tau) u(\tau)] \, d\tau \approx \frac{1}{2}\Delta t \, [x(t)^T Q(t) x(t) + u(t)^T R(t) u(t)]

Cost for the interval [t+\Delta t, t_f]:

J_{t+\Delta t \to t_f}(x(t+\Delta t)) = \frac{1}{2} x(t+\Delta t)^T P(t+\Delta t) x(t+\Delta t)

Optimal Cost-to-Go Structure

Key Insight

The optimal cost-to-go for the LQR problem has a quadratic form:

J_{t \to t_f}^*(x(t)) = \frac{1}{2} x(t)^T P(t) x(t)

where P(t) = P(t)^T > 0 is the cost-to-go matrix.

State Evolution

The state at time t+\Delta t can be approximated as:

x(t+\Delta t) \approx x(t) + \Delta t \, \dot{x}(t) = x(t) + \Delta t \, [A(t)x(t) + B(t)u(t)]

Cost-to-Go Matrix Evolution

The cost-to-go matrix also evolves over time:

P(t+\Delta t) \approx P(t) + \Delta t \, \dot{P}(t)

Optimization Using HJB Equation

Complete Cost Expression

Substituting the state evolution and matrix evolution:

\begin{aligned}
J_{t \to t_f}^*(x(t)) &= \frac{1}{2}\Delta t \, [x(t)^T Q(t) x(t) + u(t)^T R(t) u(t)] \\
&\quad + \frac{1}{2}[x(t) + \Delta t(A(t)x(t) + B(t)u(t))]^T \\
&\qquad \times [P(t) + \Delta t \dot{P}(t)] \\
&\qquad \times [x(t) + \Delta t(A(t)x(t) + B(t)u(t))]
\end{aligned}

Expanding the Quadratic Form

After algebraic manipulation (ignoring second-order terms in \Delta t):

\begin{aligned}
J_{t \to t_f}^*(x(t)) &= \frac{1}{2} x(t)^T P(t) x(t) \\
&\quad + \frac{1}{2}\Delta t \, \big[ x(t)^T Q(t) x(t) + u(t)^T R(t) u(t) + x(t)^T \dot{P}(t) x(t) \\
&\qquad + 2 x(t)^T P(t) A(t) x(t) + 2 x(t)^T P(t) B(t) u(t) \big]
\end{aligned}

Finding the Optimal Control

To minimize the cost, we take the derivative with respect to u(t) and set it to zero:

\frac{\partial J_{t \to t_f}^*(x(t))}{\partial u(t)} = 0

This gives:

\Delta t \, [u(t)^T R(t) + x(t)^T P(t) B(t)] = 0

Optimal control law:

u^*(t) = -R(t)^{-1} B(t)^T P(t) x(t)

Feedback gain:

K(t) = -R(t)^{-1} B(t)^T P(t)

So the control law is:

u^*(t) = K(t) x(t)
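
Numerically, the gain is best obtained with a linear solve rather than an explicit matrix inverse; a minimal sketch (the P matrix here is an illustrative stand-in for the Riccati solution at some time t):

import numpy as np

def lqr_gain(B, R, P):
    """K = -R^{-1} B^T P, computed via a linear solve (avoids forming R^{-1})."""
    return -np.linalg.solve(R, B.T @ P)

B = np.array([[0.0], [1.0]])
R = np.array([[1.0]])
P = np.array([[2.0, 1.0],
              [1.0, 1.0]])        # stand-in for P(t)

K = lqr_gain(B, R, P)
x = np.array([0.5, -0.2])
u = K @ x                         # u*(t) = K(t) x(t)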

Riccati Differential Equation

Derivation

Since the optimal cost must satisfy J_{t \to t_f}^*(x(t)) = \frac{1}{2} x(t)^T P(t) x(t) for all time, substituting the optimal control back into the cost expression and equating coefficients yields:

-\dot{P}(t) = A(t)^T P(t) + P(t) A(t) - P(t) B(t) R(t)^{-1} B(t)^T P(t) + Q(t)

This is the continuous-time Riccati differential equation.
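
To make the coefficient-matching step explicit (time arguments suppressed for brevity): substituting u^* = -R^{-1} B^T P x into the bracketed \Delta t term of the expanded cost, and using u^{*T} R u^* = x^T P B R^{-1} B^T P x, 2x^T P B u^* = -2 x^T P B R^{-1} B^T P x, and 2x^T P A x = x^T (A^T P + P A) x (by symmetry of P), the bracket must vanish for J^* to keep its quadratic form:

\begin{aligned}
0 &= x^T Q x + x^T P B R^{-1} B^T P x + x^T \dot{P} x + x^T (A^T P + P A) x - 2 x^T P B R^{-1} B^T P x \\
&= x^T \big[ \dot{P} + A^T P + P A - P B R^{-1} B^T P + Q \big] x
\end{aligned}

Since this must hold for every state x, the bracketed matrix is zero, which is exactly the Riccati differential equation above.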

Boundary Condition

The Riccati equation is solved backward in time with the terminal condition:

P(t_f) = S

Alternative Forms

The Riccati equation can also be written as:

\dot{P}(t) + A(t)^T P(t) + P(t) A(t) - P(t) B(t) R(t)^{-1} B(t)^T P(t) + Q(t) = 0

Or in terms of the closed-loop matrix A_{closed}(t) = A(t) + B(t)K(t):

\dot{P}(t) + A_{closed}(t)^T P(t) + P(t) A_{closed}(t) + K(t)^T R(t) K(t) + Q(t) = 0
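
The two forms are algebraically identical once K(t) = -R^{-1} B^T P(t) is substituted; here is a quick numerical sanity check with random matrices (a sketch, not part of the original derivation):

import numpy as np

rng = np.random.default_rng(0)
n, p = 3, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, p))
Q = np.eye(n)
R = np.eye(p)
M = rng.standard_normal((n, n))
P = M @ M.T + np.eye(n)                     # random symmetric positive definite P

K = -np.linalg.solve(R, B.T @ P)
A_closed = A + B @ K

standard = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q
closed   = A_closed.T @ P + P @ A_closed + K.T @ R @ K + Q

assert np.allclose(standard, closed)        # both Riccati forms agree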

Time-Invariant Case

Steady-State Solution

For Linear Time-Invariant (LTI) systems, where A(t) = A and B(t) = B are constant, consider the infinite-horizon case (t_f \to \infty):

The Riccati matrix converges to a constant:

\lim_{t_f \to \infty} P(t) = P_\infty

Setting \dot{P}(t) = 0, we obtain the continuous-time algebraic Riccati equation (CARE):

0 = A^T P_\infty + P_\infty A - P_\infty B R^{-1} B^T P_\infty + Q

Optimal Feedback Gain

The steady-state optimal feedback gain is:

K_\infty = -R^{-1} B^T P_\infty
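
SciPy ships a CARE solver, so the steady-state solution and gain can be checked directly; a quick sketch using the same illustrative system as the Implementation section below:

import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P_inf = solve_continuous_are(A, B, Q, R)
K_inf = -np.linalg.solve(R, B.T @ P_inf)

# The CARE residual should vanish (up to numerical error)
residual = A.T @ P_inf + P_inf @ A - P_inf @ B @ np.linalg.solve(R, B.T @ P_inf) + Q
print(np.max(np.abs(residual)))             # ~1e-15
print(np.linalg.eigvals(A + B @ K_inf))     # closed-loop poles, all in the left half-plane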

Algorithm Summary

Finite-Horizon Solution

Backward Integration:

  1. Initialize: P(t_f) = S
  2. Integrate backward: solve the Riccati differential equation from t_f to 0
  3. Compute gains: K(t) = -R(t)^{-1} B(t)^T P(t) for all t

Forward Implementation:

  1. Given: initial state x(0)
  2. For t \in [0, t_f]: apply the control u(t) = K(t) x(t)

Infinite-Horizon Solution

Offline Computation:

  1. Solve CARE: find P_\infty satisfying the algebraic Riccati equation
  2. Compute gain: K_\infty = -R^{-1} B^T P_\infty

Online Implementation:

  1. Apply the constant gain: u(t) = K_\infty x(t) for all t \geq 0

Implementation

MATLAB Implementation

function [K, P] = lqr_continuous(A, B, Q, R, S, tf)
% Continuous-time finite-horizon LQR
% Inputs:  A, B (system matrices), Q, R, S (weight matrices), tf (final time)
% Outputs: K (feedback gains), P (cost-to-go matrices), ordered in forward time

    % Integrate backward in time from tf to 0
    tspan = [tf, 0];

    % Terminal condition P(tf) = S, vectorized for the ODE solver
    P0 = reshape(S, [], 1);

    % Solve the Riccati differential equation backward
    [t, P_vec] = ode45(@(t, P) riccati_ode(t, P, A, B, Q, R), tspan, P0);

    % Reshape the solution into matrices and compute the gains
    n = size(A, 1);
    P = zeros(n, n, length(t));
    K = zeros(size(B, 2), n, length(t));

    for i = 1:length(t)
        P(:, :, i) = reshape(P_vec(i, :), n, n);
        K(:, :, i) = -R \ (B' * P(:, :, i));    % K(t) = -R^{-1} B' P(t)
    end

    % ode45 returned samples from tf down to 0; reverse to forward time order
    t = flipud(t);
    P = flip(P, 3);
    K = flip(K, 3);
end

function dPdt = riccati_ode(t, P_vec, A, B, Q, R)
% Riccati differential equation: dP/dt = -(A'P + PA - P B R^{-1} B' P + Q)
    n = size(A, 1);
    P = reshape(P_vec, n, n);

    dP = -(A' * P + P * A - P * B * (R \ (B' * P)) + Q);
    dPdt = reshape(dP, [], 1);
end

Python Implementation

import numpy as np
from scipy.integrate import solve_ivp
from scipy.linalg import solve_continuous_are

def lqr_continuous(A, B, Q, R, S=None, tf=None):
    """Continuous-time LQR.

    Infinite horizon (tf is None): returns (K, P) from the CARE.
    Finite horizon: returns (K_traj, P_matrices, t_eval) in forward time order.
    """
    n, p = A.shape[0], B.shape[1]

    if tf is None:
        # Infinite-horizon case: solve the CARE
        P = solve_continuous_are(A, B, Q, R)
        K = -np.linalg.solve(R, B.T @ P)
        return K, P
    else:
        # Finite-horizon case: integrate the Riccati ODE backward from tf
        if S is None:
            S = np.zeros_like(Q)

        def riccati_ode(t, P_vec):
            P = P_vec.reshape(n, n)
            dP = -(A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q)
            return dP.flatten()

        # Solve backward from tf to 0 with terminal condition P(tf) = S
        sol = solve_ivp(riccati_ode, [tf, 0], S.flatten(), dense_output=True)

        # Evaluate the dense solution on a forward-time grid
        t_eval = np.linspace(0, tf, 100)
        P_traj = sol.sol(t_eval).T

        # Compute the feedback gains K(t) = -R^{-1} B^T P(t)
        K_traj = np.zeros((len(t_eval), p, n))
        P_matrices = np.zeros((len(t_eval), n, n))

        for i, P_vec in enumerate(P_traj):
            P_mat = P_vec.reshape(n, n)
            P_matrices[i] = P_mat
            K_traj[i] = -np.linalg.solve(R, B.T @ P_mat)

        return K_traj, P_matrices, t_eval

# Example usage for infinite-horizon LQR
A = np.array([[0, 1], [-2, -3]])
B = np.array([[0], [1]])
Q = np.eye(2)
R = np.array([[1]])

K, P = lqr_continuous(A, B, Q, R)
print(f"Optimal gain: {K}")
print(f"Cost matrix: {P}")

Properties and Analysis

1. Stability

For the infinite-horizon case, if (A, B) is controllable and (A, Q^{1/2}) is observable (both are simple rank tests; see the sketch below), then:

  • The CARE has a unique positive definite solution P_\infty
  • The closed-loop system \dot{x} = (A + BK_\infty)x is asymptotically stable
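
A hedged sketch of the rank tests (the matrices are the same illustrative example used elsewhere in this document):

import numpy as np
from scipy.linalg import sqrtm

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
n = A.shape[0]

# Controllability matrix [B, AB, ..., A^{n-1}B]
ctrb = np.hstack([np.linalg.matrix_power(A, i) @ B for i in range(n)])

# Observability matrix of (A, Q^{1/2})
Q_half = np.real(sqrtm(Q))
obsv = np.vstack([Q_half @ np.linalg.matrix_power(A, i) for i in range(n)])

print(np.linalg.matrix_rank(ctrb) == n)   # (A, B) controllable?
print(np.linalg.matrix_rank(obsv) == n)   # (A, Q^{1/2}) observable?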

2. Optimality

The LQR controller minimizes the quadratic cost function subject to the linear system dynamics.

3. Robustness

Continuous-time LQR controllers have excellent robustness properties:

  • Gain margin: [1/2, \infty), i.e., the loop tolerates any gain increase and up to a factor-of-two (about 6 dB) gain reduction
  • Phase margin: \geq 60°
  • Return difference: \sigma_{\min}(I + L(j\omega)) \geq 1 for all \omega, where L(s) = -K_\infty (sI - A)^{-1} B is the loop transfer function (Kalman's inequality, for R = \rho I; numerically verified in the sketch below)
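
The return-difference bound can be checked on a frequency grid; a sketch with R = I (the guarantee assumes R is a scaled identity, and the minus sign appears because this document's convention is u = Kx with K = -R^{-1}B^T P):

import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)

P = solve_continuous_are(A, B, Q, R)
K = -np.linalg.solve(R, B.T @ P)
I = np.eye(B.shape[1])

# Loop transfer L(s) = -K (sI - A)^{-1} B, evaluated on the imaginary axis
sigmas = []
for w in np.logspace(-2, 3, 400):
    L = -K @ np.linalg.solve(1j * w * np.eye(2) - A, B)
    sigmas.append(np.linalg.svd(I + L, compute_uv=False).min())

print(min(sigmas))   # Kalman's inequality predicts >= 1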

4. Frequency Domain Interpretation

The LQR solution can be interpreted as shaping the open-loop transfer function to achieve desired closed-loop characteristics.

Design Guidelines

Weight Matrix Selection

State weights (Q):

  • Diagonal elements penalize deviations in corresponding states
  • Larger values → tighter regulation
  • Can be interpreted as the inverse of the allowable state variance (see the Bryson's rule sketch below)

Input weights (R):

  • Diagonal elements penalize control effort
  • Larger values → more conservative control
  • Trade-off between performance and control effort

Terminal weights (S):

  • Often chosen as the steady-state solution: S = P_\infty
  • Can be zero for regulation problems
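
A common starting point consistent with the "inverse of allowable variance" interpretation above is Bryson's rule: set each diagonal weight to the inverse square of the largest acceptable excursion. A sketch with made-up limits:

import numpy as np

# Bryson's rule: Q_ii = 1 / x_{i,max}^2, R_jj = 1 / u_{j,max}^2
x_max = np.array([0.1, 1.0])    # illustrative max state excursions
u_max = np.array([5.0])         # illustrative max control effort

Q = np.diag(1.0 / x_max**2)
R = np.diag(1.0 / u_max**2)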

Tuning Process

  1. Start simple: Use identity matrices scaled appropriately
  2. Analyze response: Check settling time, overshoot, control effort
  3. Iterative design: Adjust weights based on performance requirements
  4. Physical constraints: Consider actuator limitations and system physics

Extensions and Applications

1. LQG (Linear Quadratic Gaussian)

Combines LQR with Kalman filter for systems with process noise and measurement noise.

2. Tracking Problems

Modify the cost function to track reference trajectories:

g(x, u, t) = \frac{1}{2}[(x - x_{ref})^T Q (x - x_{ref}) + (u - u_{ref})^T R (u - u_{ref})]
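
For a constant reference, one standard realization (not spelled out in the original text) picks an equilibrium pair satisfying A x_{ref} + B u_{ref} = 0 and applies the shifted law u = u_{ref} + K(x - x_{ref}); a minimal sketch:

import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
P = solve_continuous_are(A, B, np.eye(2), np.eye(1))
K = -np.linalg.solve(np.eye(1), B.T @ P)

x_ref = np.array([1.0, 0.0])   # illustrative setpoint
# Solve A x_ref + B u_ref = 0 for the feedforward input (least squares)
u_ref = np.linalg.lstsq(B, -A @ x_ref, rcond=None)[0]

def tracking_control(x):
    """Shifted LQR law: regulates the deviation from the reference to zero."""
    return u_ref + K @ (x - x_ref)

print(tracking_control(np.array([0.0, 0.0])))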

3. Constrained LQR

Handle input and state constraints using:

  • Model Predictive Control (MPC)
  • Barrier functions
  • Penalty methods

4. Robust LQR

Design controllers that maintain performance under:

  • Model uncertainties
  • Parameter variations
  • Disturbances

Comparison: Discrete vs. Continuous

| Aspect | Discrete-Time | Continuous-Time |
| --- | --- | --- |
| System | x_{k+1} = A x_k + B u_k | \dot{x} = Ax + Bu |
| Cost | \sum_{k=0}^{N-1} (\cdots) | \int_0^{t_f} (\cdots) \, dt |
| Riccati | Difference equation | Differential equation |
| DARE/CARE | P = A^T P A + Q - A^T P B (R + B^T P B)^{-1} B^T P A | 0 = A^T P + P A - P B R^{-1} B^T P + Q |
| Solution | Backward recursion | Backward integration |
| Implementation | Direct | Requires numerical integration |
