準ニュートン法（Quasi-Newton Methods）

概要

準ニュートン法は、ニュートン法の二次収束性を維持しながら、ヘッセ行列の計算と逆行列計算の負担を軽減する手法です。勾配情報のみを使用してヘッセ行列の近似を反復的に更新し、実用的な最適化問題において最も成功している手法の一つです。

基本概念

ニュートン法の復習

ニュートン法の更新式：

x_{k+1} = x_k - H_k^{-1} \nabla f(x_k)

ここで、 $H_k = \nabla^2 f(x_k)$ はヘッセ行列です。

準ニュートン条件（Secant Condition）

連続する2点での勾配情報から、ヘッセ行列の近似 $B_k$ は以下を満たすべきです：

B_{k+1} s_k = y_k

ここで：

$s_k = x_{k+1} - x_k$ （位置の変化）
$y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$ （勾配の変化）

これは準ニュートン方程式またはセカント方程式と呼ばれます。

曲率条件

準ニュートン法が収束するための必要条件：

s_k^T y_k > 0

この条件は、関数が凸であるか、直線探索が適切に実行されている場合に満たされます。

主要な準ニュートン法

DFP法（Davidon-Fletcher-Powell）

歴史上最初の準ニュートン法：

B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{y_k^T s_k}

または、逆行列の更新式：

H_{k+1} = H_k + \frac{s_k s_k^T}{s_k^T y_k} - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k}

BFGS法（Broyden-Fletcher-Goldfarb-Shanno）

最も広く使用される準ニュートン法：

B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{y_k^T s_k}

逆行列の更新式（Sherman-Morrison公式）：

H_{k+1} = \left(I - \frac{s_k y_k^T}{y_k^T s_k}\right) H_k \left(I - \frac{y_k s_k^T}{y_k^T s_k}\right) + \frac{s_k s_k^T}{y_k^T s_k}

実装例

Python実装

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize

class QuasiNewtonOptimizer:
    """準ニュートン法の統一実装クラス"""
    
    def __init__(self, method='bfgs'):
        """
        Parameters:
        method: 'bfgs', 'dfp', 'sr1' のいずれか
        """
        self.method = method.lower()
        
    def optimize(self, f, grad_f, x0, max_iter=1000, tol=1e-6, 
                line_search='wolfe', verbose=False):
        """
        準ニュートン法による最適化
        
        Parameters:
        f: 目的関数
        grad_f: 勾配関数
        x0: 初期点
        max_iter: 最大反復回数
        tol: 許容誤差
        line_search: 直線探索手法 ('armijo', 'wolfe', 'exact')
        """
        
        n = len(x0)
        x = x0.copy()
        H = np.eye(n)  # 初期逆ヘッセ行列近似（単位行列）
        
        history = {
            'x_values': [x.copy()],
            'f_values': [f(x)],
            'grad_norms': [np.linalg.norm(grad_f(x))],
            'step_sizes': [],
            'curvature_conditions': []
        }
        
        grad = grad_f(x)
        
        if verbose:
            print(f"初期点: {x}")
            print(f"初期目的関数値: {f(x):.6f}")
            print(f"初期勾配ノルム: {np.linalg.norm(grad):.6f}")
            print(f"使用手法: {self.method.upper()}")
        
        for k in range(max_iter):
            # 収束判定
            grad_norm = np.linalg.norm(grad)
            if grad_norm < tol:
                if verbose:
                    print(f"\n収束しました (反復回数: {k})")
                break
            
            # 探索方向の計算
            d = -H @ grad
            
            # 減少方向のチェック
            if np.dot(grad, d) >= 0:
                if verbose:
                    print("警告: 減少方向ではありません。ヘッセ行列近似をリセットします。")
                H = np.eye(n)
                d = -grad
            
            # 直線探索
            if line_search == 'armijo':
                alpha, x_new = self._armijo_line_search(f, grad_f, x, d)
            elif line_search == 'wolfe':
                alpha, x_new = self._wolfe_line_search(f, grad_f, x, d)
            else:  # exact
                alpha, x_new = self._exact_line_search(f, x, d)
            
            # 新しい勾配の計算
            grad_new = grad_f(x_new)
            
            # s_k と y_k の計算
            s = x_new - x
            y = grad_new - grad
            
            # 曲率条件のチェック
            sy = np.dot(s, y)
            curvature_satisfied = sy > 1e-12
            history['curvature_conditions'].append(curvature_satisfied)
            
            if curvature_satisfied:
                # ヘッセ行列近似の更新
                if self.method == 'bfgs':
                    H = self._bfgs_update(H, s, y)
                elif self.method == 'dfp':
                    H = self._dfp_update(H, s, y)
                elif self.method == 'sr1':
                    H = self._sr1_update(H, s, y)
            else:
                if verbose and k % 10 == 0:
                    print(f"警告: 反復 {k} で曲率条件が満たされていません (s^T y = {sy:.2e})")
            
            # 履歴の更新
            x = x_new
            grad = grad_new
            history['x_values'].append(x.copy())
            history['f_values'].append(f(x))
            history['grad_norms'].append(np.linalg.norm(grad))
            history['step_sizes'].append(alpha)
            
            if verbose and k % 10 == 0:
                print(f"反復 {k}: f = {f(x):.6f}, ||∇f|| = {np.linalg.norm(grad):.6f}, α = {alpha:.4f}")
        
        return x, history
    
    def _bfgs_update(self, H, s, y):
        """BFGS更新式"""
        sy = np.dot(s, y)
        rho = 1.0 / sy
        
        I = np.eye(len(s))
        V1 = I - rho * np.outer(s, y)
        V2 = I - rho * np.outer(y, s)
        
        H_new = V1 @ H @ V2 + rho * np.outer(s, s)
        return H_new
    
    def _dfp_update(self, H, s, y):
        """DFP更新式"""
        sy = np.dot(s, y)
        Hy = H @ y
        yHy = np.dot(y, Hy)
        
        H_new = H + np.outer(s, s) / sy - np.outer(Hy, Hy) / yHy
        return H_new
    
    def _sr1_update(self, H, s, y):
        """SR1 (Symmetric Rank-1) 更新式"""
        u = s - H @ y
        denominator = np.dot(u, y)
        
        # 数値的安定性のチェック
        if abs(denominator) > 1e-12:
            H_new = H + np.outer(u, u) / denominator
            return H_new
        else:
            return H  # 更新しない
    
    def _armijo_line_search(self, f, grad_f, x, d, alpha_init=1.0, c1=1e-4, rho=0.5):
        """Armijo条件による直線探索"""
        alpha = alpha_init
        f_x = f(x)
        grad_x = grad_f(x)
        
        armijo_threshold = c1 * np.dot(grad_x, d)
        
        for _ in range(50):
            x_new = x + alpha * d
            if f(x_new) <= f_x + alpha * armijo_threshold:
                return alpha, x_new
            alpha *= rho
        
        return alpha, x + alpha * d
    
    def _wolfe_line_search(self, f, grad_f, x, d, alpha_init=1.0, c1=1e-4, c2=0.9):
        """簡略化されたWolfe条件による直線探索"""
        alpha = alpha_init
        f_x = f(x)
        grad_x = grad_f(x)
        
        # Armijo条件のみを実装（簡略化）
        for _ in range(50):
            x_new = x + alpha * d
            if f(x_new) <= f_x + c1 * alpha * np.dot(grad_x, d):
                return alpha, x_new
            alpha *= 0.5
        
        return alpha, x + alpha * d
    
    def _exact_line_search(self, f, x, d, alpha_max=2.0):
        """簡単な完全直線探索（黄金分割）"""
        def phi(alpha):
            return f(x + alpha * d)
        
        # 黄金分割探索
        a, b = 0, alpha_max
        golden_ratio = (1 + np.sqrt(5)) / 2
        inv_golden = 1 / golden_ratio
        
        x1 = a + (1 - inv_golden) * (b - a)
        x2 = a + inv_golden * (b - a)
        f1, f2 = phi(x1), phi(x2)
        
        for _ in range(20):
            if abs(b - a) < 1e-6:
                break
            
            if f1 < f2:
                b = x2
                x2 = x1
                f2 = f1
                x1 = a + (1 - inv_golden) * (b - a)
                f1 = phi(x1)
            else:
                a = x1
                x1 = x2
                f1 = f2
                x2 = a + inv_golden * (b - a)
                f2 = phi(x2)
        
        alpha_opt = (a + b) / 2
        return alpha_opt, x + alpha_opt * d

def limited_memory_bfgs(f, grad_f, x0, m=10, max_iter=1000, tol=1e-6, verbose=False):
    """
    Limited-Memory BFGS (L-BFGS) 実装
    
    Parameters:
    m: 保存する履歴の数
    """
    
    n = len(x0)
    x = x0.copy()
    grad = grad_f(x)
    
    # 履歴を保存するためのリスト
    s_history = []  # s_k の履歴
    y_history = []  # y_k の履歴
    rho_history = []  # 1/(y_k^T s_k) の履歴
    
    history = {
        'x_values': [x.copy()],
        'f_values': [f(x)],
        'grad_norms': [np.linalg.norm(grad)]
    }
    
    if verbose:
        print(f"L-BFGS開始: f(x0) = {f(x):.6f}, ||∇f|| = {np.linalg.norm(grad):.6f}")
    
    for k in range(max_iter):
        # 収束判定
        grad_norm = np.linalg.norm(grad)
        if grad_norm < tol:
            if verbose:
                print(f"\n収束しました (反復回数: {k})")
            break
        
        # L-BFGS二重ループによる探索方向の計算
        q = grad.copy()
        
        # 後ろ向きループ
        alpha_values = []
        for i in range(len(s_history) - 1, -1, -1):
            alpha_i = rho_history[i] * np.dot(s_history[i], q)
            q = q - alpha_i * y_history[i]
            alpha_values.append(alpha_i)
        
        alpha_values.reverse()  # 順序を戻す
        
        # 初期ヘッセ行列近似の適用
        if len(y_history) > 0:
            # Nocedal & Wright の推奨スケーリング
            gamma_k = np.dot(s_history[-1], y_history[-1]) / np.dot(y_history[-1], y_history[-1])
            r = gamma_k * q
        else:
            r = q
        
        # 前向きループ
        for i in range(len(s_history)):
            beta = rho_history[i] * np.dot(y_history[i], r)
            r = r + s_history[i] * (alpha_values[i] - beta)
        
        d = -r  # 探索方向
        
        # 直線探索
        alpha, x_new = armijo_line_search_simple(f, grad_f, x, d)
        
        # 新しい勾配
        grad_new = grad_f(x_new)
        
        # s_k と y_k の計算
        s = x_new - x
        y = grad_new - grad
        
        # 曲率条件のチェック
        sy = np.dot(s, y)
        if sy > 1e-12:
            # 履歴の更新
            s_history.append(s)
            y_history.append(y)
            rho_history.append(1.0 / sy)
            
            # メモリ制限の適用
            if len(s_history) > m:
                s_history.pop(0)
                y_history.pop(0)
                rho_history.pop(0)
        
        # 更新
        x = x_new
        grad = grad_new
        
        # 履歴の更新
        history['x_values'].append(x.copy())
        history['f_values'].append(f(x))
        history['grad_norms'].append(np.linalg.norm(grad))
        
        if verbose and k % 10 == 0:
            print(f"反復 {k}: f = {f(x):.6f}, ||∇f|| = {np.linalg.norm(grad):.6f}, メモリ使用: {len(s_history)}/{m}")
    
    return x, history

def armijo_line_search_simple(f, grad_f, x, d, alpha_init=1.0, c1=1e-4, rho=0.5):
    """簡単なArmijo直線探索"""
    alpha = alpha_init
    f_x = f(x)
    grad_x = grad_f(x)
    
    for _ in range(50):
        x_new = x + alpha * d
        if f(x_new) <= f_x + c1 * alpha * np.dot(grad_x, d):
            return alpha, x_new
        alpha *= rho
    
    return alpha, x + alpha * d

def compare_quasi_newton_methods():
    """各準ニュートン法の比較"""
    
    # テスト関数: Rosenbrock関数
    def rosenbrock(x):
        return 100 * (x[1] - x[0]**2)**2 + (1 - x[0])**2
    
    def rosenbrock_grad(x):
        return np.array([
            -400 * x[0] * (x[1] - x[0]**2) - 2 * (1 - x[0]),
            200 * (x[1] - x[0]**2)
        ])
    
    # 初期点
    x0 = np.array([-1.2, 1.0])
    
    # 各手法の実行
    methods = ['bfgs', 'dfp']
    results = {}
    
    plt.figure(figsize=(15, 10))
    
    for i, method in enumerate(methods):
        optimizer = QuasiNewtonOptimizer(method=method)
        x_opt, history = optimizer.optimize(rosenbrock, rosenbrock_grad, x0, 
                                          max_iter=100, verbose=True)
        results[method] = (x_opt, history)
        
        print(f"\n{method.upper()}法の結果:")
        print(f"最適解: {x_opt}")
        print(f"最適値: {rosenbrock(x_opt):.2e}")
        print(f"反復回数: {len(history['f_values']) - 1}")
    
    # L-BFGS の実行
    x_lbfgs, history_lbfgs = limited_memory_bfgs(rosenbrock, rosenbrock_grad, x0, 
                                                m=5, max_iter=100, verbose=True)
    results['l-bfgs'] = (x_lbfgs, history_lbfgs)
    
    print(f"\nL-BFGS法の結果:")
    print(f"最適解: {x_lbfgs}")
    print(f"最適値: {rosenbrock(x_lbfgs):.2e}")
    print(f"反復回数: {len(history_lbfgs['f_values']) - 1}")
    
    # 可視化
    # 1. 収束経路
    plt.subplot(2, 3, 1)
    
    # 等高線の描画
    x_range = np.linspace(-1.5, 1.5, 100)
    y_range = np.linspace(-0.5, 1.5, 100)
    X, Y = np.meshgrid(x_range, y_range)
    Z = np.zeros_like(X)
    
    for i in range(len(x_range)):
        for j in range(len(y_range)):
            Z[j, i] = rosenbrock([X[j, i], Y[j, i]])
    
    plt.contour(X, Y, Z, levels=np.logspace(0, 3, 20), alpha=0.6)
    plt.colorbar()
    
    colors = ['red', 'blue', 'green']
    method_names = ['BFGS', 'DFP', 'L-BFGS']
    
    for i, (method, color, name) in enumerate(zip(['bfgs', 'dfp', 'l-bfgs'], colors, method_names)):
        _, history = results[method]
        x_history = np.array(history['x_values'])
        plt.plot(x_history[:, 0], x_history[:, 1], 'o-', color=color, 
                linewidth=2, markersize=4, label=name)
    
    plt.plot(1, 1, 'k*', markersize=15, label='最適解')
    plt.xlabel('x₁')
    plt.ylabel('x₂')
    plt.title('収束経路の比較 (Rosenbrock関数)')
    plt.legend()
    plt.grid(True)
    
    # 2. 目的関数値の収束
    plt.subplot(2, 3, 2)
    for i, (method, color, name) in enumerate(zip(['bfgs', 'dfp', 'l-bfgs'], colors, method_names)):
        _, history = results[method]
        f_values = history['f_values']
        plt.semilogy(f_values, color=color, linewidth=2, label=name)
    
    plt.xlabel('反復回数')
    plt.ylabel('目的関数値')
    plt.title('目的関数値の収束')
    plt.legend()
    plt.grid(True)
    
    # 3. 勾配ノルムの収束
    plt.subplot(2, 3, 3)
    for i, (method, color, name) in enumerate(zip(['bfgs', 'dfp', 'l-bfgs'], colors, method_names)):
        _, history = results[method]
        grad_norms = history['grad_norms']
        plt.semilogy(grad_norms, color=color, linewidth=2, label=name)
    
    plt.xlabel('反復回数')
    plt.ylabel('勾配ノルム')
    plt.title('勾配ノルムの収束')
    plt.legend()
    plt.grid(True)
    
    # 4. 収束率の比較
    plt.subplot(2, 3, 4)
    for i, (method, color, name) in enumerate(zip(['bfgs', 'dfp', 'l-bfgs'], colors, method_names)):
        _, history = results[method]
        f_values = np.array(history['f_values'])
        if len(f_values) > 5:
            convergence_rates = []
            for k in range(2, len(f_values)):
                if f_values[k-1] != f_values[-1] and f_values[k-2] != f_values[-1]:
                    rate = (f_values[k] - f_values[-1]) / (f_values[k-1] - f_values[-1])
                    convergence_rates.append(rate)
            
            if convergence_rates:
                plt.plot(convergence_rates[:20], color=color, linewidth=2, label=name)
    
    plt.xlabel('反復回数')
    plt.ylabel('収束率')
    plt.title('収束率の比較')
    plt.legend()
    plt.grid(True)
    
    # 5. ステップサイズの比較
    plt.subplot(2, 3, 5)
    for i, (method, color, name) in enumerate(zip(['bfgs', 'dfp'], colors[:2], method_names[:2])):
        _, history = results[method]
        if 'step_sizes' in history:
            step_sizes = history['step_sizes']
            plt.plot(step_sizes, color=color, linewidth=2, label=name)
    
    plt.xlabel('反復回数')
    plt.ylabel('ステップサイズ')
    plt.title('ステップサイズの変化')
    plt.legend()
    plt.grid(True)
    
    # 6. 曲率条件の満足度
    plt.subplot(2, 3, 6)
    for i, (method, color, name) in enumerate(zip(['bfgs', 'dfp'], colors[:2], method_names[:2])):
        _, history = results[method]
        if 'curvature_conditions' in history:
            curvature = history['curvature_conditions']
            satisfaction_rate = np.cumsum(curvature) / np.arange(1, len(curvature) + 1)
            plt.plot(satisfaction_rate, color=color, linewidth=2, label=name)
    
    plt.xlabel('反復回数')
    plt.ylabel('曲率条件満足率')
    plt.title('曲率条件の満足度')
    plt.legend()
    plt.grid(True)
    
    plt.tight_layout()
    plt.show()
    
    return results

# 実行例
if __name__ == "__main__":
    print("=== 準ニュートン法の比較 ===")
    results = compare_quasi_newton_methods()

MATLAB実装

function [x, history] = bfgs_method(f, grad_f, x0, options)
    % BFGS法の実装
    
    if nargin < 4, options = struct(); end
    
    % デフォルトパラメータ
    max_iter = getfield_default(options, 'max_iter', 1000);
    tol = getfield_default(options, 'tol', 1e-6);
    line_search = getfield_default(options, 'line_search', 'armijo');
    
    n = length(x0);
    x = x0;
    H = eye(n);  % 初期逆ヘッセ行列近似
    grad = grad_f(x);
    
    history.x_values = x0;
    history.f_values = f(x0);
    history.grad_norms = norm(grad);
    
    fprintf('BFGS法開始: f(x0) = %.6f, ||∇f|| = %.6f\n', f(x), norm(grad));
    
    for k = 1:max_iter
        % 収束判定
        grad_norm = norm(grad);
        if grad_norm < tol
            fprintf('\n収束しました (反復回数: %d)\n', k);
            break;
        end
        
        % 探索方向
        d = -H * grad;
        
        % 直線探索
        if strcmp(line_search, 'armijo')
            [alpha, x_new] = armijo_line_search(f, grad_f, x, d);
        else
            alpha = 1.0;  % 単位ステップ
            x_new = x + alpha * d;
        end
        
        % 新しい勾配
        grad_new = grad_f(x_new);
        
        % s_k と y_k
        s = x_new - x;
        y = grad_new - grad;
        
        % 曲率条件のチェック
        sy = s' * y;
        if sy > 1e-12
            % BFGS更新
            rho = 1 / sy;
            I = eye(n);
            V1 = I - rho * (s * y');
            V2 = I - rho * (y * s');
            H = V1 * H * V2 + rho * (s * s');
        else
            fprintf('警告: 反復 %d で曲率条件が満たされていません\n', k);
        end
        
        % 更新
        x = x_new;
        grad = grad_new;
        
        % 履歴の更新
        history.x_values = [history.x_values, x];
        history.f_values = [history.f_values, f(x)];
        history.grad_norms = [history.grad_norms, norm(grad)];
        
        if mod(k, 10) == 0
            fprintf('反復 %d: f = %.6f, ||∇f|| = %.6f\n', k, f(x), norm(grad));
        end
    end
end

function [alpha, x_new] = armijo_line_search(f, grad_f, x, d)
    alpha = 1.0;
    c1 = 1e-4;
    rho = 0.5;
    
    f_x = f(x);
    grad_x = grad_f(x);
    
    for i = 1:50
        x_new = x + alpha * d;
        if f(x_new) <= f_x + c1 * alpha * (grad_x' * d)
            return;
        end
        alpha = rho * alpha;
    end
end

function [x, history] = lbfgs_method(f, grad_f, x0, m, options)
    % L-BFGS法の実装
    
    if nargin < 4, m = 10; end
    if nargin < 5, options = struct(); end
    
    max_iter = getfield_default(options, 'max_iter', 1000);
    tol = getfield_default(options, 'tol', 1e-6);
    
    n = length(x0);
    x = x0;
    grad = grad_f(x);
    
    % 履歴の初期化
    s_history = [];
    y_history = [];
    rho_history = [];
    
    history.x_values = x0;
    history.f_values = f(x0);
    history.grad_norms = norm(grad);
    
    fprintf('L-BFGS法開始 (m=%d): f(x0) = %.6f, ||∇f|| = %.6f\n', m, f(x), norm(grad));
    
    for k = 1:max_iter
        % 収束判定
        grad_norm = norm(grad);
        if grad_norm < tol
            fprintf('\n収束しました (反復回数: %d)\n', k);
            break;
        end
        
        % L-BFGS二重ループ
        q = grad;
        
        % 後ろ向きループ
        alpha_values = [];
        for i = length(s_history):-1:1
            alpha_i = rho_history(i) * (s_history{i}' * q);
            q = q - alpha_i * y_history{i};
            alpha_values = [alpha_i; alpha_values];
        end
        
        % 初期ヘッセ行列近似
        if ~isempty(y_history)
            gamma_k = (s_history{end}' * y_history{end}) / (y_history{end}' * y_history{end});
            r = gamma_k * q;
        else
            r = q;
        end
        
        % 前向きループ
        for i = 1:length(s_history)
            beta = rho_history(i) * (y_history{i}' * r);
            r = r + s_history{i} * (alpha_values(i) - beta);
        end
        
        d = -r;
        
        % 直線探索
        [alpha, x_new] = armijo_line_search(f, grad_f, x, d);
        
        % 新しい勾配
        grad_new = grad_f(x_new);
        
        % s_k と y_k
        s = x_new - x;
        y = grad_new - grad;
        
        % 曲率条件のチェック
        sy = s' * y;
        if sy > 1e-12
            % 履歴の更新
            s_history{end+1} = s;
            y_history{end+1} = y;
            rho_history(end+1) = 1 / sy;
            
            % メモリ制限
            if length(s_history) > m
                s_history(1) = [];
                y_history(1) = [];
                rho_history(1) = [];
            end
        end
        
        % 更新
        x = x_new;
        grad = grad_new;
        
        % 履歴の更新
        history.x_values = [history.x_values, x];
        history.f_values = [history.f_values, f(x)];
        history.grad_norms = [history.grad_norms, norm(grad)];
        
        if mod(k, 10) == 0
            fprintf('反復 %d: f = %.6f, ||∇f|| = %.6f, メモリ: %d/%d\n', ...
                    k, f(x), norm(grad), length(s_history), m);
        end
    end
end

function val = getfield_default(s, field, default)
    if isfield(s, field)
        val = s.(field);
    else
        val = default;
    end
end

理論的性質

収束性

定理（準ニュートン法の超線形収束）: 以下の条件下で：

$f$ が強凸で二回連続微分可能
ヘッセ行列がリプシッツ連続
適切な直線探索の使用

BFGS法は超線形収束率を持ちます：

\lim_{k \to \infty} \frac{\|x_{k+1} - x^*\|}{\|x_k - x^*\|} = 0

Dennis-More条件

準ニュートン法が超線形収束するための必要十分条件：

\lim_{k \to \infty} \frac{\|(B_k - \nabla^2 f(x^*))s_k\|}{\|s_k\|} = 0

これは、ヘッセ行列近似の誤差が探索方向に沿って0に収束することを意味します。

正定性の保持

定理: BFGS更新において、 $B_k$ が正定値で曲率条件 $s_k^T y_k > 0$ が満たされるならば、 $B_{k+1}$ も正定値です。

これにより、BFGS法は常に降下方向を生成します。

L-BFGS（Limited-Memory BFGS）

メモリ効率

通常のBFGSは $O(n^2)$ のメモリを必要としますが、L-BFGSは $O(mn)$ のメモリで済みます（ $m \ll n$ ）。

二重ループアルゴリズム

Algorithm: L-BFGS Two-Loop Recursion
Input: 勾配 g_k, 履歴 {s_i, y_i, ρ_i}_{i=k-m}^{k-1}

q ← g_k
for i = k-1, k-2, ..., k-m do:
  α_i ← ρ_i (s_i^T q)
  q ← q - α_i y_i
r ← H_k^0 q  (初期スケーリング)
for i = k-m, k-m+1, ..., k-1 do:
  β ← ρ_i (y_i^T r)
  r ← r + s_i (α_i - β)
return -r  (探索方向)

初期スケーリング

L-BFGSでは、初期ヘッセ行列近似として以下がよく使用されます：

H_k^0 = \gamma_k I, \quad \gamma_k = \frac{s_{k-1}^T y_{k-1}}{y_{k-1}^T y_{k-1}}

実用的な考慮事項

曲率条件の維持

曲率条件 $s_k^T y_k > 0$ が満たされない場合の対処法：

スキップ: 更新を行わない
ダンピング: 修正された $y_k$ を使用
リセット: ヘッセ行列近似を単位行列にリセット

数値的安定性

def damped_bfgs_update(H, s, y, theta=0.2):
    """ダンピング付きBFGS更新"""
    sy = np.dot(s, y)
    Hy = H @ y
    sHy = np.dot(s, Hy)
    
    if sy < theta * sHy:
        # ダンピング
        phi = (1 - theta) * sHy / (sHy - sy)
        y_damped = phi * y + (1 - phi) * Hy
        sy_damped = np.dot(s, y_damped)
        
        # 修正されたy_dampedでBFGS更新
        rho = 1.0 / sy_damped
        I = np.eye(len(s))
        V1 = I - rho * np.outer(s, y_damped)
        V2 = I - rho * np.outer(y_damped, s)
        H_new = V1 @ H @ V2 + rho * np.outer(s, s)
        
        return H_new
    else:
        # 通常のBFGS更新
        rho = 1.0 / sy
        I = np.eye(len(s))
        V1 = I - rho * np.outer(s, y)
        V2 = I - rho * np.outer(y, s)
        H_new = V1 @ H @ V2 + rho * np.outer(s, s)
        
        return H_new

スケーリングの重要性

変数のスケールが大きく異なる場合、適切なスケーリングが重要です：

def scaled_quasi_newton(f, grad_f, x0, scaling_matrix=None):
    """スケーリング付き準ニュートン法"""
    
    if scaling_matrix is None:
        # 勾配ベースの自動スケーリング
        grad0 = grad_f(x0)
        scaling = np.diag(1.0 / (np.abs(grad0) + 1e-8))
    else:
        scaling = scaling_matrix
    
    # スケーリング変換
    def f_scaled(x_scaled):
        x_original = scaling @ x_scaled
        return f(x_original)
    
    def grad_f_scaled(x_scaled):
        x_original = scaling @ x_scaled
        grad_original = grad_f(x_original)
        return scaling.T @ grad_original
    
    x0_scaled = np.linalg.inv(scaling) @ x0
    
    # スケールされた問題を解く
    optimizer = QuasiNewtonOptimizer('bfgs')
    x_opt_scaled, history = optimizer.optimize(f_scaled, grad_f_scaled, x0_scaled)
    
    # 元のスケールに戻す
    x_opt = scaling @ x_opt_scaled
    
    return x_opt, history

まとめ

準ニュートン法の主な特徴：

超線形収束: ニュートン法に近い収束速度
計算効率: ヘッセ行列の計算・逆行列計算が不要
メモリ効率: L-BFGSにより大規模問題に対応
実用性: 多くの実問題で優秀な性能
堅牢性: 適切な実装により数値的に安定

手法選択の指針

中規模問題（n < 1000）: BFGS
大規模問題（n ≥ 1000）: L-BFGS
高精度が必要: 適切な直線探索との組み合わせ
メモリ制約: L-BFGSのメモリサイズ調整

成功の要因

適切な直線探索: Wolfe条件の使用
曲率条件の維持: 数値的安定性の確保
スケーリング: 変数の適切な正規化
初期化: 問題に応じた初期点選択

実装のポイント

準ニュートン法を実装する際は、曲率条件の監視、適切な直線探索の選択、そして数値的安定性のためのダンピング機構の実装が重要です。L-BFGSでは、メモリサイズ $m$ の調整も性能に大きく影響します。

参考文献

Nocedal, J., & Wright, S. J. (2006). Numerical Optimization. Springer.
Dennis, J. E., & More, J. J. (1977). Quasi-Newton methods, motivation and theory. SIAM Review.
Liu, D. C., & Nocedal, J. (1989). On the limited memory BFGS method for large scale optimization. Mathematical Programming.
Broyden, C. G. (1970). The convergence of a class of double-rank minimization algorithms. IMA Journal of Applied Mathematics.

概要​

基本概念​

ニュートン法の復習​

準ニュートン条件（Secant Condition）​

曲率条件​

主要な準ニュートン法​

DFP法（Davidon-Fletcher-Powell）​

BFGS法（Broyden-Fletcher-Goldfarb-Shanno）​

実装例​

Python実装​

MATLAB実装​

理論的性質​

収束性​

Dennis-More条件​

正定性の保持​

L-BFGS（Limited-Memory BFGS）​

メモリ効率​

二重ループアルゴリズム​

初期スケーリング​

実用的な考慮事項​

曲率条件の維持​

数値的安定性​

スケーリングの重要性​

まとめ​

手法選択の指針​

成功の要因​

参考文献​

概要