
Conjugate gradient methods

This subpackage contains conjugate gradient methods.

See also

  • Line search - conjugate gradient methods usually require a line search; see the usage sketch after the class list below.

Classes:

  • ConjugateDescent

    Conjugate Descent (CD).

  • DYHS

    Dai–Yuan / Hestenes–Stiefel hybrid conjugate gradient method.

  • DaiYuan

    Dai–Yuan nonlinear conjugate gradient method.

  • FletcherReeves

    Fletcher–Reeves nonlinear conjugate gradient method.

  • HagerZhang

    Hager-Zhang nonlinear conjugate gradient method.

  • HestenesStiefel

    Hestenes–Stiefel nonlinear conjugate gradient method.

  • LiuStorey

    Liu-Storey nonlinear conjugate gradient method.

  • PolakRibiere

    Polak-Ribière-Polyak nonlinear conjugate gradient method.

  • ProjectedGradientMethod

    Projected gradient method. Directly projects the gradient onto a subspace conjugate to past directions.

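All of these modules produce a search direction rather than a finished update, so they are normally chained with a line search. The sketch below shows the intended composition; the tz.Modular wrapper and the closure(backward=True) signature are assumptions based on typical torchzero usage rather than anything documented on this page, so check them against your installed version.

import torch
import torchzero as tz

model = torch.nn.Linear(10, 1)
X, y = torch.randn(32, 10), torch.randn(32, 1)   # fixed batch so the line search sees a consistent objective

opt = tz.Modular(
    model.parameters(),
    tz.m.FletcherReeves(),                           # any CG module from this page works here
    tz.m.StrongWolfe(c2=0.1, a_init="first-order"),  # line search determines the step size
)

def closure(backward=True):
    # line searches re-evaluate the objective, so the step takes a closure
    loss = (model(X) - y).pow(2).mean()
    if backward:
        opt.zero_grad()
        loss.backward()
    return loss

opt.step(closure)
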
ConjugateDescent

Bases: torchzero.modules.conjugate_gradient.cg.ConguateGradientBase

Conjugate Descent (CD).

Note

This requires the step size to be determined via a line search, so place a line search such as tz.m.StrongWolfe(c2=0.1, a_init="first-order") after this module.

Source code in torchzero/modules/conjugate_gradient/cg.py
class ConjugateDescent(ConguateGradientBase):
    """Conjugate Descent (CD).

    Note:
        This requires step size to be determined via a line search, so put a line search like ``tz.m.StrongWolfe(c2=0.1, a_init="first-order")`` after this.
    """
    def __init__(self, restart_interval: int | None | Literal['auto'] = 'auto', clip_beta=False, inner: Chainable | None = None):
        super().__init__({}, clip_beta=clip_beta, restart_interval=restart_interval, inner=inner)

    def get_beta(self, p, g, prev_g, prev_d):
        return conjugate_descent_beta(g, prev_d, prev_g)
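
The conjugate_descent_beta helper is not shown on this page. Below is a sketch of the textbook Conjugate Descent formula it presumably implements, assuming g, prev_d and prev_g support .dot as in the listing above:

def conjugate_descent_beta_sketch(g, prev_d, prev_g):
    # beta_CD = (g·g) / (-prev_d·prev_g); the denominator is positive when prev_d was a descent direction
    return g.dot(g) / -prev_d.dot(prev_g)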

DYHS

Bases: torchzero.modules.conjugate_gradient.cg.ConguateGradientBase

Dai–Yuan / Hestenes–Stiefel hybrid conjugate gradient method.

Note

This requires the step size to be determined via a line search, so place a line search such as tz.m.StrongWolfe(c2=0.1, a_init="first-order") after this module.

Source code in torchzero/modules/conjugate_gradient/cg.py
class DYHS(ConguateGradientBase):
    """Dai-Yuan - Hestenes–Stiefel hybrid conjugate gradient method.

    Note:
        This requires step size to be determined via a line search, so put a line search like ``tz.m.StrongWolfe(c2=0.1, a_init="first-order")`` after this.
    """
    def __init__(self, restart_interval: int | None | Literal['auto'] = 'auto', clip_beta=False, inner: Chainable | None = None):
        super().__init__({}, clip_beta=clip_beta, restart_interval=restart_interval, inner=inner)

    def get_beta(self, p, g, prev_g, prev_d):
        return dyhs_beta(g, prev_d, prev_g)
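
The dyhs_beta helper is not shown here. The Dai–Yuan/Hestenes–Stiefel hybrid is usually defined as beta = max(0, min(beta_HS, beta_DY)); the sketch below assumes that definition and that the dot products return scalars:

def dyhs_beta_sketch(g, prev_d, prev_g):
    y = g - prev_g                   # gradient difference
    dy = prev_d.dot(y)
    beta_hs = g.dot(y) / dy          # Hestenes-Stiefel
    beta_dy = g.dot(g) / dy          # Dai-Yuan
    return max(0.0, min(beta_hs, beta_dy))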

DaiYuan

Bases: torchzero.modules.conjugate_gradient.cg.ConguateGradientBase

Dai–Yuan nonlinear conjugate gradient method.

Note

This requires the step size to be determined via a line search, so place a line search such as tz.m.StrongWolfe(c2=0.1) after this module.

Source code in torchzero/modules/conjugate_gradient/cg.py
class DaiYuan(ConguateGradientBase):
    """Dai–Yuan nonlinear conjugate gradient method.

    Note:
        This requires step size to be determined via a line search, so put a line search like ``tz.m.StrongWolfe(c2=0.1)`` after this.
    """
    def __init__(self, restart_interval: int | None | Literal['auto'] = 'auto', clip_beta=False, inner: Chainable | None = None):
        super().__init__({}, clip_beta=clip_beta, restart_interval=restart_interval, inner=inner)

    def get_beta(self, p, g, prev_g, prev_d):
        return dai_yuan_beta(g, prev_d, prev_g)
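
The dai_yuan_beta helper is not shown here; a sketch of the standard Dai–Yuan formula it presumably computes:

def dai_yuan_beta_sketch(g, prev_d, prev_g):
    # beta_DY = (g·g) / (prev_d·(g - prev_g))
    return g.dot(g) / prev_d.dot(g - prev_g)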

FletcherReeves

Bases: torchzero.modules.conjugate_gradient.cg.ConguateGradientBase

Fletcher–Reeves nonlinear conjugate gradient method.

Note

This requires the step size to be determined via a line search, so place a line search such as tz.m.StrongWolfe(c2=0.1, a_init="first-order") after this module.

Source code in torchzero/modules/conjugate_gradient/cg.py
class FletcherReeves(ConguateGradientBase):
    """Fletcher–Reeves nonlinear conjugate gradient method.

    Note:
        This requires step size to be determined via a line search, so put a line search like ``tz.m.StrongWolfe(c2=0.1, a_init="first-order")`` after this.
    """
    def __init__(self, restart_interval: int | None | Literal['auto'] = 'auto', clip_beta=False, inner: Chainable | None = None):
        super().__init__({}, clip_beta=clip_beta, restart_interval=restart_interval, inner=inner)

    def initialize(self, p, g):
        self.global_state['prev_gg'] = g.dot(g)

    def get_beta(self, p, g, prev_g, prev_d):
        gg = g.dot(g)
        beta = fletcher_reeves_beta(gg, self.global_state['prev_gg'])
        self.global_state['prev_gg'] = gg
        return beta
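
The fletcher_reeves_beta helper is not shown here, but given the gg and prev_gg bookkeeping above it presumably reduces to the standard ratio of squared gradient norms:

def fletcher_reeves_beta_sketch(gg, prev_gg):
    # beta_FR = (g·g) / (g_prev·g_prev)
    return gg / prev_gg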

HagerZhang

Bases: torchzero.modules.conjugate_gradient.cg.ConguateGradientBase

Hager-Zhang nonlinear conjugate gradient method.

Note

This requires the step size to be determined via a line search, so place a line search such as tz.m.StrongWolfe(c2=0.1, a_init="first-order") after this module.

Source code in torchzero/modules/conjugate_gradient/cg.py
class HagerZhang(ConguateGradientBase):
    """Hager-Zhang nonlinear conjugate gradient method,

    Note:
        This requires step size to be determined via a line search, so put a line search like ``tz.m.StrongWolfe(c2=0.1, a_init="first-order")`` after this.
    """
    def __init__(self, restart_interval: int | None | Literal['auto'] = 'auto', clip_beta=False, inner: Chainable | None = None):
        super().__init__({}, clip_beta=clip_beta, restart_interval=restart_interval, inner=inner)

    def get_beta(self, p, g, prev_g, prev_d):
        return hager_zhang_beta(g, prev_d, prev_g)
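
The hager_zhang_beta helper is not shown here. A sketch of the standard Hager–Zhang formula it presumably computes (the published method additionally truncates beta from below, which is omitted here):

def hager_zhang_beta_sketch(g, prev_d, prev_g):
    y = g - prev_g
    dy = prev_d.dot(y)
    # beta_HZ = (y - 2*prev_d*(y·y)/(prev_d·y)) · g / (prev_d·y)
    return (y - prev_d * (2.0 * y.dot(y) / dy)).dot(g) / dy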

HestenesStiefel

Bases: torchzero.modules.conjugate_gradient.cg.ConguateGradientBase

Hestenes–Stiefel nonlinear conjugate gradient method.

Note

This requires the step size to be determined via a line search, so place a line search such as tz.m.StrongWolfe(c2=0.1, a_init="first-order") after this module.

Source code in torchzero/modules/conjugate_gradient/cg.py
class HestenesStiefel(ConguateGradientBase):
    """Hestenes–Stiefel nonlinear conjugate gradient method.

    Note:
        This requires step size to be determined via a line search, so put a line search like ``tz.m.StrongWolfe(c2=0.1, a_init="first-order")`` after this.
    """
    def __init__(self, restart_interval: int | None | Literal['auto'] = 'auto', clip_beta=False, inner: Chainable | None = None):
        super().__init__({}, clip_beta=clip_beta, restart_interval=restart_interval, inner=inner)

    def get_beta(self, p, g, prev_g, prev_d):
        return hestenes_stiefel_beta(g, prev_d, prev_g)
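
The hestenes_stiefel_beta helper is not shown here; a sketch of the standard Hestenes–Stiefel formula it presumably computes:

def hestenes_stiefel_beta_sketch(g, prev_d, prev_g):
    y = g - prev_g
    # beta_HS = (g·y) / (prev_d·y)
    return g.dot(y) / prev_d.dot(y)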

LiuStorey

Bases: torchzero.modules.conjugate_gradient.cg.ConguateGradientBase

Liu-Storey nonlinear conjugate gradient method.

Note

This requires the step size to be determined via a line search, so place a line search such as tz.m.StrongWolfe(c2=0.1, a_init="first-order") after this module.

Source code in torchzero/modules/conjugate_gradient/cg.py
class LiuStorey(ConguateGradientBase):
    """Liu-Storey nonlinear conjugate gradient method.

    Note:
        This requires step size to be determined via a line search, so put a line search like ``tz.m.StrongWolfe(c2=0.1, a_init="first-order")`` after this.
    """
    def __init__(self, restart_interval: int | None | Literal['auto'] = 'auto', clip_beta=False, inner: Chainable | None = None):
        super().__init__({}, clip_beta=clip_beta, restart_interval=restart_interval, inner=inner)

    def get_beta(self, p, g, prev_g, prev_d):
        return liu_storey_beta(g, prev_d, prev_g)
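
The liu_storey_beta helper is not shown here; a sketch of the standard Liu–Storey formula it presumably computes:

def liu_storey_beta_sketch(g, prev_d, prev_g):
    # beta_LS = (g·(g - prev_g)) / (-prev_d·prev_g)
    return g.dot(g - prev_g) / -prev_d.dot(prev_g)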

PolakRibiere

Bases: torchzero.modules.conjugate_gradient.cg.ConguateGradientBase

Polak-Ribière-Polyak nonlinear conjugate gradient method.

Note

This requires the step size to be determined via a line search, so place a line search such as tz.m.StrongWolfe(c2=0.1, a_init="first-order") after this module.

Source code in torchzero/modules/conjugate_gradient/cg.py
class PolakRibiere(ConguateGradientBase):
    """Polak-Ribière-Polyak nonlinear conjugate gradient method.

    Note:
        This requires step size to be determined via a line search, so put a line search like ``tz.m.StrongWolfe(c2=0.1, a_init="first-order")`` after this.
    """
    def __init__(self, clip_beta=True, restart_interval: int | None | Literal['auto'] = 'auto', inner: Chainable | None = None):
        super().__init__({}, clip_beta=clip_beta, restart_interval=restart_interval, inner=inner)

    def get_beta(self, p, g, prev_g, prev_d):
        return polak_ribiere_beta(g, prev_g)
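
The polak_ribiere_beta helper is not shown here; a sketch of the standard Polak–Ribière–Polyak formula it presumably computes. Note that clip_beta defaults to True for this class, which presumably applies the usual PR+ clipping max(beta, 0):

def polak_ribiere_beta_sketch(g, prev_g):
    # beta_PRP = (g·(g - prev_g)) / (prev_g·prev_g)
    return g.dot(g - prev_g) / prev_g.dot(prev_g)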

ProjectedGradientMethod

Bases: torchzero.modules.quasi_newton.quasi_newton.HessianUpdateStrategy

Projected gradient method. Directly projects the gradient onto a subspace conjugate to past directions.

Notes
  • This method uses N^2 memory.
  • This requires the step size to be determined via a line search, so place a line search such as tz.m.StrongWolfe(c2=0.1, a_init="first-order") after this module.
  • This is not the same as projected gradient descent.
Reference

Pearson, J. D. (1969). Variable metric methods of minimisation. The Computer Journal, 12(2), 171–178. doi:10.1093/comjnl/12.2.171. (algorithm 5 in section 6)

Source code in torchzero/modules/conjugate_gradient/cg.py
class ProjectedGradientMethod(HessianUpdateStrategy): # this doesn't maintain hessian
    """Projected gradient method. Directly projects the gradient onto subspace conjugate to past directions.

    Notes:
        - This method uses N^2 memory.
        - This requires step size to be determined via a line search, so put a line search like ``tz.m.StrongWolfe(c2=0.1, a_init="first-order")`` after this.
        - This is not the same as projected gradient descent.

    Reference:
        Pearson, J. D. (1969). Variable metric methods of minimisation. The Computer Journal, 12(2), 171–178. doi:10.1093/comjnl/12.2.171.  (algorithm 5 in section 6)

    """

    def __init__(
        self,
        init_scale: float | Literal["auto"] = 1,
        tol: float = 1e-32,
        ptol: float | None = 1e-32,
        ptol_restart: bool = False,
        gtol: float | None = 1e-32,
        restart_interval: int | None | Literal['auto'] = 'auto',
        beta: float | None = None,
        update_freq: int = 1,
        scale_first: bool = False,
        concat_params: bool = True,
        # inverse: bool = True,
        inner: Chainable | None = None,
    ):
        super().__init__(
            defaults=None,
            init_scale=init_scale,
            tol=tol,
            ptol=ptol,
            ptol_restart=ptol_restart,
            gtol=gtol,
            restart_interval=restart_interval,
            beta=beta,
            update_freq=update_freq,
            scale_first=scale_first,
            concat_params=concat_params,
            inverse=True,
            inner=inner,
        )



    def update_H(self, H, s, y, p, g, p_prev, g_prev, state, setting):
        return projected_gradient_(H=H, y=y)
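
Because this method stores an N x N matrix (the N^2 memory noted above), it is only practical for relatively small parameter counts. A minimal construction sketch, reusing the assumptions from the example near the top of this page (tz.Modular and the closure-based step are not confirmed by this page):

opt = tz.Modular(
    model.parameters(),
    tz.m.ProjectedGradientMethod(restart_interval="auto"),
    tz.m.StrongWolfe(c2=0.1, a_init="first-order"),  # required line search
)
opt.step(closure)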