
Higher order methods

This subpackage contains third- and higher-order methods.

Classes:

  • HigherOrderNewton

A basic arbitrary-order Newton's method with an optional trust region and proximal penalty.

HigherOrderNewton

Bases: torchzero.core.module.Module

A basic arbitrary-order Newton's method with an optional trust region and proximal penalty.

This constructs an nth-order Taylor approximation of the objective via autograd and minimizes it with scipy.optimize.minimize trust-region Newton solvers, with an optional proximal penalty.

The Hessian of the Taylor approximation is easier to evaluate than that of the original objective, and it can be evaluated in batched mode, so the method can be more efficient in very specific instances.
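
For intuition, here is a minimal, hypothetical sketch (not the module's internals, which use jacobian_wrt and flatten_jacobian over all parameters) of building and evaluating a truncated third-order Taylor model of a toy function with plain torch.autograd.grad:

import torch

def f(x):
    # toy objective whose derivatives of every order depend on both variables
    return torch.exp(x[0] * x[1])

x0 = torch.tensor([0.5, -0.3], requires_grad=True)
loss = f(x0)

# first derivative, kept on the graph so it can be differentiated again
g = torch.autograd.grad(loss, x0, create_graph=True)[0]

# second derivative (Hessian), built row by row
H = torch.stack([torch.autograd.grad(g[i], x0, create_graph=True)[0] for i in range(2)])

# third derivative tensor of shape (2, 2, 2)
T3 = torch.stack([
    torch.stack([torch.autograd.grad(H[i, j], x0, retain_graph=True)[0] for j in range(2)])
    for i in range(2)
])

def taylor(step):
    # value of the truncated Taylor series at x0 + step
    return (loss
            + g @ step
            + 0.5 * step @ H @ step
            + (1.0 / 6.0) * torch.einsum('ijk,i,j,k->', T3, step, step, step))

step = torch.tensor([0.1, -0.05])
print(taylor(step).item(), f(x0.detach() + step).item())  # model value vs. true value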

Notes
  • In most cases HigherOrderNewton should be the first module in the chain because it relies on extra autograd. Use the inner argument if you wish to apply Newton preconditioning to another module's output.
  • This module requires a closure to be passed to the optimizer step, as it needs to re-evaluate the loss and gradients when calculating higher-order derivatives. The closure must accept a backward argument (refer to the documentation; a minimal usage sketch follows these notes).
  • This uses roughly O(N^order) memory, and solving the subproblem is very expensive.
  • The "none" and "proximal" trust methods may generate subproblems that have no minimum, causing divergence.

Args:

order (int, optional):
    Order of the method, i.e. the number of Taylor series terms (orders of derivatives) used to approximate the function. Defaults to 4.
trust_method (str | None, optional):
    Method used for the trust region.
    - "bounds" - the model is minimized within bounds defined by the trust region.
    - "proximal" - the model is minimized with a penalty for moving too far from the current point.
    - "none" - disables the trust region.

    Defaults to 'bounds'.
nplus (float, optional): trust region multiplier on good steps. Defaults to 3.5.
nminus (float, optional): trust region multiplier on bad steps. Defaults to 0.25.
rho_good (float, optional):
    if the ratio of actual to predicted loss reduction exceeds this, the trust region is increased. Defaults to 0.99.
rho_bad (float, optional):
    if the ratio of actual to predicted loss reduction falls below this, the trust region is decreased. Defaults to 1e-4.
init (float | None, optional):
    initial trust region size. If None, defaults to 1 with trust_method="bounds" and 0.1 with "proximal". Defaults to None.
eta (float, optional):
    minimum ratio of actual to predicted loss reduction for a proposed step to be accepted. Defaults to 1e-6.
max_attempts (int, optional):
    maximum number of trust region adjustments per step before giving up. Defaults to 10.
boundary_tol (float, optional):
    on a good step, the trust region is only increased if the step lies within this relative distance of the trust region boundary. Defaults to 1e-2.
de_iters (int | None, optional):
    If specified, the model is first minimized via differential evolution to possibly escape local minima,
    and the result is then passed to scipy.optimize.minimize. Defaults to None.
vectorize (bool, optional): whether to enable vectorized Jacobians (usually faster). Defaults to True.
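
As a concrete illustration of these parameters (the values below are arbitrary), a cheaper third-order variant with the proximal penalty and a short differential-evolution warm start could be configured as follows; the import path follows the source location shown below:

from torchzero.modules.higher_order.higher_order_newton import HigherOrderNewton

# third-order Taylor model, proximal penalty instead of hard bounds,
# 20 differential-evolution iterations before the scipy solve
newton = HigherOrderNewton(
    order=3,
    trust_method="proximal",
    de_iters=20,
)
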
Source code in torchzero/modules/higher_order/higher_order_newton.py
class HigherOrderNewton(Module):
    """A basic arbitrary order newton's method with optional trust region and proximal penalty.

    This constructs an nth-order Taylor approximation of the objective via autograd and minimizes it with
    ``scipy.optimize.minimize`` trust-region Newton solvers, with an optional proximal penalty.

    The Hessian of the Taylor approximation is easier to evaluate than that of the original objective,
    and it can be evaluated in batched mode, so the method can be more efficient in very specific instances.

    Notes:
        - In most cases HigherOrderNewton should be the first module in the chain because it relies on extra autograd. Use the ``inner`` argument if you wish to apply Newton preconditioning to another module's output.
        - This module requires a closure to be passed to the optimizer step, as it needs to re-evaluate the loss and gradients when calculating higher-order derivatives. The closure must accept a ``backward`` argument (refer to the documentation).
        - This uses roughly O(N^order) memory, and solving the subproblem is very expensive.
        - The "none" and "proximal" trust methods may generate subproblems that have no minimum, causing divergence.

    Args:

        order (int, optional):
            Order of the method, i.e. the number of Taylor series terms (orders of derivatives) used to approximate the function. Defaults to 4.
        trust_method (str | None, optional):
            Method used for the trust region.
            - "bounds" - the model is minimized within bounds defined by the trust region.
            - "proximal" - the model is minimized with a penalty for moving too far from the current point.
            - "none" - disables the trust region.

            Defaults to 'bounds'.
        nplus (float, optional): trust region multiplier on good steps. Defaults to 3.5.
        nminus (float, optional): trust region multiplier on bad steps. Defaults to 0.25.
        rho_good (float, optional):
            if the ratio of actual to predicted loss reduction exceeds this, the trust region is increased. Defaults to 0.99.
        rho_bad (float, optional):
            if the ratio of actual to predicted loss reduction falls below this, the trust region is decreased. Defaults to 1e-4.
        init (float | None, optional):
            initial trust region size. If None, defaults to 1 with :code:`trust_method="bounds"` and 0.1 with ``"proximal"``. Defaults to None.
        eta (float, optional):
            minimum ratio of actual to predicted loss reduction for a proposed step to be accepted. Defaults to 1e-6.
        max_attempts (int, optional):
            maximum number of trust region adjustments per step before giving up. Defaults to 10.
        boundary_tol (float, optional):
            on a good step, the trust region is only increased if the step lies within this relative distance of the trust region boundary. Defaults to 1e-2.
        de_iters (int | None, optional):
            If specified, the model is first minimized via differential evolution to possibly escape local minima,
            and the result is then passed to scipy.optimize.minimize. Defaults to None.
        vectorize (bool, optional): whether to enable vectorized Jacobians (usually faster). Defaults to True.
    """
    def __init__(
        self,
        order: int = 4,
        trust_method: Literal['bounds', 'proximal', 'none'] | None = 'bounds',
        nplus: float = 3.5,
        nminus: float = 0.25,
        rho_good: float = 0.99,
        rho_bad: float = 1e-4,
        init: float | None = None,
        eta: float = 1e-6,
        max_attempts = 10,
        boundary_tol: float = 1e-2,
        de_iters: int | None = None,
        vectorize: bool = True,
    ):
        if init is None:
            if trust_method == 'bounds': init = 1
            else: init = 0.1

        defaults = dict(order=order, trust_method=trust_method, nplus=nplus, nminus=nminus, eta=eta, init=init, vectorize=vectorize, de_iters=de_iters, max_attempts=max_attempts, boundary_tol=boundary_tol, rho_good=rho_good, rho_bad=rho_bad)
        super().__init__(defaults)

    @torch.no_grad
    def step(self, var):
        params = TensorList(var.params)
        closure = var.closure
        if closure is None: raise RuntimeError('HigherOrderNewton requires closure')

        settings = self.settings[params[0]]
        order = settings['order']
        nplus = settings['nplus']
        nminus = settings['nminus']
        eta = settings['eta']
        init = settings['init']
        trust_method = settings['trust_method']
        de_iters = settings['de_iters']
        max_attempts = settings['max_attempts']
        vectorize = settings['vectorize']
        boundary_tol = settings['boundary_tol']
        rho_good = settings['rho_good']
        rho_bad = settings['rho_bad']

        # ------------------------ calculate grad and hessian ------------------------ #
        with torch.enable_grad():
            loss = var.loss = var.loss_approx = closure(False)

            g_list = torch.autograd.grad(loss, params, create_graph=True)
            var.grad = list(g_list)

            g = torch.cat([t.ravel() for t in g_list])
            n = g.numel()
            derivatives = [g]
            T = g # current derivatives tensor

            # get all derivative up to order
            for o in range(2, order + 1):
                is_last = o == order
                T_list = jacobian_wrt([T], params, create_graph=not is_last, batched=vectorize)
                with torch.no_grad() if is_last else nullcontext():
                    # the shape is (ndim, ) * order
                    T = flatten_jacobian(T_list).view(n, n, *T.shape[1:])
                    derivatives.append(T)

        x0 = torch.cat([p.ravel() for p in params])

        success = False
        x_star = None
        while not success:
            max_attempts -= 1
            if max_attempts < 0: break

            # load trust region value
            trust_value = self.global_state.get('trust_region', init)

            # make sure its not too small or too large
            finfo = torch.finfo(x0.dtype)
            if trust_value < finfo.tiny*2 or trust_value > finfo.max / (2*nplus):
                trust_value = self.global_state['trust_region'] = settings['init']

            # determine tr and prox values
            if trust_method is None: trust_method = 'none'
            else: trust_method = trust_method.lower()

            if trust_method == 'none':
                trust_region = None
                prox = 0

            elif trust_method == 'bounds':
                trust_region = trust_value
                prox = 0

            elif trust_method == 'proximal':
                trust_region = None
                prox = 1 / trust_value

            else:
                raise ValueError(trust_method)

            # minimize the model
            x_star, expected_loss = _poly_minimize(
                trust_region=trust_region,
                prox=prox,
                de_iters=de_iters,
                c=loss.item(),
                x=x0,
                derivatives=derivatives,
            )

            # update trust region
            if trust_method == 'none':
                success = True
            else:
                pred_reduction = loss - expected_loss

                vec_to_tensors_(x_star, params)
                loss_star = closure(False)
                vec_to_tensors_(x0, params)
                reduction = loss - loss_star

                rho = reduction / (max(pred_reduction, 1e-8))
                # failed step
                if rho < rho_bad:
                    self.global_state['trust_region'] = trust_value * nminus

                # very good step
                elif rho > rho_good:
                    step = (x_star - x0)
                    magn = torch.linalg.vector_norm(step) # pylint:disable=not-callable
                    if trust_method == 'proximal' or (trust_value - magn) / trust_value <= boundary_tol:
                        # close to boundary
                        self.global_state['trust_region'] = trust_value * nplus

                # if the ratio is high enough then accept the proposed step
                success = rho > eta

        assert x_star is not None
        if success:
            difference = vec_to_tensors(x0 - x_star, params)
            var.update = list(difference)
        else:
            var.update = params.zeros_like()
        return var
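
For concreteness, the accept/expand/shrink logic in step() reduces to a ratio test between predicted and actual loss reduction. Below is a standalone numeric illustration with made-up values (the near-boundary check is reduced to a comment):

# defaults from __init__
nplus, nminus = 3.5, 0.25
rho_good, rho_bad, eta = 0.99, 1e-4, 1e-6
trust = 1.0                        # current trust region size

pred_reduction = 0.80              # loss - expected_loss from the polynomial model
actual_reduction = 0.795           # loss - closure() at the proposed point
rho = actual_reduction / max(pred_reduction, 1e-8)   # 0.99375

if rho < rho_bad:                  # model was badly wrong: shrink
    trust *= nminus
elif rho > rho_good:               # model matched reality: grow
    trust *= nplus                 # (in step() this also requires the step to be near the boundary)

accepted = rho > eta               # the step is applied only when this holds
print(trust, accepted)             # 3.5 True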