Second order¶
This subpackage contains "true" second-order methods that use exact second-order information.
See also¶
- Quasi-Newton - quasi-Newton methods that estimate the hessian using gradient information.
Classes:
- ImprovedNewton – Improved Newton's Method (INM).
- InverseFreeNewton – Inverse-free Newton's method.
- Newton – Exact Newton's method via autograd.
- NewtonCG – Newton's method with a matrix-free conjugate gradient or minimal-residual solver.
- NewtonCGSteihaug – Newton's method with trust region and a matrix-free Steihaug-Toint conjugate gradient solver.
- NystromPCG – Newton's method with a Nyström-preconditioned conjugate gradient solver.
- NystromSketchAndSolve – Newton's method with a Nyström sketch-and-solve solver.
- SixthOrder3P – Sixth-order iterative method.
- SixthOrder3PM2 – Sixth-order Newton-type method (Wang, Xiaofeng, and Yang Li. "An efficient sixth-order Newton-type method for solving nonlinear systems." Algorithms 10.2 (2017): 45).
- SixthOrder5P – Sixth-order method (Argyros, Ioannis K., et al. "Extended convergence for two sixth order methods under the same weak conditions." Foundations 3.1 (2023): 127-139).
- SubspaceNewton – Subspace Newton. Performs a Newton step in a subspace (random or spanned by past gradients).
- TwoPointNewton – Two-point Newton method with frozen derivative and third-order convergence.
ImprovedNewton ¶
Bases: torchzero.core.transform.Transform
Improved Newton's Method (INM).
Source code in torchzero/modules/second_order/inm.py
InverseFreeNewton ¶
Bases: torchzero.core.transform.Transform
Inverse-free Newton's method.
Source code in torchzero/modules/second_order/ifn.py
Newton ¶
Bases: torchzero.core.transform.Transform
Exact Newton's method via autograd.
Newton's method produces a direction that jumps to the stationary point of the quadratic approximation of the target function.
The update rule is given by (H + yI)⁻¹g, where H is the hessian, g is the gradient, and y is the damping parameter.
g can be the output of another module if it is specified via the inner argument.
Note
In most cases Newton should be the first module in the chain because it relies on autograd. Use the inner argument if you wish to apply Newton preconditioning to another module's output.
Note
This module requires a closure to be passed to the optimizer step, as it needs to re-evaluate the loss and gradients for calculating the hessian. The closure must accept a backward argument (refer to documentation); a sketch of such a closure is shown below.
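A minimal sketch of such a closure (model, criterion, inputs and targets are placeholder names; see the documentation for the exact convention):
def closure(backward=True):
    preds = model(inputs)             # forward pass
    loss = criterion(preds, targets)
    if backward:
        opt.zero_grad()               # clear stale gradients
        loss.backward()               # populate .grad for the module to use
    return loss

loss = opt.step(closure)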
Parameters:
- damping (float, default: 0) – Tikhonov regularizer value. Defaults to 0.
- eigval_fn (Callable | None, default: None) – function to apply to the eigenvalues, for example torch.abs or lambda L: torch.clip(L, min=1e-8). If this is specified, eigendecomposition will be used to invert the hessian.
- update_freq (int, default: 1) – updates the hessian every update_freq steps.
- precompute_inverse (bool, default: None) – if True, whenever the hessian is computed, its inverse is also computed. This is more efficient when update_freq is large. If None, this is True if update_freq >= 10.
- use_lstsq (bool, default: False) – if True, least squares will be used to solve the linear system, which can prevent the update from exploding when the hessian is indefinite. If False, tries Cholesky, then LU if that fails, and then least squares. If eigval_fn is specified, eigendecomposition is always used and this argument is ignored.
- hessian_method (str, default: 'batched_autograd') – determines how the hessian is computed.
  - "batched_autograd" - uses autograd to compute ndim batched hessian-vector products. Faster than "autograd" but uses more memory.
  - "autograd" - uses autograd to compute ndim hessian-vector products in a for-loop. Slower than "batched_autograd" but uses less memory.
  - "functional_revrev" - uses torch.autograd.functional with the "reverse-over-reverse" strategy and a for-loop. This is generally equivalent to "autograd".
  - "functional_fwdrev" - uses torch.autograd.functional with the vectorized "forward-over-reverse" strategy. Faster than "functional_revrev" but uses more memory ("batched_autograd" seems to be faster).
  - "func" - uses torch.func.hessian, which uses the "forward-over-reverse" strategy. This method is the fastest and is recommended; however it is more restrictive and fails with some operators, which is why it isn't the default.
  - "gfd_forward" - computes ndim hessian-vector products via gradient finite differences using a less accurate forward formula, which requires one extra gradient evaluation per hessian-vector product.
  - "gfd_central" - computes ndim hessian-vector products via gradient finite differences using a more accurate central formula, which requires two gradient evaluations per hessian-vector product.
  - "fd" - uses function values to estimate the gradient and hessian via finite differences. This uses fewer evaluations than chaining "gfd_*" after tz.m.FDM.
  - "thoad" - uses the thoad library, which can be significantly faster than pytorch but has limited operator coverage.
  Defaults to "batched_autograd".
- h (float, default: 0.001) – finite difference step size if the hessian is computed via finite differences.
- inner (Chainable | None, default: None) – modules to apply the hessian preconditioner to. Defaults to None.
See also¶
- tz.m.NewtonCG: uses a matrix-free conjugate gradient solver and hessian-vector products; useful for large-scale problems as it doesn't form the full hessian.
- tz.m.NewtonCGSteihaug: trust region version of tz.m.NewtonCG.
- tz.m.ImprovedNewton: Newton with an additional rank-one correction to the hessian, can be faster than Newton.
- tz.m.InverseFreeNewton: an inverse-free variant of Newton's method.
- tz.m.quasi_newton: a large collection of quasi-Newton methods that estimate the hessian.
Notes¶
Implementation details¶
(H + yI)⁻¹g is calculated by solving the linear system (H + yI)x = g.
The linear system is solved via Cholesky decomposition; if that fails, LU decomposition; and if that fails, least squares. Least squares can be forced by setting use_lstsq=True.
Additionally, if eigval_fn is specified, the eigendecomposition of the hessian is computed, eigval_fn is applied to the eigenvalues, and (H + yI)⁻¹ is computed using the computed eigenvectors and transformed eigenvalues. This is generally more computationally expensive, but not by much.
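A rough, illustrative sketch of the Cholesky → LU → least-squares cascade described above (not the library's actual code):
import torch

def solve_damped_newton(H: torch.Tensor, g: torch.Tensor, damping: float = 0.0) -> torch.Tensor:
    # Solve (H + damping*I) x = g, falling back through Cholesky -> LU -> least squares.
    A = H + damping * torch.eye(H.shape[0], dtype=H.dtype, device=H.device)
    b = g.unsqueeze(-1)
    L, info = torch.linalg.cholesky_ex(A)   # info > 0 means A is not positive-definite
    if info.item() == 0:
        return torch.cholesky_solve(b, L).squeeze(-1)
    try:
        return torch.linalg.solve(A, b).squeeze(-1)            # LU factorization
    except RuntimeError:
        return torch.linalg.lstsq(A, b).solution.squeeze(-1)   # least-squares fallback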
Handling non-convexity¶
Standard Newton's method does not handle non-convexity well without modifications, because it jumps to the stationary point of the quadratic approximation, which may be a maximum.
A modification to handle non-convexity is to force the eigenvalues to be positive, for example by setting eigval_fn = lambda L: L.abs().clip(min=1e-4).
Examples:¶
Newton's method with backtracking line search
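A minimal sketch, mirroring the non-convex example below:
opt = tz.Optimizer(
    model.parameters(),
    tz.m.Newton(),
    tz.m.Backtracking()
)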
Newton's method for non-convex optimization.
opt = tz.Optimizer(
    model.parameters(),
    tz.m.Newton(eigval_fn = lambda L: L.abs().clip(min=1e-4)),
    tz.m.Backtracking()
)
Newton preconditioning applied to momentum
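A sketch using the inner argument; tz.m.EMA is assumed here as the momentum module (substitute whichever momentum module the library actually provides):
opt = tz.Optimizer(
    model.parameters(),
    tz.m.Newton(inner=tz.m.EMA(0.9)),   # tz.m.EMA is an assumed module name
    tz.m.Backtracking()
)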
Source code in torchzero/modules/second_order/newton.py
NewtonCG ¶
Bases: torchzero.core.transform.Transform
Newton's method with a matrix-free conjugate gradient or minimal-residual solver.
Notes
- In most cases NewtonCG should be the first module in the chain because it relies on autograd. Use the inner argument if you wish to apply Newton preconditioning to another module's output.
- This module requires a closure to be passed to the optimizer step, as it needs to re-evaluate the loss and gradients for calculating HVPs. The closure must accept a backward argument (refer to documentation).
Warning
CG may fail if the hessian is not positive-definite.
Parameters:
- maxiter (int | None, default: None) – maximum number of iterations for the conjugate gradient solver. By default, this is set to the number of dimensions in the objective function, which is the theoretical upper bound for CG convergence. Setting this to a smaller value (truncated Newton) can still generate good search directions. Defaults to None.
- tol (float, default: 1e-08) – relative tolerance for the conjugate gradient solver to determine convergence. Defaults to 1e-8.
- reg (float, default: 1e-08) – regularization parameter (damping) added to the Hessian diagonal. This helps ensure the system is positive-definite. Defaults to 1e-8.
- hvp_method (str, default: 'autograd') – determines how Hessian-vector products are evaluated.
  - "autograd" - uses autograd hessian-vector products. If multiple hessian-vector products are evaluated, uses a for-loop.
  - "fd_forward" - uses a gradient finite difference approximation with a less accurate forward formula, which requires one extra gradient evaluation per hessian-vector product.
  - "fd_central" - uses a gradient finite difference approximation with a more accurate central formula, which requires two gradient evaluations per hessian-vector product.
  For NewtonCG, "batched_autograd" is equivalent to "autograd". Defaults to "autograd".
- h (float, default: 0.001) – the step size for finite differences if hvp_method is "fd_forward" or "fd_central". Defaults to 1e-3.
- warm_start (bool, default: False) – if True, the conjugate gradient solver is initialized with the solution from the previous optimization step. This can accelerate convergence, especially in truncated Newton methods. Defaults to False.
- inner (Chainable | None, default: None) – NewtonCG will attempt to apply preconditioning to the output of this module.
Examples: Newton-CG with a backtracking line search:
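A minimal sketch, following the pattern used elsewhere on this page:
opt = tz.Optimizer(
    model.parameters(),
    tz.m.NewtonCG(),
    tz.m.Backtracking()
)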
Truncated Newton method (useful for large-scale problems):
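A sketch capping the CG iterations via maxiter (the value 20 is an arbitrary choice) and enabling warm starts:
opt = tz.Optimizer(
    model.parameters(),
    tz.m.NewtonCG(maxiter=20, warm_start=True),
    tz.m.Backtracking()
)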
Source code in torchzero/modules/second_order/newton_cg.py
NewtonCGSteihaug ¶
Bases: torchzero.core.transform.Transform
Newton's method with trust region and a matrix-free Steihaug-Toint conjugate gradient solver.
Notes
- In most cases NewtonCGSteihaug should be the first module in the chain because it relies on autograd. Use the inner argument if you wish to apply Newton preconditioning to another module's output.
- This module requires a closure to be passed to the optimizer step, as it needs to re-evaluate the loss and gradients for calculating HVPs. The closure must accept a backward argument (refer to documentation).
Parameters:
- eta (float, default: 0.0) – if the ratio of actual to predicted reduction is larger than this, the step is accepted. Defaults to 0.0.
- nplus (float, default: 3.5) – trust region increase factor on successful steps. Defaults to 3.5.
- nminus (float, default: 0.25) – trust region decrease factor on unsuccessful steps. Defaults to 0.25.
- rho_good (float, default: 0.99) – if the ratio of actual to predicted reduction is larger than this, the trust region size is multiplied by nplus.
- rho_bad (float, default: 0.0001) – if the ratio of actual to predicted reduction is less than this, the trust region size is multiplied by nminus.
- init (float, default: 1) – initial trust region value. Defaults to 1.
- max_attempts (int, default: 100) – maximum number of trust radius reductions per step. A zero update vector is returned when this limit is exceeded. Defaults to 100.
- max_history (int, default: 100) – CG will store this many intermediate solutions, reusing them when the trust radius is reduced instead of re-running CG. Each stored solution requires 2N memory. Defaults to 100.
- boundary_tol (float | None, default: 1e-06) – the trust region only increases when the suggested step's norm is at least (1-boundary_tol)*trust_region. This prevents increasing the trust region when the solution is not on the boundary. Defaults to 1e-6.
- maxiter (int | None, default: None) – maximum number of CG iterations per step. Each iteration requires one backward pass if hvp_method="fd_forward", two otherwise. Defaults to None.
- miniter (int, default: 1) – minimal number of CG iterations. This prevents making no progress when the initial guess is already below tolerance. Defaults to 1.
- tol (float, default: 1e-08) – terminates CG when the norm of the residual is less than this value. Defaults to 1e-8.
- reg (float, default: 1e-08) – hessian regularization. Defaults to 1e-8.
- solver (str, default: 'cg') – solver, "cg" or "minres". "cg" is recommended. Defaults to 'cg'.
- adapt_tol (bool, default: True) – if True, whenever the trust radius collapses to the smallest representable number, the tolerance is multiplied by 0.1. Defaults to True.
- npc_terminate (bool, default: False) – whether to terminate CG/MINRES whenever negative curvature is detected. Defaults to False.
- hvp_method (str, default: 'fd_central') – either "fd_forward" to use the forward formula, which requires one backward pass per hessian-vector product, or "fd_central" to use the more accurate central formula, which requires two backward passes. "fd_forward" is usually accurate enough. Defaults to "fd_central".
- h (float, default: 0.001) – finite difference step size. Defaults to 1e-3.
- inner (Chainable | None, default: None) – applies preconditioning to the output of this module. Defaults to None.
Examples:¶
Trust-region Newton-CG:
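A minimal sketch; since the trust region controls the step length itself, no line search module is chained:
opt = tz.Optimizer(
    model.parameters(),
    tz.m.NewtonCGSteihaug()
)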
Reference:¶
Steihaug, Trond. "The conjugate gradient method and trust regions in large scale optimization." SIAM Journal on Numerical Analysis 20.3 (1983): 626-637.
Source code in torchzero/modules/second_order/newton_cg.py
NystromPCG ¶
Bases: torchzero.core.transform.Transform
Newton's method with a Nyström-preconditioned conjugate gradient solver.
Notes
- This module requires a closure to be passed to the optimizer step, as it needs to re-evaluate the loss and gradients for calculating HVPs. The closure must accept a backward argument (refer to documentation).
- In most cases NystromPCG should be the first module in the chain because it relies on autograd. Use the inner argument if you wish to apply Newton preconditioning to another module's output.
Parameters:
- rank (int) – size of the sketch for preconditioning; this many hessian-vector products will be evaluated before running the conjugate gradient solver. A larger value improves the preconditioning and speeds up conjugate gradient.
- maxiter (int | None, default: None) – maximum number of iterations. By default this is set to the number of dimensions in the objective function, which is supposed to be enough for conjugate gradient to have guaranteed convergence. Setting this to a small value can still generate good enough directions. Defaults to None.
- tol (float, default: 1e-08) – relative tolerance for the conjugate gradient solver. Defaults to 1e-8.
- reg (float, default: 1e-06) – regularization parameter. Defaults to 1e-6.
- hvp_method (str, default: 'batched_autograd') – determines how Hessian-vector products are computed.
  - "batched_autograd" - uses autograd with batched hessian-vector products to compute the preconditioner. Faster than "autograd" but uses more memory.
  - "autograd" - uses autograd hessian-vector products with a for-loop to compute the preconditioner. Slower than "batched_autograd" but uses less memory.
  - "fd_forward" - uses a gradient finite difference approximation with a less accurate forward formula, which requires one extra gradient evaluation per hessian-vector product.
  - "fd_central" - uses a gradient finite difference approximation with a more accurate central formula, which requires two gradient evaluations per hessian-vector product.
  Defaults to "batched_autograd".
- h (float, default: 0.001) – the step size for finite differences if hvp_method is "fd_forward" or "fd_central". Defaults to 1e-3.
- inner (Chainable | None, default: None) – modules to apply the hessian preconditioner to. Defaults to None.
- seed (int | None, default: None) – seed for the random generator. Defaults to None.
Examples:
NystromPCG with backtracking line search
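A minimal sketch (rank=100 is an arbitrary choice):
opt = tz.Optimizer(
    model.parameters(),
    tz.m.NystromPCG(rank=100),
    tz.m.Backtracking()
)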
Reference
Frangella, Z., Tropp, J. A., & Udell, M. (2023). Randomized nyström preconditioning. SIAM Journal on Matrix Analysis and Applications, 44(2), 718-752. https://arxiv.org/abs/2110.02820
Source code in torchzero/modules/second_order/nystrom.py
NystromSketchAndSolve ¶
Bases: torchzero.core.transform.Transform
Newton's method with a Nyström sketch-and-solve solver.
Notes
- This module requires a closure to be passed to the optimizer step, as it needs to re-evaluate the loss and gradients for calculating HVPs. The closure must accept a backward argument (refer to documentation).
- In most cases NystromSketchAndSolve should be the first module in the chain because it relies on autograd. Use the inner argument if you wish to apply Newton preconditioning to another module's output.
- If this is unstable, increase the reg parameter and tune the rank.
Parameters:
- rank (int) – size of the sketch; this many hessian-vector products will be evaluated per step.
- reg (float | None, default: 0.01) – scale of the identity matrix added to the hessian. Note that if this is specified, Nyström sketch-and-solve is used to compute (Q diag(L) Q.T + reg*I)x = b. It is very unstable when reg is small, i.e. smaller than 1e-4. If this is None, (Q diag(L) Q.T)x = b is computed by simply taking the reciprocal of the eigenvalues. Defaults to 1e-2.
- eigv_tol (float, default: 0) – all eigenvalues smaller than the largest eigenvalue times eigv_tol are removed. Defaults to 0.
- truncate (int | None, default: None) – keeps the top truncate eigenvalues. Defaults to None.
- damping (float, default: 0) – scalar added to the eigenvalues. Defaults to 0.
- rdamping (float, default: 0) – scalar multiplied by the largest eigenvalue and added to the eigenvalues. Defaults to 0.
- update_freq (int, default: 1) – frequency of updating the preconditioner. Defaults to 1.
- hvp_method (str, default: 'batched_autograd') – determines how Hessian-vector products are computed.
  - "batched_autograd" - uses autograd with batched hessian-vector products to compute the preconditioner. Faster than "autograd" but uses more memory.
  - "autograd" - uses autograd hessian-vector products with a for-loop to compute the preconditioner. Slower than "batched_autograd" but uses less memory.
  - "fd_forward" - uses a gradient finite difference approximation with a less accurate forward formula, which requires one extra gradient evaluation per hessian-vector product.
  - "fd_central" - uses a gradient finite difference approximation with a more accurate central formula, which requires two gradient evaluations per hessian-vector product.
  Defaults to "batched_autograd".
- h (float, default: 0.001) – the step size for finite differences if hvp_method is "fd_forward" or "fd_central". Defaults to 1e-3.
- inner (Chainable | None, default: None) – modules to apply the hessian preconditioner to. Defaults to None.
- seed (int | None, default: None) – seed for the random generator. Defaults to None.
Examples: NystromSketchAndSolve with backtracking line search
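A minimal sketch (rank=100 is an arbitrary choice):
opt = tz.Optimizer(
    model.parameters(),
    tz.m.NystromSketchAndSolve(rank=100),
    tz.m.Backtracking()
)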
Trust region NystromSketchAndSolve
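A sketch of the trust-region combination; tz.m.TrustRegion below is a placeholder for whichever trust-region module the library provides, and the exact composition depends on that module's API:
opt = tz.Optimizer(
    model.parameters(),
    tz.m.NystromSketchAndSolve(rank=100),
    tz.m.TrustRegion()   # placeholder name, not a confirmed torchzero module
)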
References:
- Frangella, Z., Rathore, P., Zhao, S., & Udell, M. (2024). SketchySGD: Reliable Stochastic Optimization via Randomized Curvature Estimates. SIAM Journal on Mathematics of Data Science, 6(4), 1173-1204.
- Frangella, Z., Tropp, J. A., & Udell, M. (2023). Randomized Nyström preconditioning. SIAM Journal on Matrix Analysis and Applications, 44(2), 718-752.
Source code in torchzero/modules/second_order/nystrom.py
SixthOrder3P ¶
Bases: torchzero.modules.second_order.multipoint.HigherOrderMethodBase
Sixth-order iterative method.
Abro, Hameer Akhtar, and Muhammad Mujtaba Shaikh. "A new time-efficient and convergent nonlinear solver." Applied Mathematics and Computation 355 (2019): 516-536.
Source code in torchzero/modules/second_order/multipoint.py
SixthOrder3PM2 ¶
Bases: torchzero.modules.second_order.multipoint.HigherOrderMethodBase
Wang, Xiaofeng, and Yang Li. "An efficient sixth-order Newton-type method for solving nonlinear systems." Algorithms 10.2 (2017): 45.
Source code in torchzero/modules/second_order/multipoint.py
SixthOrder5P ¶
Bases: torchzero.modules.second_order.multipoint.HigherOrderMethodBase
Argyros, Ioannis K., et al. "Extended convergence for two sixth order methods under the same weak conditions." Foundations 3.1 (2023): 127-139.
Source code in torchzero/modules/second_order/multipoint.py
SubspaceNewton ¶
Bases: torchzero.core.transform.Transform
Subspace Newton. Performs a Newton step in a subspace (random or spanned by past gradients).
Parameters:
- sketch_size (int) – size of the random sketch. This many hessian-vector products will need to be evaluated each step.
- sketch_type (str, default: 'common_directions') –
  - "common_directions" - uses the history of steepest descent directions as the basis [2]. It is orthonormalized on-line using Gram-Schmidt (default).
  - "orthonormal" - random orthonormal basis. Orthonormality is necessary to use linear-operator-based modules such as trust region, but it can be slower to compute.
  - "rows" - samples random rows.
  - "topk" - samples the top-rank rows with the largest gradient magnitude.
  - "rademacher" - approximately orthonormal (if the dimension is large) scaled random Rademacher basis.
  - "mixed" - random orthonormal basis but with four directions set to the gradient, slow and fast gradient EMAs, and the previous update direction.
- damping (float, default: 0) – hessian damping (scale of the identity matrix added to the hessian). Defaults to 0.
- hvp_method (str, default: 'batched_autograd') – how to compute the hessian-matrix product:
  - "batched_autograd" - uses batched autograd.
  - "autograd" - uses unbatched autograd.
  - "forward" - uses finite differences with the forward formula, performing 1 backward pass per Hvp.
  - "central" - uses finite differences with the more accurate central formula, performing 2 backward passes per Hvp.
  Defaults to "batched_autograd".
- h (float, default: 0.01) – finite difference step size. Defaults to 1e-2.
- use_lstsq (bool, default: True) – whether to use least squares to solve Hx=g. Defaults to True.
- update_freq (int, default: 1) – frequency of updating the hessian. Defaults to 1.
- H_tfm (Callable | None) – optional hessian transform, which takes two arguments - (hessian, gradient). It must return either a tuple (hessian, is_inverted), with the transformed hessian and a boolean value which must be True if the transform inverted the hessian and False otherwise, or a single tensor which is used as the update. Defaults to None.
- eigval_fn (Callable | None, default: None) – optional eigenvalue transform, for example torch.abs or lambda L: torch.clip(L, min=1e-8). If this is specified, eigendecomposition will be used to invert the hessian.
- seed (int | None, default: None) – seed for the random generator. Defaults to None.
- inner (Chainable | None, default: None) – preconditions the output of this module. Defaults to None.
Examples¶
RSN with line search
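A minimal sketch (sketch_size=100 is an arbitrary choice):
opt = tz.Optimizer(
    model.parameters(),
    tz.m.SubspaceNewton(sketch_size=100),
    tz.m.Backtracking()
)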
RSN with trust region
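A sketch; as in the sketch-and-solve example above, tz.m.TrustRegion is a placeholder for the library's trust-region module. sketch_type="orthonormal" is chosen because the parameters above note that orthonormality is needed for linear-operator-based modules such as trust region:
opt = tz.Optimizer(
    model.parameters(),
    tz.m.SubspaceNewton(sketch_size=100, sketch_type="orthonormal"),
    tz.m.TrustRegion()   # placeholder name, not a confirmed torchzero module
)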
References
- Gower, Robert, et al. "RSN: randomized subspace Newton." Advances in Neural Information Processing Systems 32 (2019).
- Wang, Po-Wei, Ching-pei Lee, and Chih-Jen Lin. "The common-directions method for regularized empirical risk minimization." Journal of Machine Learning Research 20.58 (2019): 1-49.
Source code in torchzero/modules/second_order/rsn.py
TwoPointNewton ¶
Bases: torchzero.modules.second_order.multipoint.HigherOrderMethodBase
Two-point Newton method with frozen derivative and third-order convergence.
Sharma, Janak Raj, and Deepak Kumar. "A fast and efficient composite Newton–Chebyshev method for systems of nonlinear equations." Journal of Complexity 49 (2018): 56-73.