Quasi-Newton methods¶
This subpackage contains Quasi-Newton methods that estimate the Hessian using gradient information.
See also¶
- Conjugate gradient - a computationally cheaper alternative to Quasi-Newton methods.
- Second order - "true" second-order methods.
- Line search - Quasi-Newton methods usually require either a line search or a trust region.
- Trust region - trust regions can use the Hessian estimated by any of the Quasi-Newton methods.
Classes:

- BFGS – Broyden–Fletcher–Goldfarb–Shanno Quasi-Newton method. This is usually the most stable Quasi-Newton method.
- BroydenBad – Broyden's "bad" Quasi-Newton method.
- BroydenGood – Broyden's "good" Quasi-Newton method.
- DFP – Davidon–Fletcher–Powell Quasi-Newton method.
- DNRTR – Diagonal Quasi-Newton method.
- DiagonalBFGS – Diagonal BFGS. This is simply BFGS with only the diagonal being updated and used. It doesn't satisfy the secant equation but may still be useful.
- DiagonalQuasiCauchi – Diagonal quasi-Cauchy method.
- DiagonalSR1 – Diagonal SR1. This is simply SR1 with only the diagonal being updated and used. It doesn't satisfy the secant equation but may still be useful.
- DiagonalWeightedQuasiCauchi – Diagonal weighted quasi-Cauchy method.
- FletcherVMM – Fletcher's variable metric Quasi-Newton method.
- GradientCorrection – Estimates the gradient at the minimum along the search direction, assuming the function is quadratic.
- Greenstadt1 – Greenstadt's first Quasi-Newton method.
- Greenstadt2 – Greenstadt's second Quasi-Newton method.
- Horisho – Hoshino's variable metric Quasi-Newton method.
- ICUM – Inverse Column-updating Quasi-Newton method. This is computationally cheaper than other Quasi-Newton methods.
- LBFGS – Limited-memory BFGS algorithm. A line search or trust region is recommended.
- LSR1 – Limited-memory SR1 algorithm. A line search or trust region is recommended.
- McCormick – McCormick's Quasi-Newton method.
- NewDQN – Diagonal Quasi-Newton method.
- NewSSM – Self-scaling Quasi-Newton method.
- PSB – Powell's Symmetric Broyden Quasi-Newton method.
- Pearson – Pearson's Quasi-Newton method.
- ProjectedNewtonRaphson – Projected Newton-Raphson method.
- SR1 – Symmetric Rank 1. This works best with a trust region.
- SSVM – Self-scaling variable metric Quasi-Newton method.
- ShorR – Shor's r-algorithm.
- ThomasOptimalMethod – Thomas's "optimal" Quasi-Newton method.
BFGS ¶
Bases: torchzero.modules.quasi_newton.quasi_newton._InverseHessianUpdateStrategyDefaults
Broyden–Fletcher–Goldfarb–Shanno Quasi-Newton method. This is usually the most stable Quasi-Newton method.
Note
a line search or a trust region is recommended
Warning
this uses at least O(N^2) memory.
Parameters:

- init_scale (float | Literal['auto'], default: 'auto') – the initial Hessian matrix is set to identity times this. "auto" corresponds to a heuristic from Nocedal, J., & Wright, S. J., Numerical Optimization, pp. 142-143. Defaults to "auto".
- tol (float, default: 1e-32) – tolerance on the curvature condition. Defaults to 1e-32.
- ptol (float | None, default: 1e-32) – skips the update if the maximum difference between current and previous gradients is less than this, to avoid instability. Defaults to 1e-32.
- ptol_restart (bool, default: False) – whether to reset the Hessian approximation when the ptol tolerance is not met. Defaults to False.
- restart_interval (int | None | Literal['auto'], default: None) – interval between resets of the Hessian approximation. "auto" corresponds to the number of decision variables + 1. None disables resets. Defaults to None.
- beta (float | None, default: None) – momentum on H or B. Defaults to None.
- update_freq (int, default: 1) – frequency of updating H or B. Defaults to 1.
- scale_first (bool, default: False) – whether to downscale the first step, taken before the Hessian approximation becomes available. Defaults to False.
- scale_second (bool) – whether to downscale the second step. Defaults to False.
- concat_params (bool, default: True) – if True, all parameters are treated as a single vector. If False, the update rule is applied to each parameter separately. Defaults to True.
- inner (Chainable | None, default: None) – preconditioning is applied to the output of this module. Defaults to None.
Examples:¶
BFGS with backtracking line search:
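A minimal sketch, following the composition pattern of the GradientCorrection example on this page (model stands for your torch.nn.Module):

import torchzero as tz

opt = tz.Modular(
    model.parameters(),
    tz.m.BFGS(),
    tz.m.Backtracking()
)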
BFGS with trust region
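A sketch only: the trust-region module name and its hess_module argument below are assumptions, so consult the Trust region page for the actual API:

import torchzero as tz

opt = tz.Modular(
    model.parameters(),
    # hypothetical trust-region module wrapping the BFGS Hessian estimate
    tz.m.TrustCG(hess_module=tz.m.BFGS()),
)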
Source code in torchzero/modules/quasi_newton/quasi_newton.py
BroydenBad ¶
Bases: torchzero.modules.quasi_newton.quasi_newton._InverseHessianUpdateStrategyDefaults
Broyden's "bad" Quasi-Newton method.
Note
a trust region or an accurate line search is recommended.
Warning
this uses at least O(N^2) memory.
Reference
Spedicato, E., & Huang, Z. (1997). Numerical experience with newton-like methods for nonlinear algebraic systems. Computing, 58(1), 69–89. doi:10.1007/bf02684472
Source code in torchzero/modules/quasi_newton/quasi_newton.py
BroydenGood ¶
Bases: torchzero.modules.quasi_newton.quasi_newton._InverseHessianUpdateStrategyDefaults
Broyden's "good" Quasi-Newton method.
Note
a trust region or an accurate line search is recommended.
Warning
this uses at least O(N^2) memory.
Reference
Spedicato, E., & Huang, Z. (1997). Numerical experience with newton-like methods for nonlinear algebraic systems. Computing, 58(1), 69–89. doi:10.1007/bf02684472
Source code in torchzero/modules/quasi_newton/quasi_newton.py
DFP ¶
Bases: torchzero.modules.quasi_newton.quasi_newton._InverseHessianUpdateStrategyDefaults
Davidon–Fletcher–Powell Quasi-Newton method.
Note
a trust region or an accurate line search is recommended.
Warning
this uses at least O(N^2) memory.
Source code in torchzero/modules/quasi_newton/quasi_newton.py
DNRTR ¶
Bases: torchzero.modules.quasi_newton.quasi_newton.HessianUpdateStrategy
Diagonal Quasi-Newton method.
Reference
Andrei, Neculai. "A diagonal quasi-Newton updating method for unconstrained optimization." Numerical Algorithms 81.2 (2019): 575-590.
Source code in torchzero/modules/quasi_newton/diagonal_quasi_newton.py
DiagonalBFGS ¶
Bases: torchzero.modules.quasi_newton.quasi_newton._InverseHessianUpdateStrategyDefaults
Diagonal BFGS. This is simply BFGS with only the diagonal being updated and used. It doesn't satisfy the secant equation but may still be useful.
Source code in torchzero/modules/quasi_newton/diagonal_quasi_newton.py
DiagonalQuasiCauchi ¶
Bases: torchzero.modules.quasi_newton.quasi_newton._HessianUpdateStrategyDefaults
Diagonal quasi-Cauchy method.
Reference
Zhu, M., Nazareth, J. L., & Wolkowicz, H. (1999). The quasi-Cauchy relation and diagonal updating. SIAM Journal on Optimization, 9(4), 1192-1204.
Source code in torchzero/modules/quasi_newton/diagonal_quasi_newton.py
DiagonalSR1 ¶
Bases: torchzero.modules.quasi_newton.quasi_newton._InverseHessianUpdateStrategyDefaults
Diagonal SR1. This is simply SR1 with only the diagonal being updated and used. It doesn't satisfy the secant equation but may still be useful.
Source code in torchzero/modules/quasi_newton/diagonal_quasi_newton.py
DiagonalWeightedQuasiCauchi ¶
Bases: torchzero.modules.quasi_newton.quasi_newton._HessianUpdateStrategyDefaults
Diagonal weighted quasi-Cauchy method.
Reference
Leong, Wah June, Sharareh Enshaei, and Sie Long Kek. "Diagonal quasi-Newton methods via least change updating principle with weighted Frobenius norm." Numerical Algorithms 86 (2021): 1225-1241.
Source code in torchzero/modules/quasi_newton/diagonal_quasi_newton.py
FletcherVMM ¶
Bases: torchzero.modules.quasi_newton.quasi_newton._InverseHessianUpdateStrategyDefaults
Fletcher's variable metric Quasi-Newton method.
Note
a line search is recommended.
Warning
this uses at least O(N^2) memory.
Reference
Fletcher, R. (1970). A new approach to variable metric algorithms. The Computer Journal, 13(3), 317–322. doi:10.1093/comjnl/13.3.317
Source code in torchzero/modules/quasi_newton/quasi_newton.py
GradientCorrection ¶
Bases: torchzero.core.transform.Transform
Estimates the gradient at the minimum along the search direction, assuming the function is quadratic.
This can be useful as an inner module for second-order methods with an inexact line search.
Example:¶
L-BFGS with gradient correction
opt = tz.Modular(
    model.parameters(),
    tz.m.LBFGS(inner=tz.m.GradientCorrection()),
    tz.m.Backtracking()
)
Reference
Hoshino, S. (1972). A Formulation of Variable Metric Methods. IMA Journal of Applied Mathematics, 10(3), 394–403. doi:10.1093/imamat/10.3.394
Source code in torchzero/modules/quasi_newton/quasi_newton.py
Greenstadt1 ¶
Bases: torchzero.modules.quasi_newton.quasi_newton._InverseHessianUpdateStrategyDefaults
Greenstadt's first Quasi-Newton method.
Note
a trust region or an accurate line search is recommended.
Warning
this uses at least O(N^2) memory.
Reference
Spedicato, E., & Huang, Z. (1997). Numerical experience with newton-like methods for nonlinear algebraic systems. Computing, 58(1), 69–89. doi:10.1007/bf02684472
Source code in torchzero/modules/quasi_newton/quasi_newton.py
Greenstadt2 ¶
Bases: torchzero.modules.quasi_newton.quasi_newton._InverseHessianUpdateStrategyDefaults
Greenstadt's second Quasi-Newton method.
Note
a line search is recommended.
Warning
this uses at least O(N^2) memory.
Reference
Spedicato, E., & Huang, Z. (1997). Numerical experience with newton-like methods for nonlinear algebraic systems. Computing, 58(1), 69–89. doi:10.1007/bf02684472
Source code in torchzero/modules/quasi_newton/quasi_newton.py
Horisho ¶
Bases: torchzero.modules.quasi_newton.quasi_newton._InverseHessianUpdateStrategyDefaults
Hoshino's variable metric Quasi-Newton method.
Note
a line search is recommended.
Warning
this uses at least O(N^2) memory.
Reference
Hoshino, S. (1972). A Formulation of Variable Metric Methods. IMA Journal of Applied Mathematics, 10(3), 394–403. doi:10.1093/imamat/10.3.394
Source code in torchzero/modules/quasi_newton/quasi_newton.py
ICUM ¶
Bases: torchzero.modules.quasi_newton.quasi_newton._InverseHessianUpdateStrategyDefaults
Inverse Column-updating Quasi-Newton method. This is computationally cheaper than other Quasi-Newton methods due to only updating one column of the inverse Hessian approximation per step.
Note
a line search is recommended.
Warning
this uses at least O(N^2) memory.
Reference
Lopes, V. L., & Martínez, J. M. (1995). Convergence properties of the inverse column-updating method. Optimization Methods & Software, 6(2), 127–144. https://www.ime.unicamp.br/sites/default/files/pesquisa/relatorios/rp-1993-76.pdf
Source code in torchzero/modules/quasi_newton/quasi_newton.py
LBFGS ¶
Bases: torchzero.core.transform.Transform
Limited-memory BFGS algorithm. A line search or trust region is recommended.
Parameters:

- history_size (int, default: 10) – number of past parameter differences and gradient differences to store. Defaults to 10.
- ptol (float | None, default: 1e-32) – skips updating the history if the maximum absolute value of the parameter difference is less than this value. Defaults to 1e-32.
- ptol_restart (bool, default: False) – if True, the L-BFGS state is reset whenever the parameter difference is less than ptol. Defaults to False.
- gtol (float | None, default: 1e-32) – skips updating the history if the maximum absolute value of the gradient difference is less than this value. Defaults to 1e-32.
- gtol_restart (bool, default: False) – if True, the L-BFGS state is reset whenever the gradient difference is less than gtol. Defaults to False.
- sy_tol (float | None, default: 1e-32) – the history is not updated whenever s⋅y is less than this value (negative s⋅y indicates negative curvature).
- scale_first (bool, default: True) – makes the first step, taken before a Hessian approximation is available, small to reduce the number of line search iterations. Defaults to True.
- update_freq (int, default: 1) – how often to update the L-BFGS history. Larger values may be better for stochastic optimization. Defaults to 1.
- damping (Literal['powell', 'double'] | None, default: None) – damping to use, can be "powell" or "double". Defaults to None.
- inner (Chainable | None, default: None) – optional inner modules applied after updating the L-BFGS history and before preconditioning. Defaults to None.
Examples:¶
L-BFGS with line search
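A minimal sketch following the GradientCorrection example above:

import torchzero as tz

opt = tz.Modular(
    model.parameters(),
    tz.m.LBFGS(history_size=10),
    tz.m.Backtracking()
)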
L-BFGS with trust region
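As in the BFGS example, the trust-region module name is an assumption; see the Trust region page for the actual API:

import torchzero as tz

opt = tz.Modular(
    model.parameters(),
    # hypothetical trust-region module using the L-BFGS Hessian estimate
    tz.m.TrustCG(hess_module=tz.m.LBFGS()),
)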
Source code in torchzero/modules/quasi_newton/lbfgs.py
LSR1 ¶
Bases: torchzero.core.transform.Transform
Limited-memory SR1 algorithm. A line search or trust region is recommended.
Parameters:

- history_size (int, default: 10) – number of past parameter differences and gradient differences to store. Defaults to 10.
- ptol (float | None, default: None) – skips updating the history if the maximum absolute value of the parameter difference is less than this value. Defaults to None.
- ptol_restart (bool, default: False) – if True, the L-SR1 state is reset whenever the parameter difference is less than ptol. Defaults to False.
- gtol (float | None, default: None) – skips updating the history if the maximum absolute value of the gradient difference is less than this value. Defaults to None.
- gtol_restart (bool, default: False) – if True, the L-SR1 state is reset whenever the gradient difference is less than gtol. Defaults to False.
- scale_first (bool, default: False) – makes the first step, taken before a Hessian approximation is available, small to reduce the number of line search iterations. Defaults to False.
- update_freq (int, default: 1) – how often to update the L-SR1 history. Larger values may be better for stochastic optimization. Defaults to 1.
- damping (Literal['powell', 'double'] | None, default: None) – damping to use, can be "powell" or "double". Defaults to None.
- compact (bool) – if True, uses a compact-representation version of L-SR1. It is much faster computationally, but less stable.
- inner (Chainable | None, default: None) – optional inner modules applied after updating the L-SR1 history and before preconditioning. Defaults to None.
Examples:¶
L-SR1 with line search
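A minimal sketch; tz.m.StrongWolfe is named elsewhere on this page, and the composition pattern follows the GradientCorrection example:

import torchzero as tz

opt = tz.Modular(
    model.parameters(),
    tz.m.LSR1(),
    tz.m.StrongWolfe()
)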
L-SR1 with trust region
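The trust-region module name is an assumption (see the Trust region page):

import torchzero as tz

opt = tz.Modular(
    model.parameters(),
    # hypothetical trust-region module using the L-SR1 Hessian estimate
    tz.m.TrustCG(hess_module=tz.m.LSR1()),
)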
Source code in torchzero/modules/quasi_newton/lsr1.py
McCormick ¶
Bases: torchzero.modules.quasi_newton.quasi_newton._InverseHessianUpdateStrategyDefaults
McCormick's Quasi-Newton method.
Note
a line search is recommended.
Warning
this uses at least O(N^2) memory.
Reference
Pearson, J. D. (1969). Variable metric methods of minimisation. The Computer Journal, 12(2), 171–178. doi:10.1093/comjnl/12.2.171.
This is "Algorithm 2", attributed to McCormick in this paper. However for some reason this method is also called Pearson's 2nd method in other sources.
Source code in torchzero/modules/quasi_newton/quasi_newton.py
NewDQN ¶
Bases: torchzero.modules.quasi_newton.diagonal_quasi_newton.DNRTR
Diagonal Quasi-Newton method.
Reference
Nosrati, Mahsa, and Keyvan Amini. "A new diagonal quasi-Newton algorithm for unconstrained optimization problems." Applications of Mathematics 69.4 (2024): 501-512.
Source code in torchzero/modules/quasi_newton/diagonal_quasi_newton.py
NewSSM ¶
Bases: torchzero.modules.quasi_newton.quasi_newton.HessianUpdateStrategy
Self-scaling Quasi-Newton method.
Note
a line search such as tz.m.StrongWolfe() is required.
Warning
this uses roughly O(N^2) memory.
Reference
Moghrabi, I. A., Hassan, B. A., & Askar, A. (2022). New self-scaling quasi-newton methods for unconstrained optimization. Int. J. Math. Comput. Sci., 17, 1061U.
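Example:¶
A sketch pairing NewSSM with the required line search; the composition pattern is assumed from the other examples on this page:

import torchzero as tz

opt = tz.Modular(
    model.parameters(),  # 'model' stands for your torch.nn.Module
    tz.m.NewSSM(),
    tz.m.StrongWolfe()
)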
Source code in torchzero/modules/quasi_newton/quasi_newton.py
PSB ¶
Bases: torchzero.modules.quasi_newton.quasi_newton._HessianUpdateStrategyDefaults
Powell's Symmetric Broyden Quasi-Newton method.
Note
a line search or a trust region is recommended.
Warning
this uses at least O(N^2) memory.
Reference
Spedicato, E., & Huang, Z. (1997). Numerical experience with newton-like methods for nonlinear algebraic systems. Computing, 58(1), 69–89. doi:10.1007/bf02684472
Source code in torchzero/modules/quasi_newton/quasi_newton.py
Pearson ¶
Bases: torchzero.modules.quasi_newton.quasi_newton._InverseHessianUpdateStrategyDefaults
Pearson's Quasi-Newton method.
Note
a line search is recommended.
Warning
this uses at least O(N^2) memory.
Reference
Pearson, J. D. (1969). Variable metric methods of minimisation. The Computer Journal, 12(2), 171–178. doi:10.1093/comjnl/12.2.171.
Source code in torchzero/modules/quasi_newton/quasi_newton.py
ProjectedNewtonRaphson ¶
Bases: torchzero.modules.quasi_newton.quasi_newton.HessianUpdateStrategy
Projected Newton-Raphson method.
Note
a line search is recommended.
Warning
this uses at least O(N^2) memory.
Reference
Pearson, J. D. (1969). Variable metric methods of minimisation. The Computer Journal, 12(2), 171–178. doi:10.1093/comjnl/12.2.171.
This one is Algorithm 7.
Source code in torchzero/modules/quasi_newton/quasi_newton.py
SR1 ¶
Bases: torchzero.modules.quasi_newton.quasi_newton._InverseHessianUpdateStrategyDefaults
Symmetric Rank 1. This works best with a trust region.
Parameters:

- init_scale (float | Literal['auto'], default: 'auto') – the initial Hessian matrix is set to identity times this. "auto" corresponds to a heuristic from [1], pp. 142-143. Defaults to "auto".
- tol (float, default: 1e-32) – tolerance for the denominator in the SR1 update rule, as in [1] p. 146. Defaults to 1e-32.
- ptol (float | None, default: 1e-32) – skips the update if the maximum difference between current and previous gradients is less than this, to avoid instability. Defaults to 1e-32.
- ptol_restart (bool, default: False) – whether to reset the Hessian approximation when the ptol tolerance is not met. Defaults to False.
- restart_interval (int | None | Literal['auto'], default: None) – interval between resets of the Hessian approximation. "auto" corresponds to the number of decision variables + 1. None disables resets. Defaults to None.
- beta (float | None, default: None) – momentum on H or B. Defaults to None.
- update_freq (int, default: 1) – frequency of updating H or B. Defaults to 1.
- scale_first (bool, default: False) – whether to downscale the first step, taken before the Hessian approximation becomes available. Defaults to False.
- scale_second (bool) – whether to downscale the second step. Defaults to False.
- concat_params (bool, default: True) – if True, all parameters are treated as a single vector. If False, the update rule is applied to each parameter separately. Defaults to True.
- inner (Chainable | None, default: None) – preconditioning is applied to the output of this module. Defaults to None.
Examples:¶
SR1 with trust region
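A sketch only: the trust-region module name and its hess_module argument are assumptions, so consult the Trust region page for the actual API:

import torchzero as tz

opt = tz.Modular(
    model.parameters(),
    # hypothetical trust-region module using the SR1 Hessian estimate
    tz.m.TrustCG(hess_module=tz.m.SR1()),
)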
References:¶
[1] Nocedal, J., & Wright, S. J. Numerical Optimization.
Source code in torchzero/modules/quasi_newton/quasi_newton.py
SSVM ¶
Bases: torchzero.modules.quasi_newton.quasi_newton.HessianUpdateStrategy
Self-scaling variable metric Quasi-Newton method.
Note
a line search is recommended.
Warning
this uses at least O(N^2) memory.
Reference
Oren, S. S., & Spedicato, E. (1976). Optimal conditioning of self-scaling variable Metric algorithms. Mathematical Programming, 10(1), 70–90. doi:10.1007/bf01580654
Source code in torchzero/modules/quasi_newton/quasi_newton.py
ShorR ¶
Bases: torchzero.modules.quasi_newton.quasi_newton.HessianUpdateStrategy
Shor’s r-algorithm.
Note
A line search such as tz.m.StrongWolfe(a_init="quadratic", fallback=True) is required. Similarly to conjugate gradient, ShorR doesn't have automatic step size scaling, so setting a_init in the line search is recommended.
References
Shor, N. Z. (1985). Minimization Methods for Non-differentiable Functions. New York: Springer.
Burke, James V., Adrian S. Lewis, and Michael L. Overton. "The Speed of Shor's R-algorithm." IMA Journal of Numerical Analysis 28.4 (2008): 711-720. - a good overview.
Ansari, Zafar A. Limited Memory Space Dilation and Reduction Algorithms. Diss. Virginia Tech, 1998. - describes a more efficient formula.
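Example:¶
A sketch using the line search named in the note above:

import torchzero as tz

opt = tz.Modular(
    model.parameters(),
    tz.m.ShorR(),
    tz.m.StrongWolfe(a_init="quadratic", fallback=True)
)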
Source code in torchzero/modules/quasi_newton/quasi_newton.py
ThomasOptimalMethod ¶
Bases: torchzero.modules.quasi_newton.quasi_newton._InverseHessianUpdateStrategyDefaults
Thomas's "optimal" Quasi-Newton method.
Note
a line search is recommended.
Warning
this uses at least O(N^2) memory.
Reference
Thomas, Stephen Walter. Sequential estimation techniques for quasi-Newton algorithms. Cornell University, 1975.