Line search¶
This subpackage contains line search methods.
See also¶
- Step size - step size selection methods like Barzilai-Borwein and Polyak's step size.
- Trust region - trust region methods.
Classes:

- AdaptiveBacktracking – Adaptive backtracking line search. After each line search procedure, a new initial step size is set.
- AdaptiveTracking – A line search that evaluates the previous step size; if the value increased, it backtracks until the value stops decreasing, otherwise it forward-tracks.
- Backtracking – Backtracking line search.
- LineSearchBase – Base class for line searches.
- ScipyMinimizeScalar – Line search via scipy.optimize.minimize_scalar, which implements the Brent, golden-section, and bounded Brent methods.
- StrongWolfe – Interpolation line search satisfying the strong Wolfe conditions.
AdaptiveBacktracking ¶
Bases: torchzero.modules.line_search.line_search.LineSearchBase
Adaptive backtracking line search. After each line search procedure, a new initial step size is set such that the optimal step size of that procedure would be found on the second iteration of the next line search.
Parameters:

- init (float, default: 1.0) – initial step size. Defaults to 1.0.
- beta (float, default: 0.5) – multiplies each consecutive step size by this value. Defaults to 0.5.
- c (float, default: 0.0001) – sufficient decrease condition. Defaults to 1e-4.
- condition (Literal, default: 'armijo') – termination condition; only conditions that do not use the gradient at f(x+a*d) can be specified. "armijo" is the sufficient decrease condition; "decrease" means any decrease in the objective function value satisfies the condition. "goldstein" can technically be specified, but it does not make sense here because there is no zoom stage. Defaults to 'armijo'.
- maxiter (int, default: 20) – maximum number of function evaluations per step. Defaults to 20.
- target_iters (int, default: 1) – sets the next initial step size such that this many iterations are expected to be performed until the optimal step size is found. Defaults to 1.
- nplus (float, default: 2.0) – if the initial step size is optimal, it is multiplied by this value. Defaults to 2.0.
- scale_beta (float, default: 0.0) – momentum for the initial step size; 0 disables momentum. Defaults to 0.0.
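For reference, the "armijo" sufficient decrease condition is the standard f(x + a*d) <= f(x) + c * a * ∇f(x)ᵀd (stated here for convenience; this is the textbook definition, not taken from this page).

A minimal usage sketch, following the composition pattern of the StrongWolfe example later on this page. The tz import alias, the tz.m.AdaptiveBacktracking module path, and the closure-based step are assumptions, not confirmed by this page:

import torch
import torchzero as tz  # assumed import alias

model = torch.nn.Linear(2, 1)
inputs, targets = torch.randn(8, 2), torch.randn(8, 1)

# with no preceding module, the searched direction is the gradient
opt = tz.Modular(
    model.parameters(),
    tz.m.AdaptiveBacktracking()
)

# line searches re-evaluate the objective at trial step sizes,
# so the step takes a closure (assumed, standard PyTorch pattern)
def closure():
    opt.zero_grad()
    loss = (model(inputs) - targets).pow(2).mean()
    loss.backward()
    return loss

opt.step(closure)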
Source code in torchzero/modules/line_search/backtracking.py
AdaptiveTracking ¶
Bases: torchzero.modules.line_search.line_search.LineSearchBase
A line search that evaluates the previous step size; if the value increased, it backtracks until the value stops decreasing, otherwise it forward-tracks until the value stops decreasing.
Parameters:

- init (float, default: 1.0) – initial step size. Defaults to 1.0.
- nplus (float, default: 2) – multiplier to the step size if the initial step size is optimal. Defaults to 2.
- nminus (float, default: 0.5) – multiplier to the step size if the initial step size is too big. Defaults to 0.5.
- maxiter (int, default: 10) – maximum number of function evaluations per step. Defaults to 10.
- adaptive (bool, default: True) – when enabled, if the line search failed, the step size will continue decreasing on the next step; otherwise the line search restarts from the init step size. Defaults to True.
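A minimal usage sketch (the tz.m.AdaptiveTracking module path follows the pattern of the StrongWolfe example below and is an assumption):

opt = tz.Modular(
    model.parameters(),
    tz.m.AdaptiveTracking()
)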
Source code in torchzero/modules/line_search/adaptive.py
Backtracking ¶
Bases: torchzero.modules.line_search.line_search.LineSearchBase
Backtracking line search.
Parameters:

- init (float, default: 1.0) – initial step size. Defaults to 1.0.
- beta (float, default: 0.5) – multiplies each consecutive step size by this value. Defaults to 0.5.
- c (float, default: 0.0001) – sufficient decrease condition. Defaults to 1e-4.
- condition (Literal, default: 'armijo') – termination condition; only conditions that do not use the gradient at f(x+a*d) can be specified. "armijo" is the sufficient decrease condition; "decrease" means any decrease in the objective function value satisfies the condition. "goldstein" can technically be specified, but it does not make sense here because there is no zoom stage. Defaults to 'armijo'.
- maxiter (int, default: 10) – maximum number of function evaluations per step. Defaults to 10.
- adaptive (bool, default: True) – when enabled, if the line search failed, the step size will continue decreasing on the next step; otherwise the line search restarts from the init step size. Defaults to True.
Examples:

Gradient descent with backtracking line search:
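A minimal sketch, assuming the module is exposed as tz.m.Backtracking (with no preceding module, the searched direction is the plain gradient):

opt = tz.Modular(
    model.parameters(),
    tz.m.Backtracking()
)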
L-BFGS with backtracking line search:
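A minimal sketch, assuming an L-BFGS module exposed as tz.m.LBFGS produces the search direction:

opt = tz.Modular(
    model.parameters(),
    tz.m.LBFGS(),
    tz.m.Backtracking()
)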
Source code in torchzero/modules/line_search/backtracking.py
LineSearchBase ¶
Bases: torchzero.core.module.Module, abc.ABC
Base class for line searches.
This is an abstract class; to use it, subclass it and override search.
Parameters:

- defaults (dict[str, Any] | None) – dictionary with defaults.
- maxiter (int | None, default: None) – if specified, the search method will terminate upon evaluating the objective this many times, and the step size with the lowest loss value will be used. This is useful when passing make_objective to an external library that doesn't have a maxiter option. Defaults to None.
Other useful methods:

- evaluate_f – returns the loss at a given scalar step size.
- evaluate_f_d – returns the loss and directional derivative at a given scalar step size.
- make_objective – creates a function that accepts a scalar step size and returns the loss. This can be passed to a scalar solver, such as scipy.optimize.minimize_scalar.
- make_objective_with_derivative – creates a function that accepts a scalar step size and returns a tuple of loss and directional derivative. This can be passed to a scalar solver.
Examples:

Basic line search¶

This evaluates every step size in a range using the self.evaluate_f method:
import torch

class GridLineSearch(LineSearchBase):
    def __init__(self, start, end, num):
        defaults = dict(start=start, end=end, num=num)
        super().__init__(defaults)

    @torch.no_grad
    def search(self, update, var):
        start = self.defaults["start"]
        end = self.defaults["end"]
        num = self.defaults["num"]

        lowest_loss = float("inf")
        best_step_size = start  # fallback if no evaluation improves on inf

        # evaluate the loss at every step size on the grid
        for step_size in torch.linspace(start, end, num):
            loss = self.evaluate_f(step_size.item(), var=var, backward=False)
            if loss < lowest_loss:
                lowest_loss = loss
                best_step_size = step_size.item()

        return best_step_size
Using an external solver via self.make_objective¶

Here we let scipy.optimize.minimize_scalar find the best step size via self.make_objective:
from scipy.optimize import minimize_scalar

class ScipyMinimizeScalar(LineSearchBase):
    def __init__(self, method: str | None = None):
        defaults = dict(method=method)
        super().__init__(defaults)

    @torch.no_grad
    def search(self, update, var):
        objective = self.make_objective(var=var)
        method = self.defaults["method"]
        res = minimize_scalar(objective, method=method)
        return res.x
Methods:

- evaluate_f – evaluate the function value at step size step_size.
- evaluate_f_d – evaluate the function value and directional derivative in the direction of the update at step size step_size.
- evaluate_f_d_g – evaluate the function value, directional derivative, and gradient list at step size step_size.
- search – finds the step size to use.
Source code in torchzero/modules/line_search/line_search.py
evaluate_f ¶
evaluate_f(step_size: float, var: Var, backward: bool = False)
Evaluate the function value at step size step_size.
Source code in torchzero/modules/line_search/line_search.py
evaluate_f_d ¶
evaluate_f_d(step_size: float, var: Var)
Evaluate the function value and directional derivative in the direction of the update at step size step_size.
Source code in torchzero/modules/line_search/line_search.py
evaluate_f_d_g ¶
evaluate_f_d_g(step_size: float, var: Var)
Evaluate the function value, directional derivative, and gradient list at step size step_size.
Source code in torchzero/modules/line_search/line_search.py
ScipyMinimizeScalar ¶
Bases: torchzero.modules.line_search.line_search.LineSearchBase
Line search via scipy.optimize.minimize_scalar, which implements the Brent, golden-section, and bounded Brent methods.
Parameters:

- method (str | None, default: None) – "brent", "golden" or "bounded". Defaults to None.
- maxiter (int | None, default: None) – maximum number of function evaluations the line search is allowed to perform. Defaults to None.
- bracket (Sequence | None, default: None) – either a triple (xa, xb, xc) satisfying xa < xb < xc and func(xb) < func(xa) and func(xb) < func(xc), or a pair (xa, xb) to be used as initial points for a downhill bracket search. Defaults to None.
- bounds (Sequence | None, default: None) – for method 'bounded', bounds is mandatory and must have two finite items corresponding to the optimization bounds. Defaults to None.
- tol (float | None, default: None) – tolerance for termination. Defaults to None.
- options (dict | None, default: None) – a dictionary of solver options. Defaults to None.
For more details on methods and arguments refer to https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize_scalar.html
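A minimal usage sketch (the tz.m.ScipyMinimizeScalar module path follows the pattern of the StrongWolfe example below and is an assumption):

opt = tz.Modular(
    model.parameters(),
    tz.m.ScipyMinimizeScalar(method="brent")
)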
Source code in torchzero/modules/line_search/scipy.py
StrongWolfe ¶
Bases: torchzero.modules.line_search.line_search.LineSearchBase
Interpolation line search satisfying the strong Wolfe conditions.
Parameters:

- c1 (float, default: 0.0001) – sufficient decrease condition. Defaults to 1e-4.
- c2 (float, default: 0.9) – strong curvature condition. For CG, set this to 0.1. Defaults to 0.9.
- a_init (str, default: 'fixed') – strategy for choosing the initial step size guess. "fixed" uses a fixed value specified in the init_value argument; "first-order" assumes the first-order change in the function at the current iterate is the same as that obtained at the previous step; "quadratic" interpolates a quadratic to f(x_{-1}) and f(x); "quadratic-clip" is the same as "quadratic" but uses min(1, 1.01*alpha), as described in Numerical Optimization; "previous" uses the final step size found on the previous iteration. For second-order methods it is usually best to leave this at "fixed". For methods that do not produce well-scaled search directions, e.g. conjugate gradient, "first-order" or "quadratic-clip" are recommended. Defaults to 'fixed'.
- a_max (float, default: 1e12) – upper bound for the proposed step sizes. Defaults to 1e12.
- init_value (float, default: 1) – initial step size. Used when a_init="fixed", and as a fallback value with the other strategies. Defaults to 1.
- maxiter (int, default: 25) – maximum number of line search iterations. Defaults to 25.
- maxzoom (int, default: 10) – maximum number of zoom iterations. Defaults to 10.
- maxeval (int | None, default: None) – maximum number of function evaluations. Defaults to None.
- tol_change (float, default: 1e-09) – tolerance; terminates on small brackets. Defaults to 1e-9.
- interpolation (str, default: 'cubic') – the type of interpolation to use. "bisection" uses the middle point; it is robust, especially if the objective function is non-smooth, but may need more function evaluations. "quadratic" minimizes a quadratic model and is generally outperformed by "cubic". "cubic" minimizes a cubic model; this is the most widely used interpolation strategy. "polynomial" fits a polynomial to all points obtained during the line search. "polynomial2" is an alternative polynomial fit where, if a point falls outside the bounds, a lower-degree polynomial is tried; it may converge faster than "cubic" and "polynomial". Defaults to 'cubic'.
- adaptive (bool, default: True) – if True, the initial step size is halved whenever the line search fails to find a good direction; once a good direction is found, the initial step size is reset to its original value. Defaults to True.
- fallback (bool, default: False) – if True, when no point satisfies the strong Wolfe criteria, a point with a value lower than the initial value is returned even though it does not satisfy the criteria. Defaults to False.
- plus_minus (bool, default: False) – if True, enables the plus-minus variant: if the curvature is negative, the line search is performed in the opposite direction. Defaults to False.
Examples:¶
Conjugate gradient method with strong Wolfe line search. Nocedal and Wright recommend setting c2 to 0.1 for CG. Since CG doesn't produce well-scaled directions, the initial alpha can be determined from function values with a_init="first-order":
opt = tz.Modular(
    model.parameters(),
    tz.m.PolakRibiere(),
    tz.m.StrongWolfe(c2=0.1, a_init="first-order")
)
L-BFGS with strong Wolfe line search:
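A minimal sketch following the same pattern, assuming an L-BFGS module exposed as tz.m.LBFGS:

opt = tz.Modular(
    model.parameters(),
    tz.m.LBFGS(),
    tz.m.StrongWolfe()
)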
Source code in torchzero/modules/line_search/strong_wolfe.py