Smoothing¶
This subpackage contains smoothing-based optimization modules, currently Laplacian and Gaussian smoothing.
Classes:

- GradientSampling – Samples and aggregates gradients and values at perturbed points.
- LaplacianSmoothing – Applies Laplacian smoothing via a fast Fourier transform solver, which can improve generalization.
GradientSampling ¶
Bases: torchzero.core.reformulation.Reformulation
Samples and aggregates gradients and values at perturbed points.
This module can be used for Gaussian homotopy and gradient sampling methods.
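With the default "mean" aggregation, this estimates the gradient of the Gaussian-smoothed objective f_sigma(x) = E[f(x + sigma * u)], u ~ N(0, I), by averaging sampled gradients. A minimal standalone sketch of that estimator (`grad_fn` is an illustrative gradient oracle, not part of torchzero):

```python
import torch

def mean_sampled_grad(grad_fn, x, sigma=1.0, n=100, include_x0=True):
    # average gradients at Gaussian-perturbed points, as in Gaussian homotopy
    grads = [grad_fn(x)] if include_x0 else []
    for _ in range(n):
        grads.append(grad_fn(x + sigma * torch.randn_like(x)))
    return torch.stack(grads).mean(dim=0)
```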
Parameters:

- modules (Chainable | None, default: None) – modules that will optimize the modified objective. If None, the gradient of the modified objective is returned as the update. Defaults to None.
- sigma (float, default: 1.0) – initial magnitude of the perturbations. Defaults to 1.
- n (int, default: 100) – number of perturbations per step. Defaults to 100.
- aggregate (str, default: 'mean') – how to aggregate values and gradients:
    - "mean" – uses the mean of the gradients, as in Gaussian homotopy.
    - "max" – uses the element-wise maximum of the gradients.
    - "min" – uses the element-wise minimum of the gradients.
    - "min-norm" – picks the gradient with the lowest norm.

  Defaults to 'mean'.
- distribution (Literal, default: 'gaussian') – distribution for the random perturbations. Defaults to 'gaussian'.
- include_x0 (bool, default: True) – whether to include the gradient at the unperturbed point. Defaults to True.
- fixed (bool, default: True) – if True, perturbations are not replaced by new random perturbations until the termination criteria are satisfied. Defaults to True.
- pre_generate (bool, default: True) – if True, perturbations are pre-generated before each step. This requires more memory to store all of them, but ensures they do not change when the closure is evaluated multiple times. Defaults to True.
- termination (TerminationCriteriaBase | Sequence[TerminationCriteriaBase] | None, default: None) – a termination criteria module; sigma is multiplied by decay when the termination criteria are satisfied, and new perturbations are generated if fixed. Defaults to None.
- decay (float, default: 2/3) – sigma multiplier on termination. Defaults to 2/3.
- reset_on_termination (bool, default: True) – whether to reset the states of all other modules on termination. Defaults to True.
- sigma_strategy (str | None, default: None) – strategy for adapting sigma. If the condition is satisfied, sigma is multiplied by sigma_nplus, otherwise it is multiplied by sigma_nminus:
    - "grad-norm" – at least sigma_target gradients should have a lower norm than the gradient at the unperturbed point.
    - "value" – at least sigma_target values (losses) should be lower than the value at the unperturbed point.
    - None – doesn't use adaptive sigma.

  This introduces a side effect into the closure, so it should be left at None if you use a trust region or line search to optimize the modified objective. Defaults to None.
- sigma_target (int, default: 0.2) – number of elements that must satisfy the condition in sigma_strategy.
- sigma_nplus (float, default: 4/3) – sigma multiplier when the sigma_strategy condition is satisfied. Defaults to 4/3.
- sigma_nminus (float, default: 2/3) – sigma multiplier when the sigma_strategy condition is not satisfied. Defaults to 2/3.
- seed (int | None, default: None) – random seed. Defaults to None.
Source code in torchzero/modules/smoothing/sampling.py (lines 22-300).
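A usage sketch for Gaussian homotopy, assuming torchzero's closure convention with a backward argument; the inner tz.m.Adam module and the model/criterion objects are illustrative:

```python
import torchzero as tz

# smooth the objective with 100 Gaussian perturbations per step,
# then let the inner module optimize the smoothed objective
opt = tz.Modular(
    model.parameters(),
    tz.m.GradientSampling(tz.m.Adam(), sigma=1.0, n=100, aggregate="mean"),
    tz.m.LR(1e-2),
)

def closure(backward=True):
    loss = criterion(model(inputs), targets)
    if backward:
        opt.zero_grad()
        loss.backward()
    return loss

opt.step(closure)
```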
LaplacianSmoothing ¶
Bases: torchzero.core.transform.Transform
Applies Laplacian smoothing via a fast Fourier transform solver, which can improve generalization.
Parameters:

- sigma (float, default: 1) – controls the amount of smoothing. Defaults to 1.
- layerwise (bool, default: True) – if True, applies smoothing to each parameter's gradient separately; otherwise applies it to all gradients concatenated into a single vector. Defaults to True.
- min_numel (int, default: 4) – minimum number of elements in a parameter for Laplacian smoothing to be applied to it. Only has an effect if layerwise is True. Defaults to 4.
- target (str, default: 'update') – what to set on var.
Examples:

Laplacian Smoothing gradient descent optimizer, as in the paper (see Reference below):

```python
opt = tz.Modular(
    model.parameters(),
    tz.m.LaplacianSmoothing(),
    tz.m.LR(1e-2),
)
```
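Under the hood, Laplacian smoothing replaces a gradient g with the solution of (I - sigma * L) u = g, where L is the one-dimensional circulant Laplacian. That system is diagonal in the Fourier basis, which is what the FFT solver exploits. A minimal standalone sketch of the operation from the paper below, for a flattened 1-D gradient (not torchzero's exact implementation):

```python
import torch

def laplacian_smooth(g: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    # circulant 1-D Laplacian stencil [-2, 1, 0, ..., 0, 1]
    v = torch.zeros_like(g)
    v[0], v[1], v[-1] = -2.0, 1.0, 1.0
    # (I - sigma * L) is diagonal in the Fourier basis, so the solve is
    # an element-wise division of FFT coefficients
    denom = 1.0 - sigma * torch.fft.fft(v)
    return torch.fft.ifft(torch.fft.fft(g) / denom).real
```

With sigma = 0 the denominator is identically 1 and the gradient passes through unchanged; increasing sigma damps the gradient's high-frequency components.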
Reference
Osher, S., Wang, B., Yin, P., Luo, X., Barekat, F., Pham, M., & Lin, A. (2022). Laplacian smoothing gradient descent. Research in the Mathematical Sciences, 9(3), 55.