Optimizers

Bop

QuantizedNetworks.Bop — Type
Bop{T}

Bop is a custom binary optimizer that implements a variant of stochastic gradient descent (SGD) with a binary threshold on the momentum, which decides the direction of each update. If the momentum exceeds the threshold and has the same sign as the gradient, the update is set to a positive constant (one(x)); otherwise, it is set to a negative constant (-one(x)). This lets you control the direction of parameter updates based on the momentum history. Bop is compatible with the Flux.jl machine learning library.

Bop(ρ, τ, momentum)

Fields

  • ρ (rho): learning rate hyperparameter (default is 1e-4).
  • τ (tau): binary threshold hyperparameter for momentum updates (default is 1e-8).
  • momentum: dictionary that stores the momentum for each parameter (default is an empty dictionary).
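The rule above can be sketched as a plain Julia type. This is a minimal illustration assuming the documented fields ρ, τ, and momentum, not the package's actual source:

```julia
struct Bop
    ρ::Float64       # learning rate
    τ::Float64       # binary threshold for momentum updates
    momentum::IdDict # per-parameter momentum store
end
Bop(ρ = 1e-4, τ = 1e-8) = Bop(ρ, τ, IdDict())

function apply!(b::Bop, x, Δ)
    m = get!(() -> zero(x), b.momentum, x)
    @. m = (1 - b.ρ) * m + b.ρ * Δ  # exponential moving average of the gradient
    # +1 where the momentum exceeds the threshold with the gradient's sign, -1 elsewhere
    @. Δ = ifelse((abs(m) > b.τ) & (sign(m) == sign(Δ)), one(x), -one(x))
    return Δ
end

b = Bop(0.5, 1e-8)
x = [0.1, -0.2]
apply!(b, x, [1.0, -1.0])  # returns [1.0, 1.0]: momentum [0.5, -0.5] agrees with the gradient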
Flux.Optimise.apply! — Method
Flux.Optimise.apply!(b::Bop, x, Δ)

A custom apply! method, required for optimizers in Flux.jl. Arguments:

  • x: The parameters (model weights, biases) to be updated.
  • Δ: The gradients or updates for the parameters.
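In Flux's Optimise API, update! subtracts whatever apply! returns from the parameters, roughly x .-= apply!(opt, x, Δ). A toy rule (SignStep, a hypothetical stand-in, not part of the package) illustrates that contract:

```julia
struct SignStep end  # toy rule standing in for a real optimizer
apply!(::SignStep, x, Δ) = (@. Δ = sign(Δ); Δ)

x = [0.3, -0.2]
x .-= apply!(SignStep(), x, [2.0, -5.0])
# x is now approximately [-0.7, 0.8]
```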

Case optimizer

QuantizedNetworks.CaseOptimizer — Type
CaseOptimizer

A custom optimizer that switches between optimization strategies based on conditions. It selects the appropriate optimizer from a collection of condition-to-optimizer pairs; if no condition matches, it falls back to a default optimizer. This flexibility lets you adapt the optimization strategy during training based on specific conditions or requirements.

Fields

  • optimizers: A collection of condition-to-optimizer mappings. This field stores pairs of conditions and corresponding optimizer objects.
  • default: The default optimizer (an instance of AdaBelief), used when none of the conditions in the optimizers field match.
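The selection logic can be sketched as follows (illustrative only; the select helper and the use of symbols as stand-ins for optimizer objects are assumptions, not the package's API):

```julia
struct CaseOptimizer
    optimizers::Vector{Pair{Function,Symbol}}  # condition => optimizer (symbols as stand-ins)
    default::Symbol                            # used when no condition matches
end

function select(o::CaseOptimizer, x)
    for (cond, opt) in o.optimizers
        cond(x) && return opt
    end
    return o.default
end

co = CaseOptimizer([(x -> all(abs.(x) .<= 1)) => :binary_rule], :adabelief)
select(co, [0.5, -0.5])  # :binary_rule (every entry lies in [-1, 1])
select(co, [2.0])        # :adabelief (no condition matched, default used)
```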
Flux.Optimise.apply! — Method
Flux.Optimise.apply!(o::CaseOptimizer, x, Δ)

A custom apply! function for the CaseOptimizer, which is required for optimizers in Flux.jl. Arguments:

  • x: The parameters (model weights, biases) to be updated.
  • Δ: The gradients or updates for the parameters.
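A self-contained sketch of how such an apply! could delegate to the first matching optimizer (case_apply!, descent, and the condition signature are hypothetical names for illustration):

```julia
function case_apply!(conditions, default, x, Δ)
    for (cond, rule!) in conditions
        cond(x) && return rule!(x, Δ)  # first matching condition wins
    end
    return default(x, Δ)              # fall back to the default optimizer
end

descent(η) = (x, Δ) -> (Δ .*= η; Δ)   # plain SGD step, a stand-in rule
rules = [(x -> any(abs.(x) .> 1)) => descent(0.1)]

case_apply!(rules, descent(0.01), [2.0], [10.0])  # [1.0] (condition matched)
case_apply!(rules, descent(0.01), [0.5], [10.0])  # [0.1] (default used)
```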