Optimizers
Bop
QuantizedNetworks.Bop — Type

Bop{T}
Bop is a custom binary optimizer type that implements a variant of the stochastic gradient descent (SGD) optimizer with a binary threshold on the momentum, used to decide the direction of each update. If the momentum exceeds the threshold and has the same sign as the gradient, the update is set to a positive constant (one(x)); otherwise, it is set to a negative constant (-one(x)). This lets you control the direction of parameter updates based on the momentum history. Bop is compatible with the Flux.jl machine learning library.
Bop(ρ, τ, momentum)
Fields
- ρ (rho): learning rate hyperparameter (default: 1e-4).
- τ (tau): binary threshold hyperparameter for momentum updates (default: 1e-8).
- momentum: a dictionary that stores the momentum for each parameter (default: an empty dictionary).
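As a usage sketch (the IdDict momentum buffer and the interaction with Flux.Optimise.update! are assumptions based on Flux conventions, not confirmed by this docstring):

```julia
using Flux
using QuantizedNetworks

# Positional arguments follow the signature above:
# learning rate ρ, threshold τ, and an (assumed) IdDict momentum buffer.
opt = Bop(1e-4, 1e-8, IdDict())

# Bop can then be used like any Flux optimizer, e.g. inside a training
# loop via Flux.Optimise.update!(opt, Flux.params(model), grads).
```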
Flux.Optimise.apply! — Method

Flux.Optimise.apply!(b::Bop, x, Δ)

A custom apply! method, required for optimizers in Flux.jl.

Arguments:
- x: the parameters (model weights, biases) to be updated.
- Δ: the gradients or updates for the parameters.
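A minimal sketch of what this method computes, following the description above (bop_apply! is a hypothetical stand-in, and the moving-average momentum update is an assumption; the package's actual implementation may differ in detail):

```julia
# Hypothetical re-implementation of the update rule, not the package's code.
function bop_apply!(b, x, Δ)
    # Fetch (or lazily initialize) the momentum buffer for this parameter.
    m = get!(() -> zero(x), b.momentum, x)
    # Accumulate gradients into the momentum (assumed moving average, weighted by ρ).
    @. m = (1 - b.ρ) * m + b.ρ * Δ
    # Above the threshold τ and same sign as the gradient => one(x);
    # otherwise => -one(x), as described in the docstring.
    @. Δ = ifelse((abs(m) > b.τ) & (sign(m) == sign(Δ)), one(Δ), -one(Δ))
    return Δ
end
```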
Case optimizer
QuantizedNetworks.CaseOptimizer — Type

CaseOptimizer
A custom optimizer that switches between different optimization strategies based on conditions. For each parameter it selects the appropriate optimizer from a collection of condition-optimizer pairs; if no condition matches, it falls back to a default optimizer. This flexibility lets you adapt the optimization strategy during training to specific conditions or requirements.
Fields
- optimizers: a collection of condition-to-optimizer mappings, stored as pairs of conditions and corresponding optimizer objects.
- default: the default optimizer, an instance of AdaBelief, used when none of the conditions in the optimizers field match.
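A hedged construction sketch (the pair-based constructor, the default keyword, and the single-argument condition signature are all assumptions inferred from the field descriptions; check the package source for the exact API):

```julia
using Flux
using QuantizedNetworks

# Hypothetical predicate: treat a parameter as binary when every entry is ±1.
isbinary(x) = all(xi -> abs(xi) == one(xi), x)

# Binary parameters get Bop; everything else falls back to AdaBelief.
opt = CaseOptimizer(isbinary => Bop(); default = AdaBelief())
```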
Flux.Optimise.apply! — Method

Flux.Optimise.apply!(o::CaseOptimizer, x, Δ)

A custom apply! method for CaseOptimizer, required for optimizers in Flux.jl.

Arguments:
- x: the parameters (model weights, biases) to be updated.
- Δ: the gradients or updates for the parameters.
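The dispatch this method performs can be sketched as follows (case_apply! is a hypothetical stand-in; the single-argument condition(x) signature is an assumption):

```julia
# Hypothetical re-implementation of the condition dispatch.
function case_apply!(o, x, Δ)
    for (condition, optimizer) in o.optimizers
        # Delegate to the first optimizer whose condition holds for x.
        condition(x) && return Flux.Optimise.apply!(optimizer, x, Δ)
    end
    # No condition matched: fall back to the default optimizer.
    return Flux.Optimise.apply!(o.default, x, Δ)
end
```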