Quantizers
QuantizedNetworks.AbstractQuantizer — Type
Quantizers are used to limit the range of possible numerical values. They are useful for quantizing neural networks so that they can run on hardware with limited computational resources.
Quantizer objects are also functors, i.e. they can be called as a function directly on the input data, which is equivalent to calling forward_pass.
QuantizedNetworks.forward_pass — Method
forward_pass(q::AbstractQuantizer, x)
Applies the quantizer q to the array x, quantizing each of its values.
QuantizedNetworks.pullback — Method
pullback(q::AbstractQuantizer, x)
Returns the gradient of the selected quantizer with respect to x, computed by approximating the quantizing function with its estimator and using the derivative of that approximation.
Binary
QuantizedNetworks.Sign — Type
Sign(estimator::AbstractEstimator = STE())
Deterministic binary quantizer that returns -1 when the given input is less than zero or Missing, and 1 otherwise:
\[sign(x) = \begin{cases} -1 & x < 0 \\ 1 & x \geq 0 \end{cases}\]
The type of the input is preserved, with the exception of Missing inputs, which are quantized to -1.
Quantizers require an estimator to be specified; if none is supplied, the Straight-Through Estimator STE with its default threshold of 2 is used.
Estimators
Estimators are used to approximate the gradient of the sign function, which is zero almost everywhere and therefore useless for training. They are used only in the backward pass.
STE(threshold::Real = 2): The Straight-Through Estimator approximates the sign function using a clipped version of the identity function
\[clip(x) = \begin{cases} -1 & x < -\text{threshold} \\ 1 & x > \text{threshold} \\ x & \text{otherwise} \end{cases}\]
with the gradient defined as
\[\frac{\partial clip}{\partial x} = \begin{cases} 1 & \left|x\right| \leq \text{threshold} \\ 0 & \left|x\right| > \text{threshold} \end{cases}\]
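The forward rule and the straight-through gradient above are simple enough to state in any language. Here is a minimal Python sketch; the names sign_forward and ste_pullback are illustrative and not part of the package API, and None stands in for Julia's Missing:

```python
def sign_forward(x):
    """Binary sign quantizer: -1 for inputs below zero or missing, 1 otherwise."""
    if x is None:  # stand-in for Julia's Missing
        return -1.0
    return -1.0 if x < 0 else 1.0

def ste_pullback(x, threshold=2.0):
    """Straight-through estimator: gradient 1 inside [-threshold, threshold], 0 outside."""
    if x is None:
        return 0.0
    return 1.0 if abs(x) <= threshold else 0.0
```

Note that with the default threshold of 2, an input of -2.0 still receives a gradient of 1, since the cutoff is inclusive.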
The following code plots the quantizer function and the first derivative of its linear estimation. The threshold represents the range of input values for quantization.
using Plots
using QuantizedNetworks: Sign, STE, forward_pass, pullback
q = Sign(STE(1.5))
x = -5:1/100:5
y = forward_pass(q, x)
dy = pullback(q, x)
plot(x, y, label = "quantizer", title = "Sign quantizer - STE (threshold = 1.5)")
plot!(x, dy, label = "gradient", line = (:path, 2))
PolynomialSTE(): The polynomial estimator approximates the sign function using the piecewise polynomial function
\[poly(x) = \begin{cases} -1 & x < -1 \\ 2x + x^2 & -1 \leq x < 0 \\ 2x - x^2 & 0 \leq x < 1 \\ 1 & \text{otherwise} \end{cases}\]
with the gradient defined as
\[\frac{\partial poly}{\partial x} = \begin{cases} 2 + 2x & -1 \leq x < 0 \\ 2 - 2x & 0 \leq x < 1 \\ 0 & \text{otherwise} \end{cases}\]
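As a quick sanity check on the piecewise derivative above, here is a hedged Python sketch (poly_pullback is an illustrative name, not part of the package API; None stands in for Missing):

```python
def poly_pullback(x):
    """Derivative of the piecewise polynomial estimate of the sign function."""
    if x is None:
        return 0.0
    if -1.0 <= x < 0.0:
        return 2.0 + 2.0 * x
    if 0.0 <= x < 1.0:
        return 2.0 - 2.0 * x
    return 0.0  # flat outside [-1, 1)
```

The values 0, 1, 2, 1, 0 at x = -2, -0.5, 0, 0.5, 1 agree with the pullback example shown later in this section.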
The following code plots the quantizer function and the first derivative of its polynomial estimation.
using Plots
using QuantizedNetworks: Sign, PolynomialSTE, forward_pass, pullback
q = Sign(PolynomialSTE())
x = -5:1/100:5
y = forward_pass(q, x)
dy = pullback(q, x)
plot(x, y, label = "quantizer", title = "Sign quantizer - PolynomialSTE")
plot!(x, dy, label = "gradient", line = (:path, 2))
SwishSTE(β=5): The SignSwish estimator approximates the sign function using a shifted and scaled swish function
\[sswish_{\beta}(x) = 2\sigma(\beta x) \left(1 + \beta x \left(1 - \sigma(\beta x)\right)\right) - 1\]
where $\sigma(x)$ is the sigmoid function and $\beta > 0$ controls how fast the function asymptotes to −1 and +1. The gradient is defined as
\[\frac{\partial sswish_{\beta}}{\partial x} = \frac{\beta\left( 2-\beta x \tanh \left(\frac{\beta x}{2}\right) \right)}{1+\cosh (\beta x)}\]
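The closed-form gradient above can be cross-checked against a finite-difference approximation. Below is a hedged Python sketch of both formulas (sswish and sswish_grad are illustrative names, not part of the package API):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sswish(x, beta=5.0):
    """Shifted and scaled swish approximation of sign: asymptotes to -1 and +1."""
    s = sigmoid(beta * x)
    return 2.0 * s * (1.0 + beta * x * (1.0 - s)) - 1.0

def sswish_grad(x, beta=5.0):
    """Analytic gradient of sswish, as given above."""
    bx = beta * x
    return beta * (2.0 - bx * math.tanh(bx / 2.0)) / (1.0 + math.cosh(bx))

# cross-check the analytic gradient with a central finite difference at x = 0.3
h = 1e-6
fd = (sswish(0.3 + h) - sswish(0.3 - h)) / (2.0 * h)
```

At x = 0 the gradient equals β, so a larger β makes the estimator steeper around the origin.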
The following code plots the quantizer function and the first derivative of its swish estimation.
using Plots
using QuantizedNetworks: Sign, SwishSTE, forward_pass, pullback
q = Sign(SwishSTE(2))
x = -5:1/100:5
y = forward_pass(q, x)
dy = pullback(q, x)
plot(x, y, label = "quantizer", title = "Sign quantizer - SwishSTE (β = 2)")
plot!(x, dy, label = "gradient", line = (:path, 2))
Examples
julia> using QuantizedNetworks: Sign, PolynomialSTE, pullback
julia> x = [-2.0, -0.5, 0.0, 0.5, 1.0, missing];
julia> q = Sign()
Sign(STE(2))
julia> q(x)
6-element Vector{Float64}:
-1.0
-1.0
1.0
1.0
1.0
-1.0
julia> pullback(q, x)
6-element Vector{Float64}:
1.0
1.0
1.0
1.0
1.0
0.0
julia> pullback(Sign(PolynomialSTE()), x)
6-element Vector{Float64}:
0.0
1.0
2.0
1.0
0.0
0.0
QuantizedNetworks.Heaviside — Type
Heaviside(estimator::AbstractEstimator = STE())
Deterministic binary quantizer that returns 0 when the given input is less than or equal to zero or Missing, and 1 otherwise:
\[heaviside(x) = \begin{cases} 0 & x \leq 0 \\ 1 & x > 0 \end{cases}\]
The type of the input is preserved, with the exception of Missing inputs, which are quantized to 0.
Estimators
Estimators are used to approximate the gradient of the heaviside function, which is zero almost everywhere. They are used only in the backward pass.
STE(threshold::Real = 2): The Straight-Through Estimator approximates the heaviside function using the clip function
\[clip(x) = \begin{cases} 0 & x < -\text{threshold} \\ 1 & x > \text{threshold} \\ x & \text{otherwise} \end{cases}\]
with the gradient defined as that of a clipped identity
\[\frac{\partial clip}{\partial x} = \begin{cases} 1 & \left|x\right| \leq \text{threshold} \\ 0 & \left|x\right| > \text{threshold} \end{cases}\]
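The forward rule can again be sketched in a few lines of Python (heaviside_forward is an illustrative name, not part of the package API; None stands in for Missing):

```python
def heaviside_forward(x):
    """Heaviside quantizer: 0 for inputs at or below zero or missing, 1 otherwise."""
    if x is None:
        return 0.0
    return 1.0 if x > 0 else 0.0
```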
The following code plots the heaviside quantizer function and the first derivative of its linear estimation.
using Plots
using QuantizedNetworks: Heaviside, STE, forward_pass, pullback
q = Heaviside(STE(3))
x = -5:1/100:5
y = forward_pass(q, x)
dy = pullback(q, x)
plot(x, y, label = "quantizer", title = "Heaviside quantizer - STE (threshold = 3)")
plot!(x, dy, label = "gradient", line = (:path, 2))
Examples
julia> using QuantizedNetworks: Heaviside, pullback
julia> x = [-2.0, -0.5, 0.0, 0.5, 1.0, missing];
julia> q = Heaviside()
Heaviside(STE(2))
julia> q(x)
6-element Vector{Float64}:
0.0
0.0
0.0
1.0
1.0
0.0
julia> pullback(q, x)
6-element Vector{Float64}:
1.0
1.0
1.0
1.0
1.0
0.0
Ternary
QuantizedNetworks.Ternary — Type
Ternary(Δ::T = 0.05, estimator::AbstractEstimator = STE())
Deterministic ternary quantizer that returns -1 when the given input is less than -Δ, 1 when the input is greater than Δ, and 0 otherwise. For Missing input, the output is 0.
\[ternary(x) = \begin{cases} -1 & x < -\Delta \\ 1 & x > \Delta \\ 0 & \text{otherwise} \end{cases}\]
The type of the input is preserved, with the exception of Missing inputs, which are quantized to 0.
Estimators
Estimators are used to approximate the gradient of the ternary function, which is zero almost everywhere. They are used only in the backward pass.
STE(threshold::Real = 2): The Straight-Through Estimator approximates the ternary function using the clip function
\[clip(x) = \begin{cases} -1 & x < -\text{threshold} \\ 1 & x > \text{threshold} \\ x & \text{otherwise} \end{cases}\]
with the gradient defined as that of a clipped identity
\[\frac{\partial clip}{\partial x} = \begin{cases} 1 & \left|x\right| \leq \text{threshold} \\ 0 & \left|x\right| > \text{threshold} \end{cases}\]
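The three-valued forward rule can be sketched directly from the definition above (ternary_forward is an illustrative name, not part of the package API; None stands in for Missing):

```python
def ternary_forward(x, delta=0.05):
    """Ternary quantizer: -1 below -delta, 1 above delta, 0 otherwise (and for missing)."""
    if x is None:
        return 0.0
    if x < -delta:
        return -1.0
    if x > delta:
        return 1.0
    return 0.0
```

With the default Δ = 0.05, inputs inside the dead zone such as 0.03 are quantized to 0.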
The following code plots the ternary quantizer function and the first derivative of its linear estimation.
using Plots
using QuantizedNetworks: Ternary, STE, forward_pass, pullback
q = Ternary(1.5, STE(3))
x = -5:1/100:5
y = forward_pass(q, x)
dy = pullback(q, x)
plot(x, y, label = "quantizer", title = "Ternary quantizer - (Δ = 1.5, STE threshold = 3)")
plot!(x, dy, label = "gradient", line = (:path, 2))
Examples
julia> using QuantizedNetworks: Ternary, pullback
julia> x = [-2.0, -0.5, 0.0, 0.5, 1.0, missing];
julia> q = Ternary()
Ternary(0.05, STE(2))
julia> q(x)
6-element Vector{Float64}:
-1.0
-1.0
0.0
1.0
1.0
0.0
julia> pullback(q, x)
6-element Vector{Float64}:
1.0
1.0
1.0
1.0
1.0
0.0