Quantizers
QuantizedNetworks.AbstractQuantizer — Type
Quantizers are used to limit the range of possible numerical values. They are useful for quantizing neural networks so that they can run on hardware with limited computational resources.
Quantizer objects are also functors, i.e. they can be called as a function directly on the input data, which is equivalent to calling forward_pass.
QuantizedNetworks.forward_pass — Method
forward_pass(q::AbstractQuantizer, x)
Applies the quantizer q to the array x, quantizing each of its values.
QuantizedNetworks.pullback — Method
pullback(q::AbstractQuantizer, x)
Returns the gradient of the selected quantizer with respect to x, computed by approximating the quantizing function with its estimator and using the derivative of that approximation.
Binary
QuantizedNetworks.Sign — Type
Sign(estimator::AbstractEstimator = STE())
Deterministic binary quantizer that returns -1 when the given input is less than zero or Missing, and 1 otherwise:
\[sign(x) = \begin{cases} -1 & x < 0 \\ 1 & x \geq 0 \end{cases}\]
The type of the input is preserved, with the exception of Missing inputs, which are quantized to -1.
Quantizers require an estimator to be specified; if none is supplied, the Straight-Through Estimator STE with its default threshold of 2 is used.
Estimators
Estimators are used to approximate the gradient of the sign function, which is zero almost everywhere and therefore useless for training. They are used only in the backward pass.
STE(threshold::Real = 2): The Straight-Through Estimator approximates the sign function using a clipped version of the identity function
\[clip(x) = \begin{cases} -1 & x < -\text{threshold} \\ 1 & x > \text{threshold} \\ x & \text{otherwise} \end{cases}\]
with the gradient defined as
\[\frac{\partial clip}{\partial x} = \begin{cases} 1 & \left|x\right| \leq \text{threshold} \\ 0 & \left|x\right| > \text{threshold} \end{cases}\]
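The forward rule and the straight-through gradient above are simple enough to state in any language. Here is a minimal Python sketch; the names sign_forward and ste_pullback are illustrative and not part of the package API, and None stands in for Julia's Missing:

```python
def sign_forward(x):
    """Binary sign quantizer: -1 for inputs below zero or missing, 1 otherwise."""
    if x is None:  # stand-in for Julia's Missing
        return -1.0
    return -1.0 if x < 0 else 1.0

def ste_pullback(x, threshold=2.0):
    """Straight-through estimator: gradient 1 inside [-threshold, threshold], 0 outside."""
    if x is None:
        return 0.0
    return 1.0 if abs(x) <= threshold else 0.0
```

Note that with the default threshold of 2, an input of -2.0 still receives a gradient of 1, since the cutoff is inclusive.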
The following code plots the quantizer function and the first derivative of its linear estimation. The threshold represents the range of input values for quantization.
using Plots
using QuantizedNetworks: Sign, STE, forward_pass, pullback
q = Sign(STE(1.5))
x = -5:1/100:5
y = forward_pass(q, x)
dy = pullback(q, x)
plot(x, y, label = "quantizer", title = "Sign quantizer - STE (threshold = 1.5)")
plot!(x, dy, label = "gradient", line = (:path, 2))
PolynomialSTE(): The polynomial estimator approximates the sign function using the piecewise polynomial function
\[poly(x) = \begin{cases} -1 & x < -1 \\ 2x + x^2 & -1 \leq x < 0 \\ 2x - x^2 & 0 \leq x < 1 \\ 1 & \text{otherwise} \end{cases}\]
with the gradient defined as
\[\frac{\partial poly}{\partial x} = \begin{cases} 2 + 2x & -1 \leq x < 0 \\ 2 - 2x & 0 \leq x < 1 \\ 0 & \text{otherwise} \end{cases}\]
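As a quick sanity check on the piecewise derivative above, here is a hedged Python sketch (poly_pullback is an illustrative name, not part of the package API; None stands in for Missing):

```python
def poly_pullback(x):
    """Derivative of the piecewise polynomial estimate of the sign function."""
    if x is None:
        return 0.0
    if -1.0 <= x < 0.0:
        return 2.0 + 2.0 * x
    if 0.0 <= x < 1.0:
        return 2.0 - 2.0 * x
    return 0.0  # flat outside [-1, 1)
```

The values 0, 1, 2, 1, 0 at x = -2, -0.5, 0, 0.5, 1 agree with the pullback example shown later in this section.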
The following code plots the quantizer function and the first derivative of its polynomial estimation.
using Plots
using QuantizedNetworks: Sign, PolynomialSTE, forward_pass, pullback
q = Sign(PolynomialSTE())
x = -5:1/100:5
y = forward_pass(q, x)
dy = pullback(q, x)
plot(x, y, label = "quantizer", title = "Sign quantizer - PolynomialSTE")
plot!(x, dy, label = "gradient", line = (:path, 2))
SwishSTE(β=5): The SignSwish estimator approximates the sign function using a shifted and scaled swish function
\[sswish_{\beta}(x) = 2\sigma(\beta x) \left(1 + \beta x \left(1 - \sigma(\beta x)\right)\right) - 1\]
where $\sigma(x)$ is the sigmoid function and $\beta > 0$ controls how fast the function asymptotes to −1 and +1. The gradient is defined as
\[\frac{\partial sswish_{\beta}}{\partial x} = \frac{\beta\left( 2-\beta x \tanh \left(\frac{\beta x}{2}\right) \right)}{1+\cosh (\beta x)}\]
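The closed-form gradient above can be cross-checked against a finite-difference approximation. Below is a hedged Python sketch of both formulas (sswish and sswish_grad are illustrative names, not part of the package API):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sswish(x, beta=5.0):
    """Shifted and scaled swish approximation of sign: asymptotes to -1 and +1."""
    s = sigmoid(beta * x)
    return 2.0 * s * (1.0 + beta * x * (1.0 - s)) - 1.0

def sswish_grad(x, beta=5.0):
    """Analytic gradient of sswish, as given above."""
    bx = beta * x
    return beta * (2.0 - bx * math.tanh(bx / 2.0)) / (1.0 + math.cosh(bx))

# cross-check the analytic gradient with a central finite difference at x = 0.3
h = 1e-6
fd = (sswish(0.3 + h) - sswish(0.3 - h)) / (2.0 * h)
```

At x = 0 the gradient equals β, so a larger β makes the estimator steeper around the origin.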
The following code plots the quantizer function and the first derivative of its swish estimation.
using Plots
using QuantizedNetworks: Sign, SwishSTE, forward_pass, pullback
q = Sign(SwishSTE(2))
x = -5:1/100:5
y = forward_pass(q, x)
dy = pullback(q, x)
plot(x, y, label = "quantizer", title = "Sign quantizer - SwishSTE (β = 2)")
plot!(x, dy, label = "gradient", line = (:path, 2))
Examples
julia> using QuantizedNetworks: Sign, PolynomialSTE, pullback
julia> x = [-2.0, -0.5, 0.0, 0.5, 1.0, missing];
julia> q = Sign()
Sign(STE(2))
julia> q(x)
6-element Vector{Float64}:
-1.0
-1.0
1.0
1.0
1.0
-1.0
julia> pullback(q, x)
6-element Vector{Float64}:
1.0
1.0
1.0
1.0
1.0
0.0
julia> pullback(Sign(PolynomialSTE()), x)
6-element Vector{Float64}:
0.0
1.0
2.0
1.0
0.0
0.0
QuantizedNetworks.Heaviside — Type
Heaviside(estimator::AbstractEstimator = STE())
Deterministic binary quantizer that returns 0 when the given input is less than or equal to zero or Missing, and 1 otherwise:
\[heaviside(x) = \begin{cases} 0 & x \leq 0 \\ 1 & x > 0 \end{cases}\]
The type of the input is preserved, with the exception of Missing inputs, which are quantized to 0.
Estimators
Estimators are used to approximate the gradient of the heaviside function, which is zero almost everywhere. They are used only in the backward pass.
STE(threshold::Real = 2): The Straight-Through Estimator approximates the heaviside function using the clip function
\[clip(x) = \begin{cases} 0 & x < -\text{threshold} \\ 1 & x > \text{threshold} \\ x & \text{otherwise} \end{cases}\]
with the gradient defined as that of a clipped identity
\[\frac{\partial clip}{\partial x} = \begin{cases} 1 & \left|x\right| \leq \text{threshold} \\ 0 & \left|x\right| > \text{threshold} \end{cases}\]
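The forward rule can again be sketched in a few lines of Python (heaviside_forward is an illustrative name, not part of the package API; None stands in for Missing):

```python
def heaviside_forward(x):
    """Heaviside quantizer: 0 for inputs at or below zero or missing, 1 otherwise."""
    if x is None:
        return 0.0
    return 1.0 if x > 0 else 0.0
```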
The following code plots the heaviside quantizer function and the first derivative of its linear estimation.
using Plots
using QuantizedNetworks: Heaviside, STE, forward_pass, pullback
q = Heaviside(STE(3))
x = -5:1/100:5
y = forward_pass(q, x)
dy = pullback(q, x)
plot(x, y, label = "quantizer", title = "Heaviside quantizer - STE (threshold = 3)")
plot!(x, dy, label = "gradient", line = (:path, 2))
Examples
julia> using QuantizedNetworks: Heaviside, pullback
julia> x = [-2.0, -0.5, 0.0, 0.5, 1.0, missing];
julia> q = Heaviside()
Heaviside(STE(2))
julia> q(x)
6-element Vector{Float64}:
0.0
0.0
0.0
1.0
1.0
0.0
julia> pullback(q, x)
6-element Vector{Float64}:
1.0
1.0
1.0
1.0
1.0
0.0
Ternary
QuantizedNetworks.Ternary — Type
Ternary(Δ::T = 0.05, estimator::AbstractEstimator = STE())
Deterministic ternary quantizer that returns -1 when the given input is less than -Δ, 1 when the input is greater than Δ, and 0 otherwise. For Missing input, the output is 0.
\[ternary(x) = \begin{cases} -1 & x < -\Delta \\ 1 & x > \Delta \\ 0 & \text{otherwise} \end{cases}\]
The type of the input is preserved, with the exception of Missing inputs, which are quantized to 0.
Estimators
Estimators are used to approximate the gradient of the ternary function, which is zero almost everywhere. They are used only in the backward pass.
STE(threshold::Real = 2): The Straight-Through Estimator approximates the ternary function using the clip function
\[clip(x) = \begin{cases} -1 & x < -\text{threshold} \\ 1 & x > \text{threshold} \\ x & \text{otherwise} \end{cases}\]
with the gradient defined as that of a clipped identity
\[\frac{\partial clip}{\partial x} = \begin{cases} 1 & \left|x\right| \leq \text{threshold} \\ 0 & \left|x\right| > \text{threshold} \end{cases}\]
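The three-valued forward rule can be sketched directly from the definition above (ternary_forward is an illustrative name, not part of the package API; None stands in for Missing):

```python
def ternary_forward(x, delta=0.05):
    """Ternary quantizer: -1 below -delta, 1 above delta, 0 otherwise (and for missing)."""
    if x is None:
        return 0.0
    if x < -delta:
        return -1.0
    if x > delta:
        return 1.0
    return 0.0
```

With the default Δ = 0.05, inputs inside the dead zone such as 0.03 are quantized to 0.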
The following code plots the ternary quantizer function and the first derivative of its linear estimation.
using Plots
using QuantizedNetworks: Ternary, STE, forward_pass, pullback
q = Ternary(1.5, STE(3))
x = -5:1/100:5
y = forward_pass(q, x)
dy = pullback(q, x)
plot(x, y, label = "quantizer", title = "Ternary quantizer - (Δ = 1.5, STE threshold = 3)")
plot!(x, dy, label = "gradient", line = (:path, 2))
Examples
julia> using QuantizedNetworks: Ternary, pullback
julia> x = [-2.0, -0.5, 0.0, 0.5, 1.0, missing];
julia> q = Ternary()
Ternary(0.05, STE(2))
julia> q(x)
6-element Vector{Float64}:
-1.0
-1.0
0.0
1.0
1.0
0.0
julia> pullback(q, x)
6-element Vector{Float64}:
1.0
1.0
1.0
1.0
1.0
0.0