Special arrays
Index
Mill.MaybeHotMatrix
Mill.MaybeHotVector
Mill.NGramIterator
Mill.NGramMatrix
Mill.PostImputingMatrix
Mill.PreImputingMatrix
Mill.countngrams
Mill.countngrams!
Mill.maybecold
Mill.maybehot
Mill.maybehotbatch
Mill.ngrams
Mill.ngrams!
Mill.postimputing_dense
Mill.preimputing_dense
API
Mill.MaybeHotVector — Type
MaybeHotVector{T, U} <: AbstractVector{U}
A vector-like structure for representing one-hot encoded variables. Like Flux.OneHotVector but supports missing values.
Construct with the maybehot function.
See also: MaybeHotMatrix, maybehotbatch.
Mill.maybehot — Function
maybehot(l, labels)
Return a MaybeHotVector where the first occurrence of l in labels is set to 1 and all other elements are set to 0.
Examples
julia> maybehot(:b, [:a, :b, :c])
3-element MaybeHotVector with eltype Bool:
⋅
1
⋅
julia> maybehot(missing, 1:3)
3-element MaybeHotVector with eltype Missing:
missing
missing
missing
See also: maybehotbatch, MaybeHotVector, MaybeHotMatrix.
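The encoding rule described for maybehot can be illustrated without Mill. The following is a minimal sketch using plain dense vectors; `sketch_maybehot` is a hypothetical name, and throwing an error for a label that is not in `labels` is an assumption, not documented behaviour of the library.

```julia
# Hypothetical helper, not part of Mill: a dense stand-in for maybehot.
function sketch_maybehot(l, labels)
    # A missing label makes every entry missing.
    ismissing(l) && return fill(missing, length(labels))
    # Position of the first occurrence of l in labels.
    i = findfirst(isequal(l), labels)
    # Assumption: an unknown label is treated as an error in this sketch.
    i === nothing && throw(ArgumentError("label $l not found in labels"))
    v = fill(false, length(labels))
    v[i] = true
    return v
end

sketch_maybehot(:b, [:a, :b, :c])   # Bool[0, 1, 0]
sketch_maybehot(missing, 1:3)       # [missing, missing, missing]
```

Unlike this sketch, the actual MaybeHotVector is a dedicated array type rather than a dense vector.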
Mill.MaybeHotMatrix — Type
MaybeHotMatrix{T, U} <: AbstractMatrix{U}
A matrix-like structure for representing one-hot encoded variables. Like Flux.OneHotMatrix but supports missing values.
Construct with the maybehotbatch function.
See also: MaybeHotVector, maybehot.
Mill.maybehotbatch — Function
maybehotbatch(ls, labels)
Return a MaybeHotMatrix in which each column corresponds to one element of ls and contains 1 at the position of that element's first occurrence in labels, with all other elements set to 0.
Examples
julia> maybehotbatch([:c, :a], [:a, :b, :c])
3×2 MaybeHotMatrix with eltype Bool:
⋅ 1
⋅ ⋅
1 ⋅
julia> maybehotbatch([missing, 2], 1:3)
3×2 MaybeHotMatrix with eltype Union{Missing, Bool}:
missing ⋅
missing 1
missing ⋅
See also: maybehot, MaybeHotMatrix, MaybeHotVector.
Mill.maybecold — Function
maybecold(y, labels=1:size(y,1))
Similar to Flux.onecold, but when y contains missing values, the result contains missing as well.
It is therefore roughly the inverse operation of maybehot and maybehotbatch.
Examples
julia> maybehot(:b, [:a, :b, :c])
3-element MaybeHotVector with eltype Bool:
⋅
1
⋅
julia> maybecold(ans, [:a, :b, :c])
:b
julia> maybehot(missing, 1:3)
3-element MaybeHotVector with eltype Missing:
missing
missing
missing
julia> maybecold(ans)
missing
julia> maybecold(maybehotbatch([missing, 2], 1:3))
2-element Vector{Union{Missing, Int64}}:
missing
2
See also: Flux.onecold, maybehot, maybehotbatch.
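The decoding rule can be sketched on plain arrays as well. The helper below is hypothetical (`sketch_maybecold` is not a Mill function) and assumes that a column containing any missing entry decodes to missing, which matches the examples above.

```julia
# Hypothetical helper, not part of Mill: a dense stand-in for maybecold.
function sketch_maybecold(y::AbstractVector, labels=1:length(y))
    # Any missing entry makes the decoded label missing.
    any(ismissing, y) && return missing
    # Otherwise return the label at the position of the largest entry.
    return labels[argmax(y)]
end

# Columns of a matrix are decoded one by one.
sketch_maybecold(Y::AbstractMatrix, labels=1:size(Y, 1)) =
    [sketch_maybecold(col, labels) for col in eachcol(Y)]

sketch_maybecold([false, true, false], [:a, :b, :c])   # :b
```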
Mill.NGramIterator — Type
NGramIterator{T}
Iterates over n-gram codes of a collection of integers s, using Mill.string_start_code() and Mill.string_end_code() for padding. N-gram codes are computed as in positional number systems, where the items of s are digits, b is the base, and m is the modulo.
To reduce collisions when mixing n-grams of different orders, avoid zeros and negative integers in s and set the base b to the expected number of unique tokens in s.
See also: NGramMatrix, ngrams, ngrams!, countngrams, countngrams!.
Mill.NGramIterator — Method
NGramIterator(s, n=3, b=256, m=typemax(Int))
Construct an NGramIterator. If s is an AbstractString, it is first converted to integers with Base.codeunits.
Examples
julia> NGramIterator("deadbeef", 3, 256, 17) |> collect
10-element Vector{Int64}:
2
16
9
9
6
10
11
15
2
6
julia> NGramIterator(collect(1:9), 3, 10, 1009) |> collect
11-element Vector{Int64}:
221
212
123
234
345
456
567
678
789
893
933
julia> Mill.string_start_code()
0x02
julia> Mill.string_end_code()
0x03
See also: NGramMatrix, ngrams, ngrams!, countngrams, countngrams!.
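The positional-number-system computation can be reproduced in a few lines of plain Julia. This is a sketch under stated assumptions, not Mill's implementation: `sketch_ngram_codes` is a hypothetical name, the padding values 0x02 and 0x03 are taken from the example output above, and the sequence is assumed to be padded with n - 1 start codes and n - 1 end codes.

```julia
# Hypothetical helper, not part of Mill: n-gram codes of an integer sequence.
function sketch_ngram_codes(s::AbstractVector{<:Integer}, n=3, b=256, m=typemax(Int);
                            start_code=0x02, end_code=0x03)
    # Assumed padding: n - 1 start codes in front, n - 1 end codes behind.
    padded = vcat(fill(Int(start_code), n - 1), Int.(s), fill(Int(end_code), n - 1))
    codes = Int[]
    for i in 1:(length(padded) - n + 1)
        # Read the window as a base-b number (Horner's scheme).
        # Note: this can overflow Int for large n or b; the sketch ignores that.
        c = 0
        for d in view(padded, i:i + n - 1)
            c = c * b + d
        end
        push!(codes, mod(c, m))
    end
    return codes
end

# Strings are converted to integers with codeunits first.
sketch_ngram_codes(s::AbstractString, args...; kwargs...) =
    sketch_ngram_codes(collect(codeunits(s)), args...; kwargs...)

sketch_ngram_codes(collect(1:9), 3, 10, 1009)   # [221, 212, 123, ..., 893, 933]
```

With base 10 the codes are easy to read off: 221 is the window (start, start, 1), and 933 is (9, end, end).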
Mill.ngrams — Function
ngrams(x, n=3, b=256)
Return codes of n-grams of x using base b.
Examples
julia> ngrams("foo", 3, 256)
5-element Vector{Int64}:
131686
157295
6713199
7302915
7275267
See also: ngrams!, countngrams, countngrams!, NGramMatrix, NGramIterator.
Mill.ngrams! — Function
ngrams!(o, x, n=3, b=256)
Store codes of n-grams of x using base b in o.
Examples
julia> o = zeros(Int, 5)
5-element Vector{Int64}:
0
0
0
0
0
julia> ngrams!(o, "foo", 3, 256)
5-element Vector{Int64}:
131686
157295
6713199
7302915
7275267
See also: ngrams, countngrams, countngrams!, NGramMatrix, NGramIterator.
Mill.countngrams — Function
countngrams(x, n, b, m)
Count the n-grams of x using base b and modulo m into a vector of length m if x is a single sequence, or into a matrix with m rows if x is an iterable of sequences.
Examples
julia> countngrams("foo", 3, 256, 5)
5-element Vector{Int64}:
2
1
1
0
1
julia> countngrams(["foo", "bar"], 3, 256, 5)
5×2 Matrix{Int64}:
2 1
1 0
1 2
0 0
1 2
See also: countngrams!, ngrams, ngrams!, NGramMatrix, NGramIterator.
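The counting step can be sketched in plain Julia. The helper `sketch_countngrams` is hypothetical and not Mill's implementation; it assumes padding codes 2 and 3 and assumes that a code c is counted at index mod(c, m) + 1, which is inferred from the example output above rather than stated in the source.

```julia
# Hypothetical helper, not part of Mill: count n-gram codes of a string into m buckets.
function sketch_countngrams(x::AbstractString, n, b, m)
    # Assumed padding: n - 1 start codes (2) and n - 1 end codes (3).
    padded = vcat(fill(2, n - 1), Int.(codeunits(x)), fill(3, n - 1))
    counts = zeros(Int, m)
    for i in 1:(length(padded) - n + 1)
        # Read the window as a base-b number.
        code = foldl((acc, d) -> acc * b + d, view(padded, i:i + n - 1); init=0)
        # Codes modulo m lie in 0:m-1, so shift by one for 1-based indexing.
        counts[mod(code, m) + 1] += 1
    end
    return counts
end

# An iterable of sequences yields one column per sequence.
sketch_countngrams(xs::AbstractVector, n, b, m) =
    reduce(hcat, [sketch_countngrams(x, n, b, m) for x in xs])

sketch_countngrams("foo", 3, 256, 5)   # [2, 1, 1, 0, 1]
```

Each column sums to the number of n-grams in the sequence, which is 5 for a three-character string with n = 3.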
Mill.countngrams! — Function
countngrams!(o, x, n, b, m=length(o))
Count the n-grams of x using base b and modulo m and store the result in o.
Examples
julia> o = zeros(Int, 5)
5-element Vector{Int64}:
0
0
0
0
0
julia> countngrams!(o, "foo", 3, 256)
5-element Vector{Int64}:
2
1
1
0
1
See also: countngrams, ngrams, ngrams!, NGramMatrix, NGramIterator.
Mill.NGramMatrix — Type
NGramMatrix{T, U, V} <: AbstractMatrix{V}
A matrix-like structure for lazily representing sequences such as strings as n-grams of cardinality n, using b as the base for calculations and m as the modulo. The matrix therefore has m rows and one column for each sequence. Missing sequences are supported.
See also: NGramIterator, ngrams, ngrams!, countngrams, countngrams!.
Mill.NGramMatrix — Method
NGramMatrix(s, n=3, b=256, m=2053)
Construct an NGramMatrix. s can either be a single sequence or any AbstractVector.
Examples
julia> NGramMatrix([1,2,3])
2053×1 NGramMatrix{Vector{Int64}, Vector{Vector{Int64}}, Int64}:
[1, 2, 3]
julia> NGramMatrix(["a", missing, "c"], 2, 128)
2053×3 NGramMatrix{Union{Missing, String}, Vector{Union{Missing, String}}, Union{Missing, Int64}}:
"a"
missing
"c"
See also: NGramIterator, ngrams, ngrams!, countngrams, countngrams!.
Mill.PostImputingMatrix — Type
PostImputingMatrix{T <: Number, U <: AbstractMatrix{T}, V <: AbstractVector{T}} <: AbstractMatrix{T}
A parametrized matrix that fills in a default vector of parameters whenever a "missing" column is encountered during multiplication.
Supports multiplication with NGramMatrix, MaybeHotMatrix and MaybeHotVector. For any other AbstractMatrix it falls back to standard multiplication.
Examples
julia> A = PostImputingMatrix(ones(2, 2), -ones(2))
2×2 PostImputingMatrix{Float64, Matrix{Float64}, Vector{Float64}}:
W:
1.0 1.0
1.0 1.0
ψ:
-1.0
-1.0
julia> A * maybehotbatch([1, missing], 1:2)
2×2 Matrix{Float64}:
1.0 -1.0
1.0 -1.0
See also: PreImputingMatrix.
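The post-imputing product for one-hot inputs can be sketched on plain arrays. The helper `sketch_postimpute` is hypothetical and not part of Mill; as a simplification, it represents each one-hot column by the index of its active row, or by missing.

```julia
# Hypothetical helper, not part of Mill: post-imputing product for one-hot inputs.
# `hot` holds, per input column, the index of the active row or `missing`.
function sketch_postimpute(W::AbstractMatrix, ψ::AbstractVector, hot::AbstractVector)
    out = similar(W, size(W, 1), length(hot))
    for (j, i) in enumerate(hot)
        # W times a one-hot column selects a column of W;
        # a missing input column yields the default vector ψ instead.
        out[:, j] = ismissing(i) ? ψ : W[:, i]
    end
    return out
end

sketch_postimpute(ones(2, 2), -ones(2), [1, missing])   # [1.0 -1.0; 1.0 -1.0]
```

Because ψ replaces a whole output column, its length equals the number of rows of W.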
Mill.PostImputingMatrix — Method
PostImputingMatrix(W::AbstractMatrix{T}, ψ=zeros(T, size(W, 1))) where T
Construct a PostImputingMatrix with multiplication parameters W and default parameters ψ.
Examples
julia> PostImputingMatrix([1 2; 3 4])
2×2 PostImputingMatrix{Int64, Matrix{Int64}, Vector{Int64}}:
W:
1 2
3 4
ψ:
0
0
See also: PreImputingMatrix.
Mill.postimputing_dense — Function
postimputing_dense(d_in, d_out, σ)
Like Flux.Dense, but uses a PostImputingMatrix instead of a standard matrix.
Examples
julia> d = postimputing_dense(3, 2)
[postimputing]Dense(3 => 2) 3 arrays, 10 params, 168 bytes
julia> typeof(d.weight)
PostImputingMatrix{Float32, Matrix{Float32}, Vector{Float32}}
julia> typeof(d.bias)
Vector{Float32} (alias for Array{Float32, 1})
See also: PostImputingMatrix, preimputing_dense, PreImputingMatrix.
Mill.PreImputingMatrix — Type
PreImputingMatrix{T <: Number, U <: AbstractMatrix{T}, V <: AbstractVector{T}} <: AbstractMatrix{T}
A parametrized matrix that fills in elements from a default vector of parameters whenever a missing element is encountered during multiplication.
Examples
julia> A = PreImputingMatrix(ones(2, 2), -ones(2))
2×2 PreImputingMatrix{Float64, Matrix{Float64}, Vector{Float64}}:
W:
1.0 1.0
1.0 1.0
ψ:
-1.0 -1.0
julia> A * [0 1; missing -1]
2×2 Matrix{Float64}:
-1.0 0.0
-1.0 0.0
See also: PostImputingMatrix.
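The pre-imputing product can be sketched on plain arrays. The helper `sketch_preimpute` is hypothetical and not part of Mill; it assumes that a missing entry in row i of the input is replaced by ψ[i] before an ordinary matrix product, which is consistent with the example above.

```julia
# Hypothetical helper, not part of Mill: pre-imputing product on a dense input.
function sketch_preimpute(W::AbstractMatrix, ψ::AbstractVector, X::AbstractMatrix)
    T = eltype(W)
    # Replace each missing entry in row i of X with ψ[i] (assumed rule).
    filled = T[ismissing(X[i, j]) ? ψ[i] : X[i, j] for i in axes(X, 1), j in axes(X, 2)]
    # Then multiply as usual.
    return W * filled
end

sketch_preimpute(ones(2, 2), -ones(2), [0 1; missing -1])   # [-1.0 0.0; -1.0 0.0]
```

Because ψ fills in entries of the input, its length equals the number of columns of W.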
Mill.PreImputingMatrix — Method
PreImputingMatrix(W::AbstractMatrix{T}, ψ=zeros(T, size(W, 2))) where T
Construct a PreImputingMatrix with multiplication parameters W and default parameters ψ.
Examples
julia> PreImputingMatrix([1 2; 3 4])
2×2 PreImputingMatrix{Int64, Matrix{Int64}, Vector{Int64}}:
W:
1 2
3 4
ψ:
0 0
See also: PostImputingMatrix.
Mill.preimputing_dense — Function
preimputing_dense(in, out, σ)
Like Flux.Dense, but uses a PreImputingMatrix instead of a standard matrix.
Examples
julia> d = preimputing_dense(2, 3)
[preimputing]Dense(2 => 3) 3 arrays, 11 params, 172 bytes
julia> typeof(d.weight)
PreImputingMatrix{Float32, Matrix{Float32}, Vector{Float32}}
julia> typeof(d.bias)
Vector{Float32} (alias for Array{Float32, 1})
See also: PreImputingMatrix, postimputing_dense, PostImputingMatrix.