Data nodes
Index
Mill.AbstractBagNode
Mill.AbstractMillNode
Mill.AbstractProductNode
Mill.ArrayNode
Mill.BagNode
Mill.LazyNode
Mill.ProductNode
Mill.WeightedBagNode
Mill.catobs
Mill.data
Mill.datasummary
Mill.dropmeta
Mill.mapdata
Mill.metadata
Mill.metadata_getindex
Mill.removeinstances
Mill.unpack2mill
API
Mill.AbstractMillNode
— Type
AbstractMillNode
Supertype for any structure representing a data node.
Mill.AbstractProductNode
— Type
AbstractProductNode <: AbstractMillNode
Supertype for any structure representing a data node implementing a Cartesian product of data in subtrees.
Mill.AbstractBagNode
— Type
AbstractBagNode <: AbstractMillNode
Supertype for any data node structure representing a multi-instance learning problem.
Mill.ArrayNode
— Type
ArrayNode{A <: AbstractArray, C} <: AbstractMillNode
Data node for storing array-like data of type A and metadata of type C. The convention is that samples are stored along the last axis, e.g. in columns of a matrix.
See also: AbstractMillNode, ArrayModel.
Mill.ArrayNode
— Method
ArrayNode(d::AbstractArray, m=nothing)
Construct a new ArrayNode with data d and metadata m.
Examples
julia> a = ArrayNode([1 2; 3 4; 5 6])
3×2 ArrayNode{Matrix{Int64}, Nothing}:
1 2
3 4
5 6
See also: AbstractMillNode, ArrayModel.
Mill.BagNode
— Type
BagNode{T <: Union{AbstractMillNode, Missing}, B <: AbstractBags, C} <: AbstractBagNode
Data node that represents a multi-instance learning problem.
Contains instances stored in a subtree of type T, bag indices of type B and optional metadata of type C.
See also: WeightedBagNode, AbstractBagNode, AbstractMillNode, BagModel.
Mill.BagNode
— Method
BagNode(d, b, m=nothing)
Construct a new BagNode with data d, bags b, and metadata m.
d is either an AbstractMillNode or missing. Any other type is wrapped in an ArrayNode.
If b is an AbstractVector, Mill.bags is applied first.
Examples
julia> BagNode(ArrayNode(maybehotbatch([1, missing, 2], 1:2)), AlignedBags([1:1, 2:3]))
BagNode 2 obs
╰── ArrayNode(2×3 MaybeHotMatrix with Union{Missing, Bool} elements) 3 obs
julia> BagNode(randn(2, 5), [1, 2, 2, 1, 1])
BagNode 2 obs
╰── ArrayNode(2×5 Array with Float64 elements) 5 obs
See also: WeightedBagNode, AbstractBagNode, AbstractMillNode, BagModel.
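The sentence about Mill.bags being applied to an AbstractVector can be illustrated with a minimal sketch. It assumes that an integer vector passed as b lists, for each instance, the index of the bag it belongs to (as in the second example above); the names x, k, n1 and n2 are made up for this illustration.

```julia
using Mill

x = randn(2, 5)        # five instances stored as columns
k = [1, 2, 2, 1, 1]    # assumption: k[i] is the bag index of instance i

# Pass the plain vector; per the docstring, the constructor applies Mill.bags itself.
n1 = BagNode(x, k)

# Convert explicitly first and pass the resulting bags structure.
n2 = BagNode(x, Mill.bags(k))

# Both nodes should describe the same two bags.
n1.bags, n2.bags
```

Passing the vector directly is shorter; converting with Mill.bags first may be handy when the same bag structure is reused for several nodes.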
Mill.WeightedBagNode
— Type
WeightedBagNode{T <: Union{AbstractMillNode, Missing}, B <: AbstractBags, W, C} <: AbstractBagNode
Structure like BagNode, but allows specifying weights of type W for each instance.
See also: BagNode, AbstractBagNode, AbstractMillNode, BagModel.
Mill.WeightedBagNode
— Method
WeightedBagNode(d, b, w::Vector, m=nothing)
Construct a new WeightedBagNode with data d, bags b, vector of weights w and metadata m.
d is either an AbstractMillNode or missing. Any other type is wrapped in an ArrayNode.
If b is an AbstractVector, Mill.bags is applied first.
Examples
julia> WeightedBagNode(ArrayNode(NGramMatrix(["s1", "s2"])), bags([1:2, 0:-1]), [0.2, 0.8])
WeightedBagNode 2 obs
╰── ArrayNode(2053×2 NGramMatrix with Int64 elements) 2 obs
julia> WeightedBagNode(zeros(2, 2), [1, 2], [1, 2])
WeightedBagNode 2 obs
╰── ArrayNode(2×2 Array with Float64 elements) 2 obs
See also: BagNode, AbstractBagNode, AbstractMillNode, BagModel.
Mill.ProductNode
— Type
ProductNode{T, C} <: AbstractProductNode
Data node representing a Cartesian product of several spaces, each represented by a subtree stored in an iterable of type T. May store metadata of type C.
See also: AbstractProductNode, AbstractMillNode, ProductModel.
Mill.ProductNode
— Method
ProductNode(dss, m=nothing)
ProductNode(m=nothing; dss...)
Construct a new ProductNode with data dss and metadata m.
dss should be a Tuple or NamedTuple, and all its elements must contain the same number of observations.
If any element of dss is not an AbstractMillNode, it is first wrapped in an ArrayNode.
Examples
julia> ProductNode((ArrayNode(zeros(2, 2)), ArrayNode(Flux.onehotbatch([1, 2], 1:2))))
ProductNode 2 obs
├── ArrayNode(2×2 Array with Float64 elements) 2 obs
╰── ArrayNode(2×2 OneHotArray with Bool elements) 2 obs
julia> ProductNode(x1 = ArrayNode(NGramMatrix(["Hello", "world"])),
                   x2 = BagNode(ArrayNode([1 2; 3 4]), [1:2, 0:-1]))
ProductNode 2 obs
├── x1: ArrayNode(2053×2 NGramMatrix with Int64 elements) 2 obs
╰── x2: BagNode 2 obs
        ╰── ArrayNode(2×2 Array with Int64 elements) 2 obs
julia> ProductNode([1 2 3])
ProductNode 3 obs
╰── ArrayNode(1×3 Array with Int64 elements) 3 obs
julia> ProductNode((ArrayNode([1 2; 3 4]), ArrayNode([1 2 3; 4 5 6])))
ERROR: AssertionError: All subtrees must have an equal amount of instances!
[...]
See also: AbstractProductNode, AbstractMillNode, ProductModel.
Mill.LazyNode
— Type
LazyNode{Name, D, C} <: AbstractMillNode
Data node storing data of type D in a lazy manner and optional metadata of type C.
The source of the data or its type is specified in Name.
See also: AbstractMillNode, LazyModel, Mill.unpack2mill.
Mill.LazyNode
— Method
LazyNode([Name::Symbol], d, m=nothing)
LazyNode{Name}(d, m=nothing)
Construct a new LazyNode with name Name, data d, and metadata m.
Examples
julia> LazyNode(:Codons, ["GGGCGGCGA", "CCTCGCGGG"])
LazyNode{:Codons, Vector{String}, Nothing}:
"GGGCGGCGA"
"CCTCGCGGG"
See also: AbstractMillNode, LazyModel, Mill.unpack2mill.
Mill.unpack2mill
— Function
Mill.unpack2mill(x::LazyNode)
Return a representation of LazyNode x using Mill.jl structures. Every custom LazyNode should define its own method of this function, as it is used in LazyModel.
Examples
julia> function Mill.unpack2mill(ds::LazyNode{:Sentence})
           s = split.(ds.data, " ")
           x = NGramMatrix(reduce(vcat, s))
           BagNode(x, Mill.length2bags(length.(s)))
       end;
julia> LazyNode{:Sentence}(["foo bar", "baz"]) |> Mill.unpack2mill
BagNode 2 obs
╰── ArrayNode(2053×3 NGramMatrix with Int64 elements) 3 obs
Mill.data
— Function
Mill.data(n::AbstractMillNode)
Return data stored in node n.
Examples
julia> Mill.data(ArrayNode([1 2; 3 4], "metadata"))
2×2 Matrix{Int64}:
1 2
3 4
julia> Mill.data(BagNode(ArrayNode([1 2; 3 4]), [1, 2], "metadata"))
2×2 ArrayNode{Matrix{Int64}, Nothing}:
1 2
3 4
See also: Mill.metadata.
Mill.metadata
— Function
Mill.metadata(n::AbstractMillNode)
Return metadata stored in node n.
Examples
julia> Mill.metadata(ArrayNode([1 2; 3 4], ["foo", "bar"]))
2-element Vector{String}:
"foo"
"bar"
julia> Mill.metadata(BagNode(ArrayNode([1 2; 3 4]), [1, 2], ["metadata"]))
1-element Vector{String}:
"metadata"
See also: Mill.data, Mill.dropmeta, Mill.metadata_getindex.
Mill.datasummary
— Function
datasummary(n::AbstractMillNode)
Print a summary of the parameters of node n.
Examples
julia> n = ProductNode(ArrayNode(randn(2, 3)))
ProductNode 3 obs
╰── ArrayNode(2×3 Array with Float64 elements) 3 obs
julia> datasummary(n)
"Data summary: 3 obs, 104 bytes."
See also: modelsummary.
Mill.dropmeta
— Function
dropmeta(n::AbstractMillNode)
Drop metadata stored in data node n (recursively).
Examples
julia> n1 = ArrayNode(NGramMatrix(["foo", "bar"]), ["metafoo", "metabar"])
2053×2 ArrayNode{NGramMatrix{String, Vector{String}, Int64}, Vector{String}}:
"foo"
"bar"
julia> n2 = dropmeta(n1)
2053×2 ArrayNode{NGramMatrix{String, Vector{String}, Int64}, Nothing}:
"foo"
"bar"
julia> isnothing(Mill.metadata(n2))
true
See also: Mill.metadata, Mill.metadata_getindex.
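The recursive behaviour can be sketched on a nested node, which the examples above do not cover. This is an illustrative sketch only: the metadata values and the names n and stripped are invented, and the final check reflects what the docstring says should happen rather than a recorded REPL session.

```julia
using Mill

# A BagNode carrying metadata at two levels: on the bags and on the instances.
inner = ArrayNode([1 2; 3 4], ["instance 1", "instance 2"])
n = BagNode(inner, [1, 2], ["bag 1", "bag 2"])

stripped = dropmeta(n)

# Per the description above, metadata should be gone from both levels.
isnothing(Mill.metadata(stripped)) && isnothing(Mill.metadata(Mill.data(stripped)))
```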
Mill.catobs
— Function
catobs(ns...)
Merge multiple nodes storing samples (observations) into one, promoting types in the process where possible.
Similar to Base.cat, but concatenates along the abstract "axis" where samples are stored.
In case of repeated calls with a varying number of arguments or argument types, use reduce(catobs, [ns...]) to save compilation time.
Examples
julia> catobs(ArrayNode(zeros(2, 2)), ArrayNode([1 2; 3 4]))
2×4 ArrayNode{Matrix{Float64}, Nothing}:
0.0 0.0 1.0 2.0
0.0 0.0 3.0 4.0
julia> n = ProductNode(t1=ArrayNode(randn(2, 3)), t2=BagNode(ArrayNode(randn(3, 8)), bags([1:3, 4:5, 6:8])))
ProductNode 3 obs
├── t1: ArrayNode(2×3 Array with Float64 elements) 3 obs
╰── t2: BagNode 3 obs
        ╰── ArrayNode(3×8 Array with Float64 elements) 8 obs
julia> catobs(n[1], n[3])
ProductNode 2 obs
├── t1: ArrayNode(2×2 Array with Float64 elements) 2 obs
╰── t2: BagNode 2 obs
        ╰── ArrayNode(3×6 Array with Float64 elements) 6 obs
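The advice about reduce(catobs, [ns...]) can be illustrated with a small sketch that concatenates the same nodes both ways. The names nodes, a and b are made up for this illustration, and the comments on compilation restate the docstring's recommendation rather than a measured result.

```julia
using Mill

# Three small ArrayNodes, each holding two observations (columns).
nodes = [ArrayNode(fill(Float64(i), 2, 2)) for i in 1:3]

# Vararg form: the call signature changes with the number of arguments.
a = catobs(nodes...)

# Reduce form: one call on a vector, whatever its length.
b = reduce(catobs, nodes)

# Both forms should yield the same 2×6 data matrix.
Mill.data(a) == Mill.data(b)
```

The reduce form is likely the better fit when the number of nodes is only known at run time, for example when assembling minibatches of varying size.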
Mill.metadata_getindex
— Function
metadata_getindex(x, i::Integer)
metadata_getindex(x, i::VecOrRange{<:Integer})
Index into metadata x. In Mill.jl, it is assumed that the second or last dimension indexes into observations, whichever is smaller. This function can be used when implementing custom subtypes of AbstractMillNode.
Examples
julia> Mill.metadata_getindex(["foo", "bar", "baz"], 2)
"bar"
julia> Mill.metadata_getindex(["foo", "bar", "baz"], 2:3)
2-element Vector{String}:
"bar"
"baz"
julia> Mill.metadata_getindex([1 2 3; 4 5 6], 2)
2-element Vector{Int64}:
2
5
julia> Mill.metadata_getindex([1 2 3; 4 5 6], [1, 3])
2×2 Matrix{Int64}:
1 3
4 6
See also: Mill.metadata, Mill.dropmeta.
Mill.mapdata
— Function
mapdata(f, x)
Recursively apply f to data in all leaves of x.
Examples
julia> n1 = ProductNode(a=zeros(2,2), b=ones(2,2))
ProductNode 2 obs
├── a: ArrayNode(2×2 Array with Float64 elements) 2 obs
╰── b: ArrayNode(2×2 Array with Float64 elements) 2 obs
julia> n2 = Mill.mapdata(x -> x .+ 1, n1)
ProductNode 2 obs
├── a: ArrayNode(2×2 Array with Float64 elements) 2 obs
╰── b: ArrayNode(2×2 Array with Float64 elements) 2 obs
julia> Mill.data(n2).a
2×2 ArrayNode{Matrix{Float64}, Nothing}:
1.0 1.0
1.0 1.0
julia> Mill.data(n2).b
2×2 ArrayNode{Matrix{Float64}, Nothing}:
2.0 2.0
2.0 2.0
Mill.removeinstances
— Function
removeinstances(n::AbstractBagNode, mask)
Remove instances from n using mask and remap bag indices accordingly.
Examples
julia> b1 = BagNode(ArrayNode([1 2 3; 4 5 6]), bags([1:2, 0:-1, 3:3]))
BagNode 3 obs
╰── ArrayNode(2×3 Array with Int64 elements) 3 obs
julia> b2 = removeinstances(b1, [false, true, true])
BagNode 3 obs
╰── ArrayNode(2×2 Array with Int64 elements) 2 obs
julia> b2.data
2×2 ArrayNode{Matrix{Int64}, Nothing}:
2 3
5 6
julia> b2.bags
AlignedBags{Int64}(UnitRange{Int64}[1:1, 0:-1, 2:2])