Data nodes
Index
Mill.AbstractBagNode
Mill.AbstractMillNode
Mill.AbstractProductNode
Mill.ArrayNode
Mill.BagNode
Mill.LazyNode
Mill.ProductNode
Mill.WeightedBagNode
Mill.catobs
Mill.data
Mill.datasummary
Mill.dropmeta
Mill.mapdata
Mill.metadata
Mill.removeinstances
Mill.subset
Mill.unpack2mill
API
Mill.AbstractMillNode — Type
AbstractMillNode
Supertype for any structure representing a data node.
Mill.AbstractProductNode — Type
AbstractProductNode <: AbstractMillNode
Supertype for any structure representing a data node implementing a Cartesian product of data in subtrees.
Mill.AbstractBagNode — Type
AbstractBagNode <: AbstractMillNode
Supertype for any data node structure representing a multi-instance learning problem.
Mill.ArrayNode — Type
ArrayNode{A <: AbstractArray, C} <: AbstractMillNode
Data node for storing array-like data of type A and metadata of type C. The convention is that samples are stored along the last axis, e.g. in columns of a matrix.
See also: AbstractMillNode, ArrayModel.
Mill.ArrayNode — Method
ArrayNode(d::AbstractArray, m=nothing)
Construct a new ArrayNode with data d and metadata m.
Examples
julia> a = ArrayNode([1 2; 3 4; 5 6])
3×2 ArrayNode{Matrix{Int64}, Nothing}:
1 2
3 4
5 6
See also: AbstractMillNode, ArrayModel.
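The storage convention and the optional metadata argument can be illustrated with a short sketch. It uses only ArrayNode, Mill.data, and Mill.metadata as documented on this page; the metadata strings are made up for illustration, and reading the observation count off the last axis with size is an assumption based on the stated convention.

```julia
using Mill

# Three features (rows) for two observations (columns).
# The metadata vector is an arbitrary illustrative value, one entry per observation.
a = ArrayNode([1 2; 3 4; 5 6], ["first", "second"])

Mill.data(a)             # the wrapped 3×2 matrix
Mill.metadata(a)         # the metadata passed to the constructor
size(Mill.data(a), 2)    # number of observations, read off the last axis
```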
Mill.BagNode — Type
BagNode{T <: Union{AbstractMillNode, Missing}, B <: AbstractBags, C} <: AbstractBagNode
Data node that represents a multi-instance learning problem.
Contains instances stored in a subtree of type T, bag indices of type B, and optional metadata of type C.
See also: WeightedBagNode, AbstractBagNode, AbstractMillNode, BagModel.
Mill.BagNode — Method
BagNode(d, b, m=nothing)
Construct a new BagNode with data d, bags b, and metadata m.
d is either an AbstractMillNode or missing. Any other type is wrapped in an ArrayNode.
If b is an AbstractVector, Mill.bags is applied first.
Examples
julia> BagNode(ArrayNode(maybehotbatch([1, missing, 2], 1:2)), AlignedBags([1:1, 2:3]))
BagNode 2 obs, 104 bytes
╰── ArrayNode(2×3 MaybeHotMatrix with Union{Missing, Bool} elements) 3 obs, 87 bytes
julia> BagNode(randn(2, 5), [1, 2, 2, 1, 1])
BagNode 2 obs, 200 bytes
╰── ArrayNode(2×5 Array with Float64 elements) 5 obs, 128 bytes
See also: WeightedBagNode, AbstractBagNode, AbstractMillNode, BagModel.
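The two conveniences described above (wrapping raw data in an ArrayNode and converting a plain vector of bag indices) might be easier to follow with a small sketch. It assumes, as the second example above suggests, that an integer vector assigns each instance to the bag with that index; the variable names are illustrative only.

```julia
using Mill

x = [1 2 3; 4 5 6]             # three instances stored in columns

# Instances 1 and 2 go to bag 1, instance 3 goes to bag 2.
b = BagNode(x, [1, 1, 2])

Mill.data(b) isa ArrayNode     # the raw matrix was wrapped in an ArrayNode
length(b.bags)                 # number of bags

second = Mill.subset(b, 2:2)   # BagNode holding only the second bag
Mill.data(Mill.data(second))   # 2×1 matrix with the single instance [3, 6]
```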
Mill.WeightedBagNode — Type
WeightedBagNode{T <: Union{AbstractMillNode, Missing}, B <: AbstractBags, W, C} <: AbstractBagNode
Structure like BagNode that additionally allows specifying weights of type W for each instance.
See also: BagNode, AbstractBagNode, AbstractMillNode, BagModel.
Mill.WeightedBagNode — Method
WeightedBagNode(d, b, w::Vector, m=nothing)
Construct a new WeightedBagNode with data d, bags b, vector of weights w, and metadata m.
d is either an AbstractMillNode or missing. Any other type is wrapped in an ArrayNode.
If b is an AbstractVector, Mill.bags is applied first.
Examples
julia> WeightedBagNode(ArrayNode(NGramMatrix(["s1", "s2"])), bags([1:2, 0:-1]), [0.2, 0.8])
WeightedBagNode 2 obs, 184 bytes
╰── ArrayNode(2053×2 NGramMatrix with Int64 elements) 2 obs, 140 bytes
julia> WeightedBagNode(zeros(2, 2), [1, 2], [1, 2])
WeightedBagNode 2 obs, 160 bytes
╰── ArrayNode(2×2 Array with Float64 elements) 2 obs, 80 bytes
See also: BagNode, AbstractBagNode, AbstractMillNode, BagModel.
Mill.ProductNode — Type
ProductNode{T, C} <: AbstractProductNode
Data node representing a Cartesian product of several spaces, each represented by a subtree stored in an iterable of type T. May store metadata of type C.
See also: AbstractProductNode, AbstractMillNode, ProductModel.
Mill.ProductNode — Method
ProductNode(dss, m=nothing)
ProductNode(m=nothing; dss...)
Construct a new ProductNode with data dss and metadata m.
dss should be a Tuple or NamedTuple, and all its elements must contain the same number of observations.
If any element of dss is not an AbstractMillNode, it is first wrapped in an ArrayNode.
Examples
julia> ProductNode((ArrayNode(zeros(2, 2)), ArrayNode(Flux.onehotbatch([1, 2], 1:2))))
ProductNode 2 obs, 24 bytes
├── ArrayNode(2×2 Array with Float64 elements) 2 obs, 80 bytes
╰── ArrayNode(2×2 OneHotArray with Bool elements) 2 obs, 80 bytes
julia> ProductNode(x1 = ArrayNode(NGramMatrix(["Hello", "world"])),
                   x2 = BagNode(ArrayNode([1 2; 3 4]), [1:2, 0:-1]))
ProductNode 2 obs, 48 bytes
├── x1: ArrayNode(2053×2 NGramMatrix with Int64 elements) 2 obs, 146 bytes
╰── x2: BagNode 2 obs, 96 bytes
        ╰── ArrayNode(2×2 Array with Int64 elements) 2 obs, 80 bytes
julia> ProductNode([1 2 3])
ProductNode 3 obs, 8 bytes
╰── ArrayNode(1×3 Array with Int64 elements) 3 obs, 72 bytes
julia> ProductNode((ArrayNode([1 2; 3 4]), ArrayNode([1 2 3; 4 5 6])))
ERROR: AssertionError: All subtrees must have an equal amount of instances!
[...]
See also: AbstractProductNode, AbstractMillNode, ProductModel.
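As a brief sketch of the keyword form, the following builds a ProductNode from raw matrices and inspects its subtrees through Mill.data. The subtree names a and b are arbitrary, and the sketch assumes Mill.data returns the NamedTuple of subtrees, as the field access in the mapdata example on this page suggests.

```julia
using Mill

# Two subtrees with different numbers of features but the same number of observations (2).
n = ProductNode(a = zeros(2, 2), b = ones(3, 2))

subtrees = Mill.data(n)        # the NamedTuple of subtrees
keys(subtrees)                 # names given as keyword arguments
subtrees.a isa ArrayNode       # raw matrices were wrapped in ArrayNodes
size(Mill.data(subtrees.b))    # dimensions of the wrapped matrix in subtree b
```

Only the observation counts have to agree across subtrees; the number of rows in each matrix is independent.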
Mill.LazyNode — Type
LazyNode{Name, D, C} <: AbstractMillNode
Data node storing data of type D in a lazy manner and optional metadata of type C.
Source of data or its type is specified in Name.
See also: AbstractMillNode, LazyModel, Mill.unpack2mill.
Mill.LazyNode — Method
LazyNode([Name::Symbol], d, m=nothing)
LazyNode{Name}(d, m=nothing)
Construct a new LazyNode with name Name, data d, and metadata m.
Examples
julia> LazyNode(:Codons, ["GGGCGGCGA", "CCTCGCGGG"])
LazyNode{:Codons, Vector{String}, Nothing}:
"GGGCGGCGA"
"CCTCGCGGG"
See also: AbstractMillNode, LazyModel, Mill.unpack2mill.
Mill.unpack2mill — Function
Mill.unpack2mill(x::LazyNode)
Return a representation of LazyNode x using Mill.jl structures. Every custom LazyNode should have its own method of this function, as it is used in LazyModel.
Examples
julia> function Mill.unpack2mill(ds::LazyNode{:Sentence})
           s = split.(ds.data, " ")
           x = NGramMatrix(reduce(vcat, s))
           BagNode(x, Mill.length2bags(length.(s)))
       end;
julia> LazyNode{:Sentence}(["foo bar", "baz"]) |> Mill.unpack2mill
BagNode 2 obs, 120 bytes
╰── ArrayNode(2053×3 NGramMatrix with Int64 elements) 3 obs, 274 bytes
Mill.data — Function
Mill.data(n::AbstractMillNode)
Return data stored in node n.
Examples
julia> Mill.data(ArrayNode([1 2; 3 4], "metadata"))
2×2 Matrix{Int64}:
1 2
3 4
julia> Mill.data(BagNode(ArrayNode([1 2; 3 4]), [1, 2], "metadata"))
2×2 ArrayNode{Matrix{Int64}, Nothing}:
1 2
3 4
See also: Mill.metadata.
Mill.metadata — Function
Mill.metadata(n::AbstractMillNode)
Return metadata stored in node n.
Examples
julia> Mill.metadata(ArrayNode([1 2; 3 4], "metadata"))
"metadata"
julia> Mill.metadata(BagNode(ArrayNode([1 2; 3 4]), [1, 2], "metadata"))
"metadata"
See also: Mill.data.
Mill.datasummary — Function
datasummary(n::AbstractMillNode)
Print summary of parameters of node n.
Examples
julia> n = ProductNode(ArrayNode(randn(2, 3)))
ProductNode 3 obs, 8 bytes
╰── ArrayNode(2×3 Array with Float64 elements) 3 obs, 96 bytes
julia> datasummary(n)
"Data summary: 3 obs, 112 bytes."
See also: modelsummary.
Mill.dropmeta — Function
dropmeta(n::AbstractMillNode)
Drop metadata stored in data node n (recursively).
Examples
julia> n1 = ArrayNode(NGramMatrix(["foo", "bar"]), ["metafoo", "metabar"])
2053×2 ArrayNode{NGramMatrix{String, Vector{String}, Int64}, Vector{String}}:
"foo"
"bar"
julia> n2 = dropmeta(n1)
2053×2 ArrayNode{NGramMatrix{String, Vector{String}, Int64}, Nothing}:
"foo"
"bar"
julia> isnothing(Mill.metadata(n2))
true
See also: Mill.metadata.
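The recursive behaviour is easiest to see on a nested node. The sketch below, with made-up metadata strings, attaches metadata at two levels of a BagNode and checks that dropmeta removes both while leaving the original node untouched.

```julia
using Mill

inner = ArrayNode([1 2; 3 4], "inner metadata")   # metadata on the leaf
bag = BagNode(inner, [1, 2], "outer metadata")    # metadata on the bag node

stripped = dropmeta(bag)

isnothing(Mill.metadata(stripped))                # metadata of the bag node is gone
isnothing(Mill.metadata(Mill.data(stripped)))     # and so is the metadata of the nested ArrayNode
Mill.metadata(bag)                                # the original node still carries its metadata
```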
Mill.catobs — Function
catobs(ns...)
Merge multiple nodes storing samples (observations) into one, promoting types in the process where possible.
Similar to Base.cat, but concatenates along the abstract "axis" where samples are stored.
In case of repeated calls with a varying number of arguments or argument types, use reduce(catobs, [ns...]) to save compilation time.
Examples
julia> catobs(ArrayNode(zeros(2, 2)), ArrayNode([1 2; 3 4]))
2×4 ArrayNode{Matrix{Float64}, Nothing}:
0.0 0.0 1.0 2.0
0.0 0.0 3.0 4.0
julia> n = ProductNode(t1=ArrayNode(randn(2, 3)), t2=BagNode(ArrayNode(randn(3, 8)), bags([1:3, 4:5, 6:8])))
ProductNode 3 obs, 24 bytes
├── t1: ArrayNode(2×3 Array with Float64 elements) 3 obs, 96 bytes
╰── t2: BagNode 3 obs, 112 bytes
        ╰── ArrayNode(3×8 Array with Float64 elements) 8 obs, 240 bytes
julia> catobs(n[1], n[3])
ProductNode 2 obs, 24 bytes
├── t1: ArrayNode(2×2 Array with Float64 elements) 2 obs, 80 bytes
╰── t2: BagNode 2 obs, 96 bytes
        ╰── ArrayNode(3×6 Array with Float64 elements) 6 obs, 192 bytes
See also: Mill.subset.
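The reduce form recommended above can be sketched as follows. The vector of single-observation nodes is a hypothetical stand-in for whatever collection of nodes a program accumulates.

```julia
using Mill

# One single-observation node per element (illustrative data only).
nodes = [ArrayNode(fill(i, 2, 1)) for i in 1:3]

merged = reduce(catobs, nodes)    # takes the whole vector at once
splatted = catobs(nodes...)       # equivalent result, one positional argument per node

Mill.data(merged)                 # 2×3 matrix with one column per original node
```

With the reduce form the argument is always a single vector, so the call signature does not change when the number of nodes does.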
Mill.subset — Function
subset(n, i)
Extract a subset i of samples (observations) stored in node n.
Similar to Base.getindex or MLUtils.getobs but defined for all Mill.jl compatible data as well.
Examples
julia> Mill.subset(ArrayNode(NGramMatrix(["Hello", "world"])), 2)
2053×1 ArrayNode{NGramMatrix{String, Vector{String}, Int64}, Nothing}:
"world"
julia> Mill.subset(BagNode(ArrayNode(randn(2, 8)), [1:2, 3:3, 4:7, 8:8]), 1:3)
BagNode 3 obs, 112 bytes
╰── ArrayNode(2×7 Array with Float64 elements) 7 obs, 160 bytes
See also: catobs.
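Since subset and catobs are complementary, a short sketch may help show how bag boundaries follow the data. It selects two bags from a BagNode (one of whose bags is empty) and concatenates them again; the variable names are illustrative.

```julia
using Mill

# Bag 1 holds instances 1 and 2, bag 2 is empty, bag 3 holds instance 3.
b = BagNode(ArrayNode([1 2 3; 4 5 6]), bags([1:2, 0:-1, 3:3]))

first_bag = Mill.subset(b, 1:1)       # only the first bag, with its two instances
last_bag = Mill.subset(b, 3:3)        # only the third bag, with its single instance

Mill.data(Mill.data(first_bag))       # instance matrix of the first bag

# Concatenating the two selections leaves out the empty second bag.
c = catobs(first_bag, last_bag)
length(c.bags)                        # number of bags after concatenation
Mill.data(Mill.data(c))               # all three instances, in the original order
```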
Mill.mapdata — Function
mapdata(f, x)
Recursively apply f to data in all leaves of x.
Examples
julia> n1 = ProductNode(a=zeros(2,2), b=ones(2,2))
ProductNode 2 obs, 16 bytes
├── a: ArrayNode(2×2 Array with Float64 elements) 2 obs, 80 bytes
╰── b: ArrayNode(2×2 Array with Float64 elements) 2 obs, 80 bytes
julia> n2 = Mill.mapdata(x -> x .+ 1, n1)
ProductNode 2 obs, 16 bytes
├── a: ArrayNode(2×2 Array with Float64 elements) 2 obs, 80 bytes
╰── b: ArrayNode(2×2 Array with Float64 elements) 2 obs, 80 bytes
julia> Mill.data(n2).a
2×2 ArrayNode{Matrix{Float64}, Nothing}:
1.0 1.0
1.0 1.0
julia> Mill.data(n2).b
2×2 ArrayNode{Matrix{Float64}, Nothing}:
2.0 2.0
2.0 2.0
Mill.removeinstances — Function
removeinstances(n::AbstractBagNode, mask)
Remove instances from n using mask and remap bag indices accordingly.
Examples
julia> b1 = BagNode(ArrayNode([1 2 3; 4 5 6]), bags([1:2, 0:-1, 3:3]))
BagNode 3 obs, 112 bytes
╰── ArrayNode(2×3 Array with Int64 elements) 3 obs, 96 bytes
julia> b2 = removeinstances(b1, [false, true, true])
BagNode 3 obs, 112 bytes
╰── ArrayNode(2×2 Array with Int64 elements) 2 obs, 80 bytes
julia> b2.data
2×2 ArrayNode{Matrix{Int64}, Nothing}:
2 3
5 6
julia> b2.bags
AlignedBags{Int64}(UnitRange{Int64}[1:1, 0:-1, 2:2])