Data nodes
Index
Mill.AbstractBagNode
Mill.AbstractMillNode
Mill.AbstractProductNode
Mill.ArrayNode
Mill.BagNode
Mill.LazyNode
Mill.ProductNode
Mill.WeightedBagNode
Mill.catobs
Mill.data
Mill.datasummary
Mill.dropmeta
Mill.mapdata
Mill.metadata
Mill.metadata_getindex
Mill.removeinstances
Mill.unpack2mill
API
Mill.AbstractMillNode — Type
AbstractMillNode
Supertype for any structure representing a data node.
Mill.AbstractProductNode — Type
AbstractProductNode <: AbstractMillNode
Supertype for any structure representing a data node implementing a Cartesian product of data in subtrees.
Mill.AbstractBagNode — Type
AbstractBagNode <: AbstractMillNode
Supertype for any data node structure representing a multi-instance learning problem.
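The abstract types above are mainly useful as dispatch targets. The following is a minimal sketch, assuming Mill.jl is installed; the helper `kind` is hypothetical and not part of Mill.jl, and the abstract type names are qualified with `Mill.` so the snippet does not depend on which names are exported.

```julia
using Mill

# Hypothetical helper (not part of Mill.jl): classify a node by its abstract supertype.
kind(::Mill.AbstractBagNode) = "bag"
kind(::Mill.AbstractProductNode) = "product"
kind(::Mill.AbstractMillNode) = "other"   # fallback for any remaining data node

bag = BagNode(ArrayNode([1 2; 3 4]), [1, 2])
prod = ProductNode(a = ArrayNode([1 2; 3 4]))
arr = ArrayNode([1 2; 3 4])

kind(bag)   # "bag"
kind(prod)  # "product"
kind(arr)   # "other"
```

Julia picks the most specific matching method, so the `AbstractMillNode` fallback only applies to nodes that are neither bag nor product nodes.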
Mill.ArrayNode — Type
ArrayNode{A <: AbstractArray, C} <: AbstractMillNode
Data node for storing array-like data of type A and metadata of type C. The convention is that samples are stored along the last axis, e.g. in columns of a matrix.
See also: AbstractMillNode, ArrayModel.
Mill.ArrayNode — Method
ArrayNode(d::AbstractArray, m=nothing)
Construct a new ArrayNode with data d and metadata m.
Examples
julia> a = ArrayNode([1 2; 3 4; 5 6])
3×2 ArrayNode{Matrix{Int64}, Nothing}:
 1  2
 3  4
 5  6
See also: AbstractMillNode, ArrayModel.
Mill.BagNode — Type
BagNode{T <: Union{AbstractMillNode, Missing}, B <: AbstractBags, C} <: AbstractBagNode
Data node that represents a multi-instance learning problem.
Contains instances stored in a subtree of type T, bag indices of type B and optional metadata of type C.
See also: WeightedBagNode, AbstractBagNode, AbstractMillNode, BagModel.
Mill.BagNode — Method
BagNode(d, b, m=nothing)
Construct a new BagNode with data d, bags b, and metadata m.
d is either an AbstractMillNode or missing. Any other type is wrapped in an ArrayNode.
If b is an AbstractVector, Mill.bags is applied first.
Examples
julia> BagNode(ArrayNode(maybehotbatch([1, missing, 2], 1:2)), AlignedBags([1:1, 2:3]))
BagNode  2 obs
  ╰── ArrayNode(2×3 MaybeHotMatrix with Union{Missing, Bool} elements)  3 obs
julia> BagNode(randn(2, 5), [1, 2, 2, 1, 1])
BagNode  2 obs
  ╰── ArrayNode(2×5 Array with Float64 elements)  5 obs
See also: WeightedBagNode, AbstractBagNode, AbstractMillNode, BagModel.
Mill.WeightedBagNode — Type
WeightedBagNode{T <: Union{AbstractMillNode, Missing}, B <: AbstractBags, W, C} <: AbstractBagNode
Structure like BagNode that additionally stores a weight of type W for each instance.
See also: BagNode, AbstractBagNode, AbstractMillNode, BagModel.
Mill.WeightedBagNode — Method
WeightedBagNode(d, b, w::Vector, m=nothing)
Construct a new WeightedBagNode with data d, bags b, vector of weights w and metadata m.
d is either an AbstractMillNode or missing. Any other type is wrapped in an ArrayNode.
If b is an AbstractVector, Mill.bags is applied first.
Examples
julia> WeightedBagNode(ArrayNode(NGramMatrix(["s1", "s2"])), bags([1:2, 0:-1]), [0.2, 0.8])
WeightedBagNode  2 obs
  ╰── ArrayNode(2053×2 NGramMatrix with Int64 elements)  2 obs
julia> WeightedBagNode(zeros(2, 2), [1, 2], [1, 2])
WeightedBagNode  2 obs
  ╰── ArrayNode(2×2 Array with Float64 elements)  2 obs
See also: BagNode, AbstractBagNode, AbstractMillNode, BagModel.
Mill.ProductNode — Type
ProductNode{T, C} <: AbstractProductNode
Data node representing a Cartesian product of several spaces, each represented by a subtree stored in an iterable of type T. May store metadata of type C.
See also: AbstractProductNode, AbstractMillNode, ProductModel.
Mill.ProductNode — Method
ProductNode(dss, m=nothing)
ProductNode(m=nothing; dss...)
Construct a new ProductNode with data dss and metadata m.
dss should be a Tuple or NamedTuple and all its elements must contain the same number of observations.
If any element of dss is not an AbstractMillNode it is first wrapped in an ArrayNode.
Examples
julia> ProductNode((ArrayNode(zeros(2, 2)), ArrayNode(Flux.onehotbatch([1, 2], 1:2))))
ProductNode  2 obs
  ├── ArrayNode(2×2 Array with Float64 elements)  2 obs
  ╰── ArrayNode(2×2 OneHotArray with Bool elements)  2 obs
julia> ProductNode(x1 = ArrayNode(NGramMatrix(["Hello", "world"])),
                   x2 = BagNode(ArrayNode([1 2; 3 4]), [1:2, 0:-1]))
ProductNode  2 obs
  ├── x1: ArrayNode(2053×2 NGramMatrix with Int64 elements)  2 obs
  ╰── x2: BagNode  2 obs
            ╰── ArrayNode(2×2 Array with Int64 elements)  2 obs
julia> ProductNode([1 2 3])
ProductNode  3 obs
  ╰── ArrayNode(1×3 Array with Int64 elements)  3 obs
julia> ProductNode((ArrayNode([1 2; 3 4]), ArrayNode([1 2 3; 4 5 6])))
ERROR: AssertionError: All subtrees must have an equal amount of instances!
[...]
See also: AbstractProductNode, AbstractMillNode, ProductModel.
Mill.LazyNode — Type
LazyNode{Name, D, C} <: AbstractMillNode
Data node storing data of type D in a lazy manner and optional metadata of type C.
The source of the data or its type is specified in Name.
See also: AbstractMillNode, LazyModel, Mill.unpack2mill.
Mill.LazyNode — Method
LazyNode([Name::Symbol], d, m=nothing)
LazyNode{Name}(d, m=nothing)
Construct a new LazyNode with name Name, data d, and metadata m.
Examples
julia> LazyNode(:Codons, ["GGGCGGCGA", "CCTCGCGGG"])
LazyNode{:Codons, Vector{String}, Nothing}:
 "GGGCGGCGA"
 "CCTCGCGGG"
See also: AbstractMillNode, LazyModel, Mill.unpack2mill.
Mill.unpack2mill — Function
Mill.unpack2mill(x::LazyNode)
Return a representation of LazyNode x using Mill.jl structures. Every custom LazyNode should have its own specialized method, as it is used in LazyModel.
Examples
julia> function Mill.unpack2mill(ds::LazyNode{:Sentence})
    s = split.(ds.data, " ")
    x = NGramMatrix(reduce(vcat, s))
    BagNode(x, Mill.length2bags(length.(s)))
end;
julia> LazyNode{:Sentence}(["foo bar", "baz"]) |> Mill.unpack2mill
BagNode  2 obs
  ╰── ArrayNode(2053×3 NGramMatrix with Int64 elements)  3 obs
Mill.data — Function
Mill.data(n::AbstractMillNode)
Return data stored in node n.
Examples
julia> Mill.data(ArrayNode([1 2; 3 4], "metadata"))
2×2 Matrix{Int64}:
 1  2
 3  4
julia> Mill.data(BagNode(ArrayNode([1 2; 3 4]), [1, 2], "metadata"))
2×2 ArrayNode{Matrix{Int64}, Nothing}:
 1  2
 3  4
See also: Mill.metadata.
Mill.metadata — Function
Mill.metadata(n::AbstractMillNode)
Return metadata stored in node n.
Examples
julia> Mill.metadata(ArrayNode([1 2; 3 4], ["foo", "bar"]))
2-element Vector{String}:
 "foo"
 "bar"
julia> Mill.metadata(BagNode(ArrayNode([1 2; 3 4]), [1, 2], ["metadata"]))
1-element Vector{String}:
 "metadata"
See also: Mill.data, Mill.dropmeta, Mill.metadata_getindex.
Mill.datasummary — Function
datasummary(n::AbstractMillNode)
Print summary of parameters of node n.
Examples
julia> n = ProductNode(ArrayNode(randn(2, 3)))
ProductNode  3 obs
  ╰── ArrayNode(2×3 Array with Float64 elements)  3 obs
julia> datasummary(n)
"Data summary: 3 obs, 104 bytes."
See also: modelsummary.
Mill.dropmeta — Function
dropmeta(n::AbstractMillNode)
Drop metadata stored in data node n (recursively).
Examples
julia> n1 = ArrayNode(NGramMatrix(["foo", "bar"]), ["metafoo", "metabar"])
2053×2 ArrayNode{NGramMatrix{String, Vector{String}, Int64}, Vector{String}}:
 "foo"
 "bar"
julia> n2 = dropmeta(n1)
2053×2 ArrayNode{NGramMatrix{String, Vector{String}, Int64}, Nothing}:
 "foo"
 "bar"
julia> isnothing(Mill.metadata(n2))
true
See also: Mill.metadata, Mill.metadata_getindex.
Mill.catobs — Function
catobs(ns...)
Merge multiple nodes storing samples (observations) into one, promoting types in the process where possible.
Similar to Base.cat but concatenates along the abstract "axis" where samples are stored.
In case of repeated calls with varying number of arguments or argument types, use reduce(catobs, [ns...]) to save compilation time.
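As a minimal sketch of the `reduce(catobs, ...)` form recommended above, assuming Mill.jl is installed: the vector `nodes` and its contents are hypothetical and chosen only so that the resulting size is easy to check.

```julia
using Mill

# Hypothetical batch: three ArrayNodes with 2 rows each and 1, 2, and 3 observations.
nodes = [ArrayNode(fill(Float64(i), 2, i)) for i in 1:3]

# Merge the whole vector in one call instead of splatting it into catobs(nodes...).
merged = reduce(catobs, nodes)

size(Mill.data(merged))  # (2, 6): observations are concatenated along the last axis
```

The vector can have any length here, which is the situation the note above describes as a varying number of arguments.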
Examples
julia> catobs(ArrayNode(zeros(2, 2)), ArrayNode([1 2; 3 4]))
2×4 ArrayNode{Matrix{Float64}, Nothing}:
 0.0  0.0  1.0  2.0
 0.0  0.0  3.0  4.0
julia> n = ProductNode(t1=ArrayNode(randn(2, 3)), t2=BagNode(ArrayNode(randn(3, 8)), bags([1:3, 4:5, 6:8])))
ProductNode  3 obs
  ├── t1: ArrayNode(2×3 Array with Float64 elements)  3 obs
  ╰── t2: BagNode  3 obs
            ╰── ArrayNode(3×8 Array with Float64 elements)  8 obs
julia> catobs(n[1], n[3])
ProductNode  2 obs
  ├── t1: ArrayNode(2×2 Array with Float64 elements)  2 obs
  ╰── t2: BagNode  2 obs
            ╰── ArrayNode(3×6 Array with Float64 elements)  6 obs
Mill.metadata_getindex — Function
metadata_getindex(x, i::Integer)
metadata_getindex(x, i::VecOrRange{<:Integer})
Index into metadata x. In Mill.jl, it is assumed that the second or last dimension indexes into observations, whichever is smaller. This function can be used when implementing custom subtypes of AbstractMillNode.
Examples
julia> Mill.metadata_getindex(["foo", "bar", "baz"], 2)
"bar"
julia> Mill.metadata_getindex(["foo", "bar", "baz"], 2:3)
2-element Vector{String}:
 "bar"
 "baz"
julia> Mill.metadata_getindex([1 2 3; 4 5 6], 2)
2-element Vector{Int64}:
 2
 5
julia> Mill.metadata_getindex([1 2 3; 4 5 6], [1, 3])
2×2 Matrix{Int64}:
 1  3
 4  6
See also: Mill.metadata, Mill.dropmeta.
Mill.mapdata — Function
mapdata(f, x)
Recursively apply f to data in all leaves of x.
Examples
julia> n1 = ProductNode(a=zeros(2,2), b=ones(2,2))
ProductNode  2 obs
  ├── a: ArrayNode(2×2 Array with Float64 elements)  2 obs
  ╰── b: ArrayNode(2×2 Array with Float64 elements)  2 obs
julia> n2 = Mill.mapdata(x -> x .+ 1, n1)
ProductNode  2 obs
  ├── a: ArrayNode(2×2 Array with Float64 elements)  2 obs
  ╰── b: ArrayNode(2×2 Array with Float64 elements)  2 obs
julia> Mill.data(n2).a
2×2 ArrayNode{Matrix{Float64}, Nothing}:
 1.0  1.0
 1.0  1.0
julia> Mill.data(n2).b
2×2 ArrayNode{Matrix{Float64}, Nothing}:
 2.0  2.0
 2.0  2.0
Mill.removeinstances — Function
removeinstances(n::AbstractBagNode, mask)
Remove instances from n using mask and remap bag indices accordingly.
Examples
julia> b1 = BagNode(ArrayNode([1 2 3; 4 5 6]), bags([1:2, 0:-1, 3:3]))
BagNode  3 obs
  ╰── ArrayNode(2×3 Array with Int64 elements)  3 obs
julia> b2 = removeinstances(b1, [false, true, true])
BagNode  3 obs
  ╰── ArrayNode(2×2 Array with Int64 elements)  2 obs
julia> b2.data
2×2 ArrayNode{Matrix{Int64}, Nothing}:
 2  3
 5  6
julia> b2.bags
AlignedBags{Int64}(UnitRange{Int64}[1:1, 0:-1, 2:2])