Data nodes

API

Mill.AbstractProductNode (Type)
AbstractProductNode <: AbstractMillNode

Supertype for any structure representing a data node implementing a Cartesian product of data in subtrees.

Mill.AbstractBagNode (Type)
AbstractBagNode <: AbstractMillNode

Supertype for any data node structure representing a multi-instance learning problem.

Mill.ArrayNode (Type)
ArrayNode{A <: AbstractArray, C} <: AbstractMillNode

Data node for storing array-like data of type A and metadata of type C. The convention is that samples are stored along the last axis, e.g. in columns of a matrix.

See also: AbstractMillNode, ArrayModel.
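The last-axis convention can be illustrated in plain Julia without Mill.jl. The helpers obsslice and nobs_lastaxis below are hypothetical (not part of the Mill.jl API); they only mirror the convention stated above for an ordinary array.

```julia
# Hypothetical helper (not Mill.jl API): observation `i` of an array that
# stores samples along its last axis.
obsslice(A::AbstractArray, i) = selectdim(A, ndims(A), i)

# Hypothetical helper: number of observations under the same convention.
nobs_lastaxis(A::AbstractArray) = size(A, ndims(A))

X = [1 2 3; 4 5 6]        # 2 features × 3 samples, one sample per column
n = nobs_lastaxis(X)      # 3
col = obsslice(X, 2)      # view of the second column, i.e. [2, 5]
```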

Mill.BagNode (Method)
BagNode(d, b, m=nothing)

Construct a new BagNode with data d, bags b, and metadata m.

d is either an AbstractMillNode or missing. Any other type is wrapped in an ArrayNode.

If b is an AbstractVector, Mill.bags is applied first.

Examples

julia> BagNode(ArrayNode(maybehotbatch([1, missing, 2], 1:2)), AlignedBags([1:1, 2:3]))
BagNode  2 obs
  ╰── ArrayNode(2×3 MaybeHotMatrix with Union{Missing, Bool} elements)  3 obs

julia> BagNode(randn(2, 5), [1, 2, 2, 1, 1])
BagNode  2 obs
  ╰── ArrayNode(2×5 Array with Float64 elements)  5 obs

See also: WeightedBagNode, AbstractBagNode, AbstractMillNode, BagModel.
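The second example reads naturally if each element of an integer vector b is the index of the bag that the corresponding instance belongs to: [1, 2, 2, 1, 1] then describes 2 bags, matching the "2 obs" output. A plain-Julia sketch of that grouping, using a hypothetical group_instances helper rather than Mill.bags itself:

```julia
# Hypothetical helper (not Mill.bags): interpret `ids[k]` as the bag that
# instance `k` belongs to and collect the instance indices of every bag.
function group_instances(ids::AbstractVector{<:Integer})
    nbags = isempty(ids) ? 0 : maximum(ids)
    return [findall(==(j), ids) for j in 1:nbags]
end

groups = group_instances([1, 2, 2, 1, 1])
println(length(groups))   # 2
println(groups)           # [[1, 4, 5], [2, 3]]
```

Bag 1 here is not contiguous (instances 1, 4 and 5), so it cannot be written as a single range of instance indices.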

Mill.WeightedBagNode (Method)
WeightedBagNode(d, b, w::Vector, m=nothing)

Construct a new WeightedBagNode with data d, bags b, vector of weights w and metadata m.

d is either an AbstractMillNode or missing. Any other type is wrapped in an ArrayNode.

If b is an AbstractVector, Mill.bags is applied first.

Examples

julia> WeightedBagNode(ArrayNode(NGramMatrix(["s1", "s2"])), bags([1:2, 0:-1]), [0.2, 0.8])
WeightedBagNode  2 obs
  ╰── ArrayNode(2053×2 NGramMatrix with Int64 elements)  2 obs

julia> WeightedBagNode(zeros(2, 2), [1, 2], [1, 2])
WeightedBagNode  2 obs
  ╰── ArrayNode(2×2 Array with Float64 elements)  2 obs

See also: BagNode, AbstractBagNode, AbstractMillNode, BagModel.
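In both examples w holds one weight per instance (2 instances, 2 weights). A plain-Julia sketch, using a hypothetical weights_per_bag helper, that checks this and splits the weights by bag. Bags are taken as ranges of instance indices, with 0:-1 marking an empty bag as in the first example.

```julia
# Hypothetical helper (not Mill.jl API): split per-instance weights `w`
# into one weight vector per bag.
function weights_per_bag(bags::AbstractVector{<:UnitRange}, w::AbstractVector,
                         ninstances::Integer)
    length(w) == ninstances ||
        throw(ArgumentError("expected one weight per instance"))
    # Indexing with an empty range such as 0:-1 yields an empty vector.
    return [w[b] for b in bags]
end

bags = [1:2, 0:-1]            # same bags as bags([1:2, 0:-1]) above
w = [0.2, 0.8]
wb = weights_per_bag(bags, w, 2)
```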

Mill.ProductNode (Method)
ProductNode(dss, m=nothing)
ProductNode(m=nothing; dss...)

Construct a new ProductNode with data dss and metadata m.

dss should be a Tuple or NamedTuple and all its elements must contain the same number of observations.

If any element of dss is not an AbstractMillNode, it is first wrapped in an ArrayNode.

Examples

julia> ProductNode((ArrayNode(zeros(2, 2)), ArrayNode(Flux.onehotbatch([1, 2], 1:2))))
ProductNode  2 obs
  ├── ArrayNode(2×2 Array with Float64 elements)  2 obs
  ╰── ArrayNode(2×2 OneHotArray with Bool elements)  2 obs

julia> ProductNode(x1 = ArrayNode(NGramMatrix(["Hello", "world"])),
                   x2 = BagNode(ArrayNode([1 2; 3 4]), [1:2, 0:-1]))
ProductNode  2 obs
  ├── x1: ArrayNode(2053×2 NGramMatrix with Int64 elements)  2 obs
  ╰── x2: BagNode  2 obs
            ╰── ArrayNode(2×2 Array with Int64 elements)  2 obs

julia> ProductNode([1 2 3])
ProductNode  3 obs
  ╰── ArrayNode(1×3 Array with Int64 elements)  3 obs

julia> ProductNode((ArrayNode([1 2; 3 4]), ArrayNode([1 2 3; 4 5 6])))
ERROR: AssertionError: All subtrees must have an equal amount of instances!
[...]

See also: AbstractProductNode, AbstractMillNode, ProductModel.
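The equal-observation requirement is what raises the AssertionError in the last example. A plain-Julia sketch of such a check, written as a hypothetical check_equal_obs function that assumes every element is an ordinary array storing samples along its last axis:

```julia
# Hypothetical check (not Mill.jl code): all elements of `dss` must store the
# same number of samples along their last axis; returns that number.
function check_equal_obs(dss::Union{Tuple, NamedTuple})
    counts = map(x -> size(x, ndims(x)), values(dss))
    @assert all(==(first(counts)), counts) "All subtrees must have an equal amount of instances!"
    return first(counts)
end

check_equal_obs((zeros(2, 2), [1 2; 3 4]))         # 2
check_equal_obs((a = [1 2 3], b = zeros(5, 3)))    # 3
```

@assert is used only to mirror the error type shown above; a production check would more typically throw an ArgumentError.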

Mill.LazyNode (Method)
LazyNode([Name::Symbol], d, m=nothing)
LazyNode{Name}(d, m=nothing)

Construct a new LazyNode with name Name, data d, and metadata m.

Examples

julia> LazyNode(:Codons, ["GGGCGGCGA", "CCTCGCGGG"])
LazyNode{:Codons, Vector{String}, Nothing}:
 "GGGCGGCGA"
 "CCTCGCGGG"

See also: AbstractMillNode, LazyModel, Mill.unpack2mill.
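Because Name is a Symbol stored in the type, methods can dispatch on it. A plain-Julia sketch of that mechanism with a hypothetical Lazy struct (not Mill's LazyNode) and a hypothetical describe_lazy function:

```julia
# Hypothetical stand-in for a lazily processed node: the name lives in the type.
struct Lazy{Name, D}
    data::D
end
Lazy{Name}(d::D) where {Name, D} = Lazy{Name, D}(d)
Lazy(name::Symbol, d) = Lazy{name}(d)

# Methods can specialize on the name without inspecting the data.
describe_lazy(x::Lazy{:Codons}) = "codons: " * join(x.data, ",")
describe_lazy(x::Lazy) = "unknown lazy data"

n = Lazy(:Codons, ["GGGCGGCGA", "CCTCGCGGG"])
println(describe_lazy(n))   # codons: GGGCGGCGA,CCTCGCGGG
```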

Mill.unpack2mill (Function)
Mill.unpack2mill(x::LazyNode)

Return a representation of LazyNode x using Mill.jl structures. Every custom LazyNode type needs its own method, because LazyModel relies on it.

Examples

julia> function Mill.unpack2mill(ds::LazyNode{:Sentence})
    s = split.(ds.data, " ")
    x = NGramMatrix(reduce(vcat, s))
    BagNode(x, Mill.length2bags(length.(s)))
end;
julia> LazyNode{:Sentence}(["foo bar", "baz"]) |> Mill.unpack2mill
BagNode  2 obs
  ╰── ArrayNode(2053×3 NGramMatrix with Int64 elements)  3 obs

See also: LazyNode, LazyModel.
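The example above relies on Mill.length2bags to turn per-sentence word counts into bag ranges. A plain-Julia sketch of that conversion as a hypothetical lengths_to_ranges helper, using the 0:-1 convention for empty bags seen elsewhere on this page:

```julia
# Hypothetical helper (not Mill.length2bags): convert bag lengths into
# consecutive ranges of instance indices; empty bags become 0:-1.
function lengths_to_ranges(ls::AbstractVector{<:Integer})
    ranges = UnitRange{Int}[]
    stop = 0
    for l in ls
        start = stop + 1
        stop += l
        push!(ranges, l == 0 ? (0:-1) : (start:stop))
    end
    return ranges
end

s = split.(["foo bar", "baz"], " ")   # [["foo", "bar"], ["baz"]]
lengths_to_ranges(length.(s))          # [1:2, 3:3]: 3 words in 2 bags
```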

Mill.data (Function)
Mill.data(n::AbstractMillNode)

Return data stored in node n.

Examples

julia> Mill.data(ArrayNode([1 2; 3 4], "metadata"))
2×2 Matrix{Int64}:
 1  2
 3  4

julia> Mill.data(BagNode(ArrayNode([1 2; 3 4]), [1, 2], "metadata"))
2×2 ArrayNode{Matrix{Int64}, Nothing}:
 1  2
 3  4

See also: Mill.metadata.

Mill.metadata (Function)
Mill.metadata(n::AbstractMillNode)

Return metadata stored in node n.

Examples

julia> Mill.metadata(ArrayNode([1 2; 3 4], ["foo", "bar"]))
2-element Vector{String}:
 "foo"
 "bar"

julia> Mill.metadata(BagNode(ArrayNode([1 2; 3 4]), [1, 2], ["metadata"]))
1-element Vector{String}:
 "metadata"

See also: Mill.data, Mill.dropmeta, Mill.metadata_getindex.

Mill.datasummary (Function)
datasummary(n::AbstractMillNode)

Return a string summarizing the data stored in node n: the number of observations and the memory footprint in bytes.

Examples

julia> n = ProductNode(ArrayNode(randn(2, 3)))
ProductNode  3 obs
  ╰── ArrayNode(2×3 Array with Float64 elements)  3 obs

julia> datasummary(n)
"Data summary: 3 obs, 104 bytes."

See also: modelsummary.
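For comparison, plain Julia reports an object's memory footprint with Base.summarysize. This page does not say how datasummary computes its byte count, so the sketch below is only a rough analogue, not its implementation; the total varies across Julia versions.

```julia
x = randn(2, 3)
payload = sizeof(x)            # 48: six Float64 values of 8 bytes each
total = Base.summarysize(x)    # payload plus array bookkeeping (version dependent)
println("payload: $payload bytes, total: $total bytes")
```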

Mill.dropmeta (Function)
dropmeta(n::AbstractMillNode)

Drop metadata stored in data node n (recursively).

Examples

julia> n1 = ArrayNode(NGramMatrix(["foo", "bar"]), ["metafoo", "metabar"])
2053×2 ArrayNode{NGramMatrix{String, Vector{String}, Int64}, Vector{String}}:
 "foo"
 "bar"

julia> n2 = dropmeta(n1)
2053×2 ArrayNode{NGramMatrix{String, Vector{String}, Int64}, Nothing}:
 "foo"
 "bar"

julia> isnothing(Mill.metadata(n2))
true

See also: Mill.metadata, Mill.metadata_getindex.
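Dropping metadata recursively amounts to rebuilding the tree with nothing in every metadata slot. A plain-Julia sketch with hypothetical Leaf and Inner types and a hypothetical strip_meta function (none of them are Mill's node types or functions):

```julia
# Hypothetical toy tree: leaves hold raw data, inner nodes hold child nodes.
struct Leaf{D, M}
    data::D
    metadata::M
end
struct Inner{C, M}
    children::C
    metadata::M
end

# Rebuild every node with `nothing` as metadata, recursing into children.
strip_meta(n::Leaf) = Leaf(n.data, nothing)
strip_meta(n::Inner) = Inner(map(strip_meta, n.children), nothing)

t = Inner((Leaf([1 2; 3 4], ["foo", "bar"]), Leaf(zeros(2, 2), nothing)), "root meta")
t2 = strip_meta(t)
```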

Mill.catobs (Function)
catobs(ns...)

Merge multiple nodes storing samples (observations) into one, promoting types in the process where possible.

Similar to Base.cat but concatenates along the abstract "axis" where samples are stored.

For repeated calls with a varying number of arguments or varying argument types, use reduce(catobs, [ns...]) to save compilation time.

Examples

julia> catobs(ArrayNode(zeros(2, 2)), ArrayNode([1 2; 3 4]))
2×4 ArrayNode{Matrix{Float64}, Nothing}:
 0.0  0.0  1.0  2.0
 0.0  0.0  3.0  4.0

julia> n = ProductNode(t1=ArrayNode(randn(2, 3)), t2=BagNode(ArrayNode(randn(3, 8)), bags([1:3, 4:5, 6:8])))
ProductNode  3 obs
  ├── t1: ArrayNode(2×3 Array with Float64 elements)  3 obs
  ╰── t2: BagNode  3 obs
            ╰── ArrayNode(3×8 Array with Float64 elements)  8 obs

julia> catobs(n[1], n[3])
ProductNode  2 obs
  ├── t1: ArrayNode(2×2 Array with Float64 elements)  2 obs
  ╰── t2: BagNode  2 obs
            ╰── ArrayNode(3×6 Array with Float64 elements)  6 obs
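For plain matrices, concatenating along the sample axis is hcat, which promotes element types just like the first example. For bags, the ranges of the second node must be shifted by the number of instances in the first. A plain-Julia sketch with hypothetical shift_bag and merge_bags helpers (not Mill.jl code):

```julia
# Array leaves: samples are columns, so concatenation runs along dims = 2.
A = hcat(zeros(2, 2), [1 2; 3 4])     # element type promoted to Float64

# Hypothetical helpers: offset the second node's bags by `n1`, the number of
# instances in the first node; empty bags stay 0:-1.
shift_bag(b::UnitRange{Int}, n1::Int) = isempty(b) ? (0:-1) : (b .+ n1)
merge_bags(b1, b2, n1) = vcat(b1, [shift_bag(b, n1) for b in b2])

merge_bags([1:3], [1:3], 3)           # [1:3, 4:6]: two bags, 6 instances

# Many inputs: reduce over a vector; Base has a fast method for reduce(hcat, ...).
parts = [rand(2, 1) for _ in 1:5]
B = reduce(hcat, parts)               # 2×5 matrix
```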
Mill.metadata_getindex (Function)
metadata_getindex(x, i::Integer)
metadata_getindex(x, i::VecOrRange{<:Integer})

Index into metadata x. In Mill.jl, observations are assumed to be indexed along the second or the last dimension, whichever comes first, i.e. dimension min(2, ndims(x)): the only dimension of a vector, the columns of a matrix. This function can be used when implementing custom subtypes of AbstractMillNode.

Examples

julia> Mill.metadata_getindex(["foo", "bar", "baz"], 2)
"bar"

julia> Mill.metadata_getindex(["foo", "bar", "baz"], 2:3)
2-element Vector{String}:
 "bar"
 "baz"

julia> Mill.metadata_getindex([1 2 3; 4 5 6], 2)
2-element Vector{Int64}:
 2
 5

julia> Mill.metadata_getindex([1 2 3; 4 5 6], [1, 3])
2×2 Matrix{Int64}:
 1  3
 4  6

See also: Mill.metadata, Mill.dropmeta.
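A plain-Julia sketch of the indexing rule, as a hypothetical meta_getindex (not Mill's implementation): vectors are indexed directly so that an integer returns a single element, and higher-dimensional arrays are indexed along dimension 2.

```julia
# Hypothetical re-implementation of the rule above.
meta_getindex(x::AbstractVector, i) = x[i]                         # only dimension
meta_getindex(x::AbstractArray, i) = collect(selectdim(x, 2, i))   # dimension 2

meta_getindex(["foo", "bar", "baz"], 2)      # "bar"
meta_getindex([1 2 3; 4 5 6], [1, 3])        # [1 3; 4 6]
```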

Mill.mapdata (Function)
mapdata(f, x)

Recursively apply f to data in all leaves of x.

Examples

julia> n1 = ProductNode(a=zeros(2,2), b=ones(2,2))
ProductNode  2 obs
  ├── a: ArrayNode(2×2 Array with Float64 elements)  2 obs
  ╰── b: ArrayNode(2×2 Array with Float64 elements)  2 obs

julia> n2 = Mill.mapdata(x -> x .+ 1, n1)
ProductNode  2 obs
  ├── a: ArrayNode(2×2 Array with Float64 elements)  2 obs
  ╰── b: ArrayNode(2×2 Array with Float64 elements)  2 obs

julia> Mill.data(n2).a
2×2 ArrayNode{Matrix{Float64}, Nothing}:
 1.0  1.0
 1.0  1.0

julia> Mill.data(n2).b
2×2 ArrayNode{Matrix{Float64}, Nothing}:
 2.0  2.0
 2.0  2.0
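The same recursion can be sketched over nested NamedTuples in plain Julia. The map_leaves function is hypothetical and treats every array as a leaf; map over a NamedTuple keeps its field names.

```julia
# Hypothetical recursive map: apply `f` to array leaves, recurse into NamedTuples.
map_leaves(f, x::AbstractArray) = f(x)
map_leaves(f, x::NamedTuple) = map(y -> map_leaves(f, y), x)

tree = (a = zeros(2, 2), b = (c = ones(2, 2),))
tree2 = map_leaves(x -> x .+ 1, tree)   # a becomes all 1.0, b.c all 2.0
```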
Mill.removeinstances (Function)
removeinstances(n::AbstractBagNode, mask)

Remove instances from n according to mask and remap bag indices accordingly. Instances whose mask entry is true are kept; those marked false are removed.

Examples

julia> b1 = BagNode(ArrayNode([1 2 3; 4 5 6]), bags([1:2, 0:-1, 3:3]))
BagNode  3 obs
  ╰── ArrayNode(2×3 Array with Int64 elements)  3 obs

julia> b2 = removeinstances(b1, [false, true, true])
BagNode  3 obs
  ╰── ArrayNode(2×2 Array with Int64 elements)  2 obs

julia> b2.data
2×2 ArrayNode{Matrix{Int64}, Nothing}:
 2  3
 5  6

julia> b2.bags
AlignedBags{Int64}(UnitRange{Int64}[1:1, 0:-1, 2:2])
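In the example, the first instance is dropped and the remaining ones are renumbered inside their bags. A plain-Julia sketch of that remapping with a hypothetical remap_bags helper, assuming contiguous bag ranges and 0:-1 for empty bags:

```julia
# Hypothetical helper (not Mill.jl code): keep instances where `mask` is true
# and renumber the surviving instance indices inside each bag range.
function remap_bags(bags::AbstractVector{<:UnitRange}, mask::AbstractVector{Bool})
    newidx = cumsum(mask)    # new position of every kept instance
    map(bags) do b
        kept = [newidx[i] for i in b if mask[i]]
        # Kept instances of a contiguous bag receive consecutive new indices.
        isempty(kept) ? (0:-1) : (first(kept):last(kept))
    end
end

remap_bags([1:2, 0:-1, 3:3], [false, true, true])   # [1:1, 0:-1, 2:2]
```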