Data nodes

API

Mill.AbstractProductNode - Type
AbstractProductNode <: AbstractMillNode

Supertype for any structure representing a data node implementing a Cartesian product of data in subtrees.

Mill.AbstractBagNode - Type
AbstractBagNode <: AbstractMillNode

Supertype for any data node structure representing a multi-instance learning problem.
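
To make the hierarchy concrete, the check below confirms how the concrete node types documented on this page relate to the two abstract supertypes. It is a minimal sketch that assumes Mill.jl is installed; the abstract type names are qualified with `Mill.` so the snippet does not depend on which names are exported.

```julia
using Mill

# Product-like nodes sit under AbstractProductNode.
@assert ProductNode <: Mill.AbstractProductNode

# Both bag node variants sit under AbstractBagNode.
@assert BagNode <: Mill.AbstractBagNode
@assert WeightedBagNode <: Mill.AbstractBagNode

# Both abstract supertypes are themselves Mill nodes.
@assert Mill.AbstractProductNode <: Mill.AbstractMillNode
@assert Mill.AbstractBagNode <: Mill.AbstractMillNode
```

Dispatching on the abstract type rather than on `BagNode` directly lets one method cover weighted and unweighted bags alike.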

Mill.ArrayNode - Type
ArrayNode{A <: AbstractArray, C} <: AbstractMillNode

Data node for storing array-like data of type A and metadata of type C. The convention is that samples are stored along the last axis, e.g. in columns of a matrix.

See also: AbstractMillNode, ArrayModel.
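
The sample-axis convention can be illustrated with a small matrix. This is a sketch, not part of the original docstring: the values and the per-sample metadata vector are made up for illustration, and only `Mill.data` and `Mill.metadata` (documented on this page) are used to inspect the node.

```julia
using Mill

# Three samples with two features each: every column is one sample.
x = ArrayNode(Float32[1 2 3; 4 5 6], ["a", "b", "c"])

# The wrapped array and the metadata are returned unchanged.
@assert Mill.data(x) == Float32[1 2 3; 4 5 6]
@assert Mill.metadata(x) == ["a", "b", "c"]

# The number of samples equals the size of the last axis.
@assert size(Mill.data(x), 2) == 3
```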

Mill.BagNode - Method
BagNode(d, b, m=nothing)

Construct a new BagNode with data d, bags b, and metadata m.

d is either an AbstractMillNode or missing. Any other type is wrapped in an ArrayNode.

If b is an AbstractVector, Mill.bags is applied first.

Examples

julia> BagNode(ArrayNode(maybehotbatch([1, missing, 2], 1:2)), AlignedBags([1:1, 2:3]))
BagNode  # 2 obs, 104 bytes
  ╰── ArrayNode(2×3 MaybeHotMatrix with Union{Missing, Bool} elements)  # 3 obs, 87 bytes

julia> BagNode(randn(2, 5), [1, 2, 2, 1, 1])
BagNode  # 2 obs, 200 bytes
  ╰── ArrayNode(2×5 Array with Float64 elements)  # 5 obs, 128 bytes

See also: WeightedBagNode, AbstractBagNode, AbstractMillNode, BagModel.

Mill.WeightedBagNode - Method
WeightedBagNode(d, b, w::Vector, m=nothing)

Construct a new WeightedBagNode with data d, bags b, vector of weights w and metadata m.

d is either an AbstractMillNode or missing. Any other type is wrapped in an ArrayNode.

If b is an AbstractVector, Mill.bags is applied first.

Examples

julia> WeightedBagNode(ArrayNode(NGramMatrix(["s1", "s2"])), bags([1:2, 0:-1]), [0.2, 0.8])
WeightedBagNode  # 2 obs, 184 bytes
  ╰── ArrayNode(2053×2 NGramMatrix with Int64 elements)  # 2 obs, 140 bytes

julia> WeightedBagNode(zeros(2, 2), [1, 2], [1, 2])
WeightedBagNode  # 2 obs, 160 bytes
  ╰── ArrayNode(2×2 Array with Float64 elements)  # 2 obs, 80 bytes

See also: BagNode, AbstractBagNode, AbstractMillNode, BagModel.

Mill.ProductNode - Method
ProductNode(dss, m=nothing)
ProductNode(m=nothing; dss...)

Construct a new ProductNode with data dss and metadata m.

dss should be a Tuple or NamedTuple and all its elements must contain the same number of observations.

If any element of dss is not an AbstractMillNode it is first wrapped in an ArrayNode.

Examples

julia> ProductNode((ArrayNode(zeros(2, 2)), ArrayNode(Flux.onehotbatch([1, 2], 1:2))))
ProductNode  # 2 obs, 24 bytes
  ├── ArrayNode(2×2 Array with Float64 elements)  # 2 obs, 80 bytes
  ╰── ArrayNode(2×2 OneHotArray with Bool elements)  # 2 obs, 80 bytes

julia> ProductNode(x1 = ArrayNode(NGramMatrix(["Hello", "world"])),
                   x2 = BagNode(ArrayNode([1 2; 3 4]), [1:2, 0:-1]))
ProductNode  # 2 obs, 48 bytes
  ├── x1: ArrayNode(2053×2 NGramMatrix with Int64 elements)  # 2 obs, 146 bytes
  ╰── x2: BagNode  # 2 obs, 96 bytes
            ╰── ArrayNode(2×2 Array with Int64 elements)  # 2 obs, 80 bytes

julia> ProductNode([1 2 3])
ProductNode  # 3 obs, 8 bytes
  ╰── ArrayNode(1×3 Array with Int64 elements)  # 3 obs, 72 bytes

julia> ProductNode((ArrayNode([1 2; 3 4]), ArrayNode([1 2 3; 4 5 6])))
ERROR: AssertionError: All subtrees must have an equal amount of instances!
[...]

See also: AbstractProductNode, AbstractMillNode, ProductModel.

Mill.LazyNode - Method
LazyNode([Name::Symbol], d, m=nothing)
LazyNode{Name}(d, m=nothing)

Construct a new LazyNode with name Name, data d, and metadata m.

Examples

julia> LazyNode(:Codons, ["GGGCGGCGA", "CCTCGCGGG"])
LazyNode{:Codons, Vector{String}, Nothing}:
 "GGGCGGCGA"
 "CCTCGCGGG"

See also: AbstractMillNode, LazyModel, Mill.unpack2mill.

Mill.unpack2mill - Function
Mill.unpack2mill(x::LazyNode)

Return a representation of LazyNode x using Mill.jl structures. Every custom LazyNode should define its own method of this function, because LazyModel calls it.

Examples

julia> function Mill.unpack2mill(ds::LazyNode{:Sentence})
    s = split.(ds.data, " ")
    x = NGramMatrix(reduce(vcat, s))
    BagNode(x, Mill.length2bags(length.(s)))
end;

julia> LazyNode{:Sentence}(["foo bar", "baz"]) |> Mill.unpack2mill
BagNode  # 2 obs, 120 bytes
  ╰── ArrayNode(2053×3 NGramMatrix with Int64 elements)  # 3 obs, 274 bytes

See also: LazyNode, LazyModel.

Mill.data - Function
Mill.data(n::AbstractMillNode)

Return data stored in node n.

Examples

julia> Mill.data(ArrayNode([1 2; 3 4], "metadata"))
2×2 Matrix{Int64}:
 1  2
 3  4

julia> Mill.data(BagNode(ArrayNode([1 2; 3 4]), [1, 2], "metadata"))
2×2 ArrayNode{Matrix{Int64}, Nothing}:
 1  2
 3  4

See also: Mill.metadata.

Mill.metadata - Function
Mill.metadata(n::AbstractMillNode)

Return metadata stored in node n.

Examples

julia> Mill.metadata(ArrayNode([1 2; 3 4], "metadata"))
"metadata"

julia> Mill.metadata(BagNode(ArrayNode([1 2; 3 4]), [1, 2], "metadata"))
"metadata"

See also: Mill.data.

Mill.datasummary - Function
datasummary(n::AbstractMillNode)

Return a string summarizing the data stored in node n: the number of observations and the size in bytes.

Examples

julia> n = ProductNode(ArrayNode(randn(2, 3)))
ProductNode  # 3 obs, 8 bytes
  ╰── ArrayNode(2×3 Array with Float64 elements)  # 3 obs, 96 bytes

julia> datasummary(n)
"Data summary: 3 obs, 112 bytes."

See also: modelsummary.

Mill.dropmeta - Function
dropmeta(n::AbstractMillNode)

Drop metadata stored in data node n (recursively).

Examples

julia> n1 = ArrayNode(NGramMatrix(["foo", "bar"]), ["metafoo", "metabar"])
2053×2 ArrayNode{NGramMatrix{String, Vector{String}, Int64}, Vector{String}}:
 "foo"
 "bar"

julia> n2 = dropmeta(n1)
2053×2 ArrayNode{NGramMatrix{String, Vector{String}, Int64}, Nothing}:
 "foo"
 "bar"

julia> isnothing(Mill.metadata(n2))
true
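
The example above only strips a leaf. The sketch below illustrates the recursive behaviour on a nested node; the metadata strings are placeholders chosen for illustration, and the snippet assumes Mill.jl is installed.

```julia
using Mill

# Metadata attached at two levels: the bag node and its inner array node.
inner = ArrayNode([1 2; 3 4], "inner metadata")
outer = BagNode(inner, [1:1, 2:2], "outer metadata")

stripped = dropmeta(outer)

# Metadata is gone from the bag node and from the node stored inside it.
@assert isnothing(Mill.metadata(stripped))
@assert isnothing(Mill.metadata(Mill.data(stripped)))

# The original node still carries its metadata.
@assert Mill.metadata(outer) == "outer metadata"
```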

See also: Mill.metadata.

Mill.catobs - Function
catobs(ns...)

Merge multiple nodes storing samples (observations) into one, promoting element types in the process where possible.

Similar to Base.cat but concatenates along the abstract "axis" where samples are stored.

In case of repeated calls with varying number of arguments or argument types, use reduce(catobs, [ns...]) to save compilation time.

Examples

julia> catobs(ArrayNode(zeros(2, 2)), ArrayNode([1 2; 3 4]))
2×4 ArrayNode{Matrix{Float64}, Nothing}:
 0.0  0.0  1.0  2.0
 0.0  0.0  3.0  4.0

julia> n = ProductNode(t1=ArrayNode(randn(2, 3)), t2=BagNode(ArrayNode(randn(3, 8)), bags([1:3, 4:5, 6:8])))
ProductNode  # 3 obs, 24 bytes
  ├── t1: ArrayNode(2×3 Array with Float64 elements)  # 3 obs, 96 bytes
  ╰── t2: BagNode  # 3 obs, 112 bytes
            ╰── ArrayNode(3×8 Array with Float64 elements)  # 8 obs, 240 bytes

julia> catobs(n[1], n[3])
ProductNode  # 2 obs, 24 bytes
  ├── t1: ArrayNode(2×2 Array with Float64 elements)  # 2 obs, 80 bytes
  ╰── t2: BagNode  # 2 obs, 96 bytes
            ╰── ArrayNode(3×6 Array with Float64 elements)  # 6 obs, 192 bytes
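
The `reduce(catobs, ...)` form mentioned above can be sketched as follows. The batch sizes and fill values are arbitrary and chosen only for illustration; the snippet assumes Mill.jl is installed.

```julia
using Mill

# Three array nodes holding 1, 2 and 3 samples of two features each.
batches = [ArrayNode(fill(Float64(i), 2, i)) for i in 1:3]

# Reducing over a vector avoids compiling a catobs method for every argument count.
merged = reduce(catobs, batches)

# All 1 + 2 + 3 = 6 samples end up along the last axis.
@assert size(Mill.data(merged)) == (2, 6)

# The result matches the splatted call.
@assert Mill.data(merged) == Mill.data(catobs(batches...))
```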

See also: Mill.subset.

Mill.subset - Function
subset(n, i)

Extract a subset i of samples (observations) stored in node n.

Similar to Base.getindex or MLUtils.getobs, but defined for all Mill.jl-compatible data.

Examples

julia> Mill.subset(ArrayNode(NGramMatrix(["Hello", "world"])), 2)
2053×1 ArrayNode{NGramMatrix{String, Vector{String}, Int64}, Nothing}:
 "world"

julia> Mill.subset(BagNode(ArrayNode(randn(2, 8)), [1:2, 3:3, 4:7, 8:8]), 1:3)
BagNode  # 3 obs, 112 bytes
  ╰── ArrayNode(2×7 Array with Float64 elements)  # 7 obs, 160 bytes
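
As a small illustration of the similarity to indexing, the sketch below selects the same samples with `Mill.subset` and with bracket indexing on an array node. It assumes Mill.jl is installed and that range indexing is available on `ArrayNode`, in the same way the `catobs` examples on this page index a `ProductNode`.

```julia
using Mill

n = ArrayNode([1 2 3; 4 5 6])

# Keep samples 2 and 3, i.e. the last two columns.
@assert Mill.data(Mill.subset(n, 2:3)) == [2 3; 5 6]

# Bracket indexing selects the same samples.
@assert Mill.data(n[2:3]) == [2 3; 5 6]
```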

See also: catobs.

Mill.mapdata - Function
mapdata(f, x)

Recursively apply f to data in all leaves of x.

Examples

julia> n1 = ProductNode(a=zeros(2,2), b=ones(2,2))
ProductNode  # 2 obs, 16 bytes
  ├── a: ArrayNode(2×2 Array with Float64 elements)  # 2 obs, 80 bytes
  ╰── b: ArrayNode(2×2 Array with Float64 elements)  # 2 obs, 80 bytes

julia> n2 = Mill.mapdata(x -> x .+ 1, n1)
ProductNode  # 2 obs, 16 bytes
  ├── a: ArrayNode(2×2 Array with Float64 elements)  # 2 obs, 80 bytes
  ╰── b: ArrayNode(2×2 Array with Float64 elements)  # 2 obs, 80 bytes

julia> Mill.data(n2).a
2×2 ArrayNode{Matrix{Float64}, Nothing}:
 1.0  1.0
 1.0  1.0

julia> Mill.data(n2).b
2×2 ArrayNode{Matrix{Float64}, Nothing}:
 2.0  2.0
 2.0  2.0

Mill.removeinstances - Function
removeinstances(n::AbstractBagNode, mask)

Remove instances from n using mask and remap bag indices accordingly.

Examples

julia> b1 = BagNode(ArrayNode([1 2 3; 4 5 6]), bags([1:2, 0:-1, 3:3]))
BagNode  # 3 obs, 112 bytes
  ╰── ArrayNode(2×3 Array with Int64 elements)  # 3 obs, 96 bytes

julia> b2 = removeinstances(b1, [false, true, true])
BagNode  # 3 obs, 112 bytes
  ╰── ArrayNode(2×2 Array with Int64 elements)  # 2 obs, 80 bytes

julia> b2.data
2×2 ArrayNode{Matrix{Int64}, Nothing}:
 2  3
 5  6

julia> b2.bags
AlignedBags{Int64}(UnitRange{Int64}[1:1, 0:-1, 2:2])