Data nodes

Index

Mill.AbstractBagNode
Mill.AbstractMillNode
Mill.AbstractProductNode
Mill.ArrayNode
Mill.BagNode
Mill.LazyNode
Mill.ProductNode
Mill.WeightedBagNode
Mill.catobs
Mill.data
Mill.datasummary
Mill.dropmeta
Mill.mapdata
Mill.metadata
Mill.metadata_getindex
Mill.removeinstances
Mill.unpack2mill

API

Mill.AbstractMillNode — Type

AbstractMillNode

Supertype for any structure representing a data node.


Mill.AbstractProductNode — Type

AbstractProductNode <: AbstractMillNode

Supertype for any structure representing a data node implementing a Cartesian product of data in subtrees.


Mill.AbstractBagNode — Type

AbstractBagNode <: AbstractMillNode

Supertype for any data node structure representing a multi-instance learning problem.
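
The abstract types above can serve as dispatch targets for code that should treat whole families of nodes alike. The following is a minimal sketch of that idea; `kind` is a hypothetical helper introduced only for illustration and is not part of Mill.jl, and the constructor calls follow the patterns used in the examples on this page.

```julia
using Mill

# `kind` is a hypothetical helper for illustration; it is not part of Mill.jl.
# Julia picks the most specific method, so the two subtype methods take
# precedence over the generic fallback.
kind(::Mill.AbstractBagNode) = "bag"
kind(::Mill.AbstractProductNode) = "product"
kind(::Mill.AbstractMillNode) = "other"

bag = BagNode(zeros(2, 3), [1, 1, 2])    # three instances grouped into two bags
product = ProductNode(a = zeros(2, 2))   # product with a single named subtree
leaf = ArrayNode(zeros(2, 2))            # plain array data

kinds = (kind(bag), kind(product), kind(leaf))
```

Here `kinds` evaluates to `("bag", "product", "other")`, because `BagNode` and `ProductNode` are subtypes of the corresponding abstract types, while `ArrayNode` is a direct subtype of `AbstractMillNode`.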


Mill.ArrayNode — Type

ArrayNode{A <: AbstractArray, C} <: AbstractMillNode

Data node for storing array-like data of type A and metadata of type C. The convention is that samples are stored along the last axis, e.g. in columns of a matrix.

See also: AbstractMillNode, ArrayModel.


Mill.ArrayNode — Method

ArrayNode(d::AbstractArray, m=nothing)

Construct a new ArrayNode with data d and metadata m.

Examples

julia> a = ArrayNode([1 2; 3 4; 5 6])
3×2 ArrayNode{Matrix{Int64}, Nothing}:
 1  2
 3  4
 5  6

See also: AbstractMillNode, ArrayModel.
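
The example above omits the optional metadata argument. As a small sketch, assuming one metadata entry per observation (the label strings are made up for illustration), metadata can be attached at construction and read back with `Mill.metadata`, while `Mill.data` returns the wrapped array:

```julia
using Mill

# Three observations stored in columns, with one illustrative label each.
a = ArrayNode([1 2 3; 4 5 6], ["first", "second", "third"])

d = Mill.data(a)      # the wrapped 2×3 matrix
m = Mill.metadata(a)  # the vector passed as the second argument
```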


Mill.BagNode — Type

BagNode{T <: Union{AbstractMillNode, Missing}, B <: AbstractBags, C} <: AbstractBagNode

Data node that represents a multi-instance learning problem.

Contains instances stored in a subtree of type T, bag indices of type B and optional metadata of type C.


Mill.BagNode — Method

BagNode(d, b, m=nothing)

Construct a new BagNode with data d, bags b, and metadata m.

d is either an AbstractMillNode or missing. Any other type is wrapped in an ArrayNode.

If b is an AbstractVector, Mill.bags is applied first.

Examples

julia> BagNode(ArrayNode(maybehotbatch([1, missing, 2], 1:2)), AlignedBags([1:1, 2:3]))
BagNode  2 obs
  ╰── ArrayNode(2×3 MaybeHotMatrix with Union{Missing, Bool} elements)  3 obs

julia> BagNode(randn(2, 5), [1, 2, 2, 1, 1])
BagNode  2 obs
  ╰── ArrayNode(2×5 Array with Float64 elements)  5 obs


Mill.WeightedBagNode — Type

WeightedBagNode{T <: Union{AbstractMillNode, Missing}, B <: AbstractBags, W, C} <: AbstractBagNode

Data node like BagNode that additionally stores a weight of type W for each instance.


Mill.WeightedBagNode — Method

WeightedBagNode(d, b, w::Vector, m=nothing)

Construct a new WeightedBagNode with data d, bags b, vector of weights w and metadata m.

d is either an AbstractMillNode or missing. Any other type is wrapped in an ArrayNode.

If b is an AbstractVector, Mill.bags is applied first.

Examples

julia> WeightedBagNode(ArrayNode(NGramMatrix(["s1", "s2"])), bags([1:2, 0:-1]), [0.2, 0.8])
WeightedBagNode  2 obs
  ╰── ArrayNode(2053×2 NGramMatrix with Int64 elements)  2 obs

julia> WeightedBagNode(zeros(2, 2), [1, 2], [1, 2])
WeightedBagNode  2 obs
  ╰── ArrayNode(2×2 Array with Float64 elements)  2 obs


Mill.ProductNode — Type

ProductNode{T, C} <: AbstractProductNode

Data node representing a Cartesian product of several spaces, each represented by a subtree stored in an iterable of type T. May store metadata of type C.


Mill.ProductNode — Method

ProductNode(dss, m=nothing)
ProductNode(m=nothing; dss...)

Construct a new ProductNode with data dss and metadata m.

dss should be a Tuple or NamedTuple and all its elements must contain the same number of observations.

If any element of dss is not an AbstractMillNode it is first wrapped in an ArrayNode.

Examples

julia> ProductNode((ArrayNode(zeros(2, 2)), ArrayNode(Flux.onehotbatch([1, 2], 1:2))))
ProductNode  2 obs
  ├── ArrayNode(2×2 Array with Float64 elements)  2 obs
  ╰── ArrayNode(2×2 OneHotArray with Bool elements)  2 obs

julia> ProductNode(x1 = ArrayNode(NGramMatrix(["Hello", "world"])),
                   x2 = BagNode(ArrayNode([1 2; 3 4]), [1:2, 0:-1]))
ProductNode  2 obs
  ├── x1: ArrayNode(2053×2 NGramMatrix with Int64 elements)  2 obs
  ╰── x2: BagNode  2 obs
            ╰── ArrayNode(2×2 Array with Int64 elements)  2 obs

julia> ProductNode([1 2 3])
ProductNode  3 obs
  ╰── ArrayNode(1×3 Array with Int64 elements)  3 obs

julia> ProductNode((ArrayNode([1 2; 3 4]), ArrayNode([1 2 3; 4 5 6])))
ERROR: AssertionError: All subtrees must have an equal amount of instances!
[...]


Mill.LazyNode — Type

LazyNode{Name, D, C} <: AbstractMillNode

Data node storing data of type D in a lazy manner and optional metadata of type C.

The source of the data, or its type, is specified in Name.


Mill.LazyNode — Method

LazyNode([Name::Symbol], d, m=nothing)
LazyNode{Name}(d, m=nothing)

Construct a new LazyNode with name Name, data d, and metadata m.

Examples

julia> LazyNode(:Codons, ["GGGCGGCGA", "CCTCGCGGG"])
LazyNode{:Codons, Vector{String}, Nothing}:
 "GGGCGGCGA"
 "CCTCGCGGG"


Mill.unpack2mill — Function

Mill.unpack2mill(x::LazyNode)

Return a representation of LazyNode x using Mill.jl structures. Every custom LazyNode should define its own method of this function, because LazyModel uses it.

Examples

julia> function Mill.unpack2mill(ds::LazyNode{:Sentence})
    s = split.(ds.data, " ")
    x = NGramMatrix(reduce(vcat, s))
    BagNode(x, Mill.length2bags(length.(s)))
end;

julia> LazyNode{:Sentence}(["foo bar", "baz"]) |> Mill.unpack2mill
BagNode  2 obs
  ╰── ArrayNode(2053×3 NGramMatrix with Int64 elements)  3 obs

See also: modelsummary.


Mill.dropmeta — Function

dropmeta(n::AbstractMillNode)

Drop metadata stored in data node n (recursively).

Examples

julia> n1 = ArrayNode(NGramMatrix(["foo", "bar"]), ["metafoo", "metabar"])
2053×2 ArrayNode{NGramMatrix{String, Vector{String}, Int64}, Vector{String}}:
 "foo"
 "bar"

julia> n2 = dropmeta(n1)
2053×2 ArrayNode{NGramMatrix{String, Vector{String}, Int64}, Nothing}:
 "foo"
 "bar"

julia> isnothing(Mill.metadata(n2))
true
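
The example above uses a single leaf. To illustrate the recursive part, here is a minimal sketch on a nested node; the metadata values are made up for illustration, and the child node is reached through the `data` field as in the other examples on this page.

```julia
using Mill

# Instance-level metadata on the leaf, bag-level metadata on the BagNode.
inner = ArrayNode([1 2; 3 4], ["i1", "i2"])
bag = BagNode(inner, bags([1:2]), "bag-level metadata")

stripped = dropmeta(bag)

top_dropped = isnothing(Mill.metadata(stripped))         # bag-level metadata
leaf_dropped = isnothing(Mill.metadata(stripped.data))   # instance-level metadata
original_kept = Mill.metadata(inner) == ["i1", "i2"]     # input node is unchanged
```

All three flags should be `true`: `dropmeta` returns a new tree without metadata at any level and leaves the original nodes as they were.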


Mill.catobs — Function

catobs(ns...)

Merge multiple nodes storing samples (observations) into one, promoting types in the process where possible.

Similar to Base.cat but concatenates along the abstract "axis" where samples are stored.

When catobs is called repeatedly with a varying number of arguments or varying argument types, use reduce(catobs, [ns...]) to save compilation time.

Examples

julia> catobs(ArrayNode(zeros(2, 2)), ArrayNode([1 2; 3 4]))
2×4 ArrayNode{Matrix{Float64}, Nothing}:
 0.0  0.0  1.0  2.0
 0.0  0.0  3.0  4.0

julia> n = ProductNode(t1=ArrayNode(randn(2, 3)), t2=BagNode(ArrayNode(randn(3, 8)), bags([1:3, 4:5, 6:8])))
ProductNode  3 obs
  ├── t1: ArrayNode(2×3 Array with Float64 elements)  3 obs
  ╰── t2: BagNode  3 obs
            ╰── ArrayNode(3×8 Array with Float64 elements)  8 obs

julia> catobs(n[1], n[3])
ProductNode  2 obs
  ├── t1: ArrayNode(2×2 Array with Float64 elements)  2 obs
  ╰── t2: BagNode  2 obs
            ╰── ArrayNode(3×6 Array with Float64 elements)  6 obs
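
The `reduce(catobs, [ns...])` form recommended above can be sketched as follows. The node contents are arbitrary illustration data; the point is only that a vector of nodes is merged in one call instead of being splatted into a call with a varying number of arguments.

```julia
using Mill

# Three nodes with 1, 2 and 3 observations (columns), all with 2 rows.
nodes = [ArrayNode(fill(Float64(i), 2, i)) for i in 1:3]

merged = reduce(catobs, nodes)   # merges the whole vector at once
splatted = catobs(nodes...)      # same result, one method per argument count

merged_size = size(Mill.data(merged))   # (2, 6)
```

Both calls produce the same data; the `reduce` form is preferable when the length of `nodes` changes from call to call.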


Mill.metadata_getindex — Function

metadata_getindex(x, i::Integer)
metadata_getindex(x, i::VecOrRange{<:Integer})

Index into metadata x. Mill.jl assumes that observations are indexed along the second dimension or the last dimension, whichever is smaller: the only dimension of a vector, or the columns of a matrix. This function is useful when implementing custom subtypes of AbstractMillNode.

Examples

julia> Mill.metadata_getindex(["foo", "bar", "baz"], 2)
"bar"

julia> Mill.metadata_getindex(["foo", "bar", "baz"], 2:3)
2-element Vector{String}:
 "bar"
 "baz"

julia> Mill.metadata_getindex([1 2 3; 4 5 6], 2)
2-element Vector{Int64}:
 2
 5

julia> Mill.metadata_getindex([1 2 3; 4 5 6], [1, 3])
2×2 Matrix{Int64}:
 1  3
 4  6

See also: Mill.metadata, Mill.dropmeta.
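
The dimension rule used in the examples above can be written out in plain Julia. This is only a sketch of the stated convention, not Mill.jl's implementation; `obsdim` and `pickobs` are hypothetical names, and only vector-like indices are handled.

```julia
# Hypothetical helpers for illustration; they are not part of Mill.jl.

# Observations live along the second or the last dimension, whichever is smaller.
obsdim(x::AbstractArray) = min(2, ndims(x))

# Select observations `i` (a vector or range) along that dimension.
pickobs(x::AbstractArray, i::AbstractVector) = collect(selectdim(x, obsdim(x), i))

v = pickobs(["foo", "bar", "baz"], 2:3)   # vector: dimension 1
m = pickobs([1 2 3; 4 5 6], [1, 3])       # matrix: dimension 2 (columns)
```

With these inputs, `v` is `["bar", "baz"]` and `m` is `[1 3; 4 6]`, matching the `metadata_getindex` results shown above for the same arguments.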


Mill.mapdata — Function

mapdata(f, x)

Recursively apply f to data in all leaves of x.

Examples

julia> n1 = ProductNode(a=zeros(2,2), b=ones(2,2))
ProductNode  2 obs
  ├── a: ArrayNode(2×2 Array with Float64 elements)  2 obs
  ╰── b: ArrayNode(2×2 Array with Float64 elements)  2 obs

julia> n2 = Mill.mapdata(x -> x .+ 1, n1)
ProductNode  2 obs
  ├── a: ArrayNode(2×2 Array with Float64 elements)  2 obs
  ╰── b: ArrayNode(2×2 Array with Float64 elements)  2 obs

julia> Mill.data(n2).a
2×2 ArrayNode{Matrix{Float64}, Nothing}:
 1.0  1.0
 1.0  1.0

julia> Mill.data(n2).b
2×2 ArrayNode{Matrix{Float64}, Nothing}:
 2.0  2.0
 2.0  2.0


Mill.removeinstances — Function

removeinstances(n::AbstractBagNode, mask)

Remove instances from n using mask and remap bag indices accordingly.

Examples

julia> b1 = BagNode(ArrayNode([1 2 3; 4 5 6]), bags([1:2, 0:-1, 3:3]))
BagNode  3 obs
  ╰── ArrayNode(2×3 Array with Int64 elements)  3 obs

julia> b2 = removeinstances(b1, [false, true, true])
BagNode  3 obs
  ╰── ArrayNode(2×2 Array with Int64 elements)  2 obs

julia> b2.data
2×2 ArrayNode{Matrix{Int64}, Nothing}:
 2  3
 5  6

julia> b2.bags
AlignedBags{Int64}(UnitRange{Int64}[1:1, 0:-1, 2:2])
