More on nodes
Node nesting
The main advantage of the Mill.jl library is that it allows BagModels to be arbitrarily nested and combined via cross-products, as described in Theorem 5 of [6]. In other words, instances themselves may be represented in a much more complex way than in the BagNode and BagModel example.
Let's start the demonstration by nesting two MIL problems. The outer MIL problem contains three samples (outer-level bags), whose instances are (inner-level) bags themselves. The first outer-level bag contains a single inner-level bag with two inner-level instances, the second outer-level bag contains two inner-level bags with a total of three inner-level instances, and finally the third outer-level bag contains two inner-level bags with a total of five instances:
julia> ds = BagNode(BagNode(ArrayNode(randn(Float32, 4, 10)), [1:2, 3:4, 5:5, 6:7, 8:10]), [1:1, 2:3, 4:5])
BagNode 3 obs
  ╰── BagNode 5 obs
        ╰── ArrayNode(4×10 Array with Float32 elements) 10 obs
Here is one example of a model appropriate for this hierarchy:
using Flux: Dense, Chain, relu
julia> m = BagModel(
           BagModel(
               ArrayModel(Dense(4, 3, relu)),
               SegmentedMeanMax(3),
               Dense(6, 3, relu)),
           SegmentedMeanMax(3),
           Chain(Dense(6, 3, relu), Dense(3, 2)))
BagModel ↦ [SegmentedMean(3); SegmentedMax(3)] ↦ Chain(Dense(6 => 3, relu), De ⋯
  ╰── BagModel ↦ [SegmentedMean(3); SegmentedMax(3)] ↦ Dense(6 => 3, relu) 4 ⋯
        ╰── ArrayModel(Dense(4 => 3, relu)) 2 arrays, 15 params, 148 bytes
and can be directly applied to obtain a result:
julia> m(ds)
2×3 Matrix{Float32}:
 0.0  -0.0165677  -0.0404657
 0.0   0.013503    0.0329802
Here we again make use of the property that even if each instance is represented by an arbitrarily complex structure, applying the instance model im always yields a vector representation (one column per instance), regardless of the complexity of im and Mill.data(ds):
julia> m.im(Mill.data(ds))
3×5 Matrix{Float32}:
 0.0       0.545624  0.0  1.03625  0.018455
 0.0       0.708773  0.0  1.16774  0.0
 0.403142  0.740321  0.0  1.4657   0.556679
In one final example, we demonstrate a more complex hierarchy consisting of all the types of nodes introduced so far:
julia> ds = BagNode(ProductNode((BagNode(randn(Float32, 4, 10),
                                         [1:2, 3:4, 5:5, 6:7, 8:10]),
                                 randn(Float32, 3, 5),
                                 BagNode(BagNode(randn(Float32, 2, 30),
                                                 [i:i+1 for i in 1:2:30]),
                                         [1:3, 4:6, 7:9, 10:12, 13:15]),
                                 randn(Float32, 2, 5))),
                    [1:1, 2:3, 4:5])
BagNode 3 obs
  ╰── ProductNode 5 obs
        ├── BagNode 5 obs
        │     ╰── ArrayNode(4×10 Array with Float32 elements) 10 obs
        ├── ArrayNode(3×5 Array with Float32 elements) 5 obs
        ├── BagNode 5 obs
        │     ╰── BagNode 15 obs
        │           ╰── ArrayNode(2×30 Array with Float32 elements) 30 obs
        ╰── ArrayNode(2×5 Array with Float32 elements) 5 obs
When data and model trees become complex, Mill.jl limits the printing. To inspect the whole tree, use printtree:
julia> printtree(ds)
BagNode 3 obs
  ╰── ProductNode 5 obs
        ├── BagNode 5 obs
        │     ╰── ArrayNode(4×10 Array with Float32 elements) 10 obs
        ├── ArrayNode(3×5 Array with Float32 elements) 5 obs
        ├── BagNode 5 obs
        │     ╰── BagNode 15 obs
        │           ╰── ArrayNode(2×30 Array with Float32 elements) 30 obs
        ╰── ArrayNode(2×5 Array with Float32 elements) 5 obs
Instead of defining a model manually, we can also make use of Model reflection, another Mill.jl functionality, which simplifies model creation:
julia> m = reflectinmodel(ds, d -> Dense(d, 2), SegmentedMean)
BagModel ↦ SegmentedMean(2) ↦ Dense(2 => 2) 3 arrays, 8 params, 160 bytes
  ╰── ProductModel ↦ Dense(8 => 2) 2 arrays, 18 params, 160 bytes
        ├── BagModel ↦ SegmentedMean(2) ↦ Dense(2 => 2) 3 arrays, 8 params, 1 ⋯
        │     ╰── ArrayModel(Dense(4 => 2)) 2 arrays, 10 params, 128 bytes
        ├── ArrayModel(Dense(3 => 2)) 2 arrays, 8 params, 120 bytes
        ├── BagModel ↦ SegmentedMean(2) ↦ Dense(2 => 2) 3 arrays, 8 params, 1 ⋯
        │     ╰── BagModel ↦ SegmentedMean(2) ↦ Dense(2 => 2) 3 arrays, 8 par ⋯
        │           ╰── ArrayModel(Dense(2 => 2)) 2 arrays, 6 params, 112 byt ⋯
        ╰── ArrayModel(Dense(2 => 2)) 2 arrays, 6 params, 112 bytes
julia> m(ds)
2×3 Matrix{Float32}:
 -0.0760647  0.0462476  -0.0648969
  0.030561   0.0626038  -0.0322707
Node conveniences
To make the handling of data and model hierarchies easier, Mill.jl provides several tools. Let's set up some data:
julia> AN = ArrayNode(Float32.([1 2 3 4; 5 6 7 8]))
2×4 ArrayNode{Matrix{Float32}, Nothing}:
 1.0  2.0  3.0  4.0
 5.0  6.0  7.0  8.0
julia> AM = reflectinmodel(AN)
ArrayModel(Dense(2 => 10)) 2 arrays, 30 params, 208 bytes
julia> BN = BagNode(AN, [1:1, 2:3, 4:4])
BagNode 3 obs
  ╰── ArrayNode(2×4 Array with Float32 elements) 4 obs
julia> BM = reflectinmodel(BN)
BagModel ↦ BagCount([SegmentedMean(10); SegmentedMax(10)]) ↦ Dense(21 => 10) ⋯
  ╰── ArrayModel(Dense(2 => 10)) 2 arrays, 30 params, 208 bytes
julia> PN = ProductNode(a=Float32.([1 2 3; 4 5 6]), b=BN)
ProductNode 3 obs
  ├── a: ArrayNode(2×3 Array with Float32 elements) 3 obs
  ╰── b: BagNode 3 obs
           ╰── ArrayNode(2×4 Array with Float32 elements) 4 obs
julia> PM = reflectinmodel(PN)
ProductModel ↦ Dense(20 => 10) 2 arrays, 210 params, 928 bytes
  ├── a: ArrayModel(Dense(2 => 10)) 2 arrays, 30 params, 208 bytes
  ╰── b: BagModel ↦ BagCount([SegmentedMean(10); SegmentedMax(10)]) ↦ Dense(21 ⋯
           ╰── ArrayModel(Dense(2 => 10)) 2 arrays, 30 params, 208 bytes
Function: numobs
The numobs function from MLUtils.jl returns the number of samples (observations) as seen from the current level of the tree. This number usually increases as we go down the tree when BagNodes are involved, as each bag may contain more than one instance.
julia> numobs(AN)
4
julia> numobs(BN)
3
julia> numobs(PN)
3
Indexing and Slicing
Indexing in Mill.jl operates on the level of observations:
julia> AN[1]
2×1 ArrayNode{Matrix{Float32}, Nothing}:
 1.0
 5.0
julia> numobs(ans)
1
julia> BN[2]
BagNode 1 obs
  ╰── ArrayNode(2×2 Array with Float32 elements) 2 obs
julia> numobs(ans)
1
julia> PN[3]
ProductNode 1 obs
  ├── a: ArrayNode(2×1 Array with Float32 elements) 1 obs
  ╰── b: BagNode 1 obs
           ╰── ArrayNode(2×1 Array with Float32 elements) 1 obs
julia> numobs(ans)
1
julia> AN[[1, 4]]
2×2 ArrayNode{Matrix{Float32}, Nothing}:
 1.0  4.0
 5.0  8.0
julia> numobs(ans)
2
julia> BN[1:2]
BagNode 2 obs
  ╰── ArrayNode(2×3 Array with Float32 elements) 3 obs
julia> numobs(ans)
2
julia> PN[[2, 3]]
ProductNode 2 obs
  ├── a: ArrayNode(2×2 Array with Float32 elements) 2 obs
  ╰── b: BagNode 2 obs
           ╰── ArrayNode(2×3 Array with Float32 elements) 3 obs
julia> numobs(ans)
2
julia> PN[Int[]]
ProductNode 0 obs
  ├── a: ArrayNode(2×0 Array with Float32 elements) 0 obs
  ╰── b: BagNode 0 obs
           ╰── ArrayNode(2×0 Array with Float32 elements) 0 obs
julia> numobs(ans)
0
This may be useful for creating minibatches and their permutations, as sketched below. Note that apart from the (perhaps apparent) recursive effect, this operation also involves other implicit actions, such as properly recomputing bag indices:
julia> BN.bags
AlignedBags{Int64}(UnitRange{Int64}[1:1, 2:3, 4:4])
julia> BN[[1, 3]].bags
AlignedBags{Int64}(UnitRange{Int64}[1:1, 2:2])
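For example, a random minibatch can be obtained by indexing with a shuffled subset of observation indices. The following is only a minimal sketch; the batch size of 2 and the reuse of BN are illustrative:

using Random: shuffle

batchsize = 2
idxs = shuffle(1:numobs(BN))[1:batchsize]  # random subset of observation indices
minibatch = BN[idxs]                       # a BagNode holding `batchsize` observations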
Function: catobs
The catobs function concatenates several samples (datasets) together:
julia> catobs(AN[1], AN[4])
2×2 ArrayNode{Matrix{Float32}, Nothing}:
 1.0  4.0
 5.0  8.0
julia> catobs(BN[3], BN[[2, 1]])
BagNode 3 obs
  ╰── ArrayNode(2×4 Array with Float32 elements) 4 obs
julia> catobs(PN[[1, 2]], PN[3:3]) == PN
true
Again, the effect is recursive and everything is appropriately recomputed:
julia> BN.bags
AlignedBags{Int64}(UnitRange{Int64}[1:1, 2:3, 4:4])
julia> catobs(BN[3], BN[[1]]).bags
AlignedBags{Int64}(UnitRange{Int64}[1:1, 2:2])
This operation is analogous to what is usually done in the classical setting: if every observation is represented as a vector of features, each (mini)batch of samples is first concatenated into one matrix, and the whole matrix is passed through the neural network using fast matrix multiplication routines. The same reasoning applies here, but instead of Base.cat, catobs is needed.
Equipped with everything mentioned above, there are two different ways to construct minibatches from data. The first option, applicable mainly to smaller datasets, is to load all available data into memory, store it as one big data node containing all observations, and use Indexing and Slicing to obtain minibatches. This approach is demonstrated in the Musk example. The other option is to read all observations into memory separately (or load them on demand) and construct minibatches with catobs.
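A minimal sketch of the second approach is shown below. Here, the loadsample function, the dataset size, the batch size, and the model m are all hypothetical placeholders for whatever your application provides:

using Random: shuffle

# load (or lazily construct) each observation as a separate Mill node
samples = [loadsample(i) for i in 1:100]     # hypothetical per-observation loader

batchsize = 10
for idxs in Iterators.partition(shuffle(1:length(samples)), batchsize)
    batch = reduce(catobs, samples[idxs])    # concatenate observations into one minibatch
    ŷ = m(batch)                             # run the model on the whole minibatch at once
    # ... compute loss, update parameters, etc.
end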
For more tips on handling datasets and models, see External tools.
Metadata
Each AbstractMillNode can also carry arbitrary metadata (defaulting to nothing). Metadata is provided upon construction of the node and accessed with Mill.metadata:
julia> n1 = ArrayNode(randn(2, 2), ["metadata"])
2×2 ArrayNode{Matrix{Float64}, Vector{String}}:
 -0.22684270643341806   1.3321657770216255
  0.015328464950175435  1.059974135058343
julia> Mill.metadata(n1)
1-element Vector{String}:
 "metadata"
julia> n2 = ProductNode(n1, [1 3; 2 4])
ProductNode 2 obs
  ╰── ArrayNode(2×2 Array with Float64 elements) 2 obs
julia> Mill.metadata(n2)
2×2 Matrix{Int64}:
 1  3
 2  4
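Metadata can be attached to other node types in the same way, by passing it as an additional constructor argument. A minimal sketch, assuming per-bag string labels:

bn = BagNode(ArrayNode(randn(Float32, 2, 4)), [1:2, 3:4], ["bag 1", "bag 2"])
Mill.metadata(bn)  # returns the per-bag labels ["bag 1", "bag 2"]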