It is recommended to read the Motivation section first to understand the crucial ideas behind hierarchical multiple instance learning.
Nodes
Mill.jl enables representation of arbitrarily complex tree-like hierarchies and appropriate models for these hierarchies. It defines two core abstract types:
- AbstractMillNode, which stores data on any level of abstraction and whose subtypes can be further nested
- AbstractMillModel, which helps to define a corresponding model. For each specific implementation of AbstractMillNode we have one or more specific AbstractMillModels for processing it.
Below we will introduce ArrayNode, BagNode and ProductNode together with their corresponding models. It is possible to define data and model nodes for more complex behaviors (see Custom nodes); however, these three core types are already sufficient for most tasks. For instance, we can represent any JSON document and use appropriate models to convert it to a vector representation or classify it (see Processing JSONs).
ArrayNode and ArrayModel
ArrayNode thinly wraps an array of features (specifically any subtype of AbstractArray):
julia> X = Float32.([1 2 3 ; 4 5 6])
2×3 Matrix{Float32}:
 1.0  2.0  3.0
 4.0  5.0  6.0

julia> AN = ArrayNode(X)
2×3 ArrayNode{Matrix{Float32}, Nothing}:
 1.0  2.0  3.0
 4.0  5.0  6.0
Data carried by any AbstractMillNode can be accessed with the Mill.data function as follows:
julia> Mill.data(AN)
2×3 Matrix{Float32}:
 1.0  2.0  3.0
 4.0  5.0  6.0
Similarly, ArrayModel wraps any function performing an operation over this array. In the example below, we wrap a feature matrix X and a Dense model from Flux.jl:
using Flux: Dense
julia> f = Dense(2, 4)
Dense(2 => 4) # 12 parameters
julia> AM = ArrayModel(f)
ArrayModel(Dense(2 => 4)) 2 arrays, 12 params, 136 bytes
We can now apply the model with AM(AN) to get a plain array (here a Matrix) and verify that the feedforward layer f is really applied:
julia> AM(AN)
4×3 Matrix{Float32}:
 -0.750983   -1.09551   -1.44004
 -0.0559078   0.113471   0.28285
 -2.37767    -2.28613   -2.1946
  0.272429    0.780232   1.28804
julia> f(X) == AM(AN)
true
A convenient property of all AbstractMillModel nodes is that after applying them to a corresponding data node we always obtain an array as output regardless of the type and complexity of the model. This becomes important later.
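The wrapping idea can be sketched without Mill.jl at all. The following is a minimal illustration under stated assumptions: ToyArrayNode and ToyArrayModel are hypothetical stand-ins written for this example (they are not the library's types), and dense is a hand-written affine map with made-up weights used in place of Flux's Dense. It shows that applying a model node to a data node amounts to applying the wrapped function to the wrapped array, so the result is a plain matrix.

```julia
# Hypothetical stand-ins for illustration only; these are NOT Mill.jl's types.
struct ToyArrayNode{T<:AbstractArray}
    data::T
end

struct ToyArrayModel{F}
    f::F
end

# Applying the model node unwraps the data and calls the wrapped function.
(m::ToyArrayModel)(n::ToyArrayNode) = m.f(n.data)

# A hand-written affine layer (assumed weights) standing in for Dense(2, 4).
W = Float32[1 0; 0 1; 1 1; 1 -1]
b = zeros(Float32, 4)
dense(x) = W * x .+ b

X = Float32[1 2 3; 4 5 6]                      # 2 features × 3 samples
out = ToyArrayModel(dense)(ToyArrayNode(X))    # plain 4×3 matrix, one column per sample
```

Because the output is an ordinary array, it can be fed directly into any further layer, which is the property the nested models below rely on.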
The most common interpretation of the data inside ArrayNodes is that each column contains the features of one sample, and therefore the node AN carries size(Mill.data(AN), 2) samples. In this sense, ArrayNodes wrap the standard machine learning problem, where each sample is represented with a vector, a matrix or a more general tensor of features. Alternatively, one can obtain the number of samples of any AbstractMillNode with the numobs function from the MLUtils.jl package, which Mill.jl also exports:
julia> numobs(AN)
3
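The "one sample per column" convention can be illustrated in plain Julia. This is only a sketch: toy_numobs is a hypothetical helper defined here, and it assumes that samples are indexed by the last dimension of the array, which mirrors how MLUtils.jl counts observations in plain arrays.

```julia
# Hypothetical helper: count samples along the last dimension of an array.
toy_numobs(A::AbstractArray) = size(A, ndims(A))

X = Float32[1 2 3; 4 5 6]                  # 2 features × 3 samples
images = zeros(Float32, 28, 28, 1, 10)     # a 4-dimensional tensor holding 10 samples

n_matrix = toy_numobs(X)                   # number of columns of X
n_tensor = toy_numobs(images)              # size of the last dimension
second_sample = X[:, 2]                    # feature vector of the second sample
```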
BagNode
BagNode represents the standard multiple instance learning problem, that is, each sample is a bag containing an arbitrary number of instances. In the simplest case, each instance is a vector:
julia> BN = BagNode(AN, [1:2, 0:-1, 3:3])
BagNode  3 obs
  ╰── ArrayNode(2×3 Array with Float32 elements)  3 obs
where for simplicity we used AN from the previous example. It is also possible to pass the data directly, in which case it is wrapped in an ArrayNode automatically.
Each BagNode carries data and bags fields:
julia> Mill.data(BN)
2×3 ArrayNode{Matrix{Float32}, Nothing}:
 1.0  2.0  3.0
 4.0  5.0  6.0
julia> BN.bags
AlignedBags{Int64}(UnitRange{Int64}[1:2, 0:-1, 3:3])
Here, data can be an arbitrary AbstractMillNode storing the representation of instances (ArrayNode in this case) and the bags field specifies which instances belong to which bag. In this specific case BN stores three bags (samples). The first one consists of two instances {[1.0, 4.0], [2.0, 5.0]} (the first two columns of AN), the second one is empty, and the third bag contains one instance {[3.0, 6.0]}. We can see that we deal with three top-level samples (bags):
julia> numobs(BN)
3
whereas they are formed using three instances:
julia> numobs(AN)
3
In Mill.jl, there are two ways to store indices of the bag's instances:
- in the AlignedBags structure, which accepts a Vector of UnitRanges and requires all instances of a bag to be stored contiguously:
julia> AlignedBags([1:2, 3:3])
AlignedBags{Int64}(UnitRange{Int64}[1:2, 3:3])
- and in the ScatteredBags structure, which accepts a Vector of Vectors storing not necessarily contiguous indices:
julia> ScatteredBags([[2, 1], [3]])
ScatteredBags{Int64}([[2, 1], [3]])
The two examples above are semantically equivalent, as bags are unordered collections of instances. An empty bag with no instances is specified in AlignedBags as the empty range 0:-1 and in ScatteredBags as an empty vector Int[]. The constructor of BagNode directly accepts one of these two structures and otherwise tries to automatically choose the better type.
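The relationship between the two index layouts can be checked in plain Julia. The sketch below does not use Mill.jl's AlignedBags or ScatteredBags types; it works with ordinary ranges and index vectors, and bag_instances is a hypothetical helper introduced only for this illustration.

```julia
X = Float32[1 2 3; 4 5 6]                  # three instances stored as columns

aligned   = [1:2, 0:-1, 3:3]               # contiguous ranges; 0:-1 encodes an empty bag
scattered = [[2, 1], Int[], [3]]           # explicit index lists; Int[] encodes an empty bag

# Hypothetical helper: the instances of a bag are the columns selected by its indices.
bag_instances(X, idx) = X[:, idx]

# Order inside a bag does not matter, so compare the sorted index lists.
same_bags = all(sort(collect(r)) == sort(s) for (r, s) in zip(aligned, scattered))

first_bag = bag_instances(X, aligned[1])   # columns 1 and 2
empty_bag = bag_instances(X, aligned[2])   # a matrix with zero columns
```

Both layouts select the same instances for every bag. The aligned form only works when the instances of each bag sit next to each other in the data, while the scattered form places no such restriction.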
BagModel
Each BagNode is processed by a BagModel, which contains two (sub)models and an aggregation operator:
julia> im = ArrayModel(Dense(2, 5))
ArrayModel(Dense(2 => 5)) 2 arrays, 15 params, 148 bytes
julia> a = SegmentedMax(5)
SegmentedMax(ψ = Float32[0.0, 0.0, 0.0, 0.0, 0.0])
julia> bm = Dense(5, 3)
Dense(5 => 3) # 18 parameters
julia> BM = BagModel(im, a, bm)
BagModel ↦ SegmentedMax(5) ↦ Dense(5 => 3)  3 arrays, 23 params, 220 bytes
  ╰── ArrayModel(Dense(2 => 5))  2 arrays, 15 params, 148 bytes
The first network submodel (called the instance model im) is responsible for converting the instance representation to a vector form:
julia> y = im(AN)
5×3 Matrix{Float32}:
  0.062954   0.159188    0.255422
 -0.733941  -1.19342    -1.65289
  0.249682  -0.0169376  -0.283558
  0.506802   0.875205    1.24361
  2.02213    2.84063     3.65913
Note that because of the property mentioned above, the output of the instance model im will always be a Matrix. We get three columns, one for each instance. This result is then passed to the SegmentedMax operator a, which takes the vector representations of all instances and produces a single vector per bag:
julia> y = a(y, BN.bags)
5×3 Matrix{Float32}:
  0.159188  0.0   0.255422
 -0.733941  0.0  -1.65289
  0.249682  0.0  -0.283558
  0.875205  0.0   1.24361
  2.84063   0.0   3.65913
To read more about aggregation operators, see the Bag aggregation section. For an explanation of how empty bags are aggregated, see Missing data.
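What the aggregation step computes can be sketched in plain Julia. The function toy_segmented_max below is a hypothetical re-implementation for illustration, not the library's SegmentedMax. It assumes that an empty bag is filled with a vector of default values ψ, here all zeros, which is consistent with the zero column in the output above.

```julia
# Hypothetical sketch of a segmented maximum: one output column per bag.
function toy_segmented_max(Y::AbstractMatrix{T}, bags, ψ::AbstractVector{T}) where {T}
    out = similar(Y, size(Y, 1), length(bags))
    for (j, b) in enumerate(bags)
        # Empty bags receive the default vector ψ (an assumption of this sketch);
        # otherwise take the element-wise maximum over the bag's instances.
        out[:, j] = isempty(b) ? ψ : vec(maximum(Y[:, b]; dims=2))
    end
    return out
end

Y = Float32[1 5 2; 4 3 6]          # 2-dimensional representations of 3 instances
bags = [1:2, 0:-1, 3:3]            # the same bag layout as in BN
ψ = zeros(Float32, 2)

agg = toy_segmented_max(Y, bags, ψ)
```

The number of output columns equals the number of bags rather than the number of instances, which is what allows bags of different sizes to be processed by the same bag model.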
Finally, y is passed to a feedforward model (called the bag model bm) producing the final output per bag. In our example we therefore get a matrix with three columns:
julia> y = bm(y)
3×3 Matrix{Float32}:
  0.655082  0.0   0.179365
 -2.47276   0.0  -3.63168
 -0.834547  0.0  -1.11086
However, the best way to use a bag model node is to simply apply it, which results in the same output:
julia> BM(BN) == y
true
The whole procedure is depicted in the following picture:
Three instances of the BagNode, represented by red subtrees, are first mapped with the instance model im and then aggregated (the aggregation operator here is a concatenation of two different operators $a_1$ and $a_2$), and the results of the aggregation are transformed with the bag model bm.
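The three stages, including an aggregation built by concatenating two operators, can be sketched in plain Julia. Everything below is an assumption made for illustration: toy_aggregate and toy_bag_forward are hypothetical helpers, mean and max are chosen to play the roles of $a_1$ and $a_2$, the weights are made up, and empty bags are filled with zeros. It is not the library's BagModel.

```julia
using Statistics: mean

# Aggregate one bag by stacking the mean (a_1) and the maximum (a_2) of its instances.
function toy_aggregate(Y::AbstractMatrix, bag)
    d = size(Y, 1)
    isempty(bag) && return zeros(eltype(Y), 2d)   # assumed default for empty bags
    inst = Y[:, bag]
    return vcat(vec(mean(inst; dims=2)), vec(maximum(inst; dims=2)))
end

# Instance model, then aggregation per bag, then bag model.
function toy_bag_forward(f_inst, f_bag, X, bags)
    Y = f_inst(X)                                            # one column per instance
    A = reduce(hcat, [toy_aggregate(Y, b) for b in bags])    # one column per bag
    return f_bag(A)
end

W_inst = Float32[1 0; 0 1; 1 1]      # made-up instance model weights (2 => 3)
W_bag  = ones(Float32, 1, 6)         # made-up bag model weights (6 => 1)
inst_model(x) = W_inst * x
bag_model(x) = W_bag * x

X = Float32[1 2 3; 4 5 6]
bags = [1:2, 0:-1, 3:3]
out = toy_bag_forward(inst_model, bag_model, X, bags)
```

Note that concatenating two aggregation operators doubles the input dimension of the bag model, here from 3 to 6.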
ProductNodes and ProductModels
ProductNode can be thought of as a Cartesian product or a dictionary. It holds a Tuple or NamedTuple of nodes (not necessarily of the same type). For example, a ProductNode with the BagNode and the ArrayNode from above and two more nodes as children would look like this:
julia> PN = tuple(
           ArrayNode(randn(Float32, 2, 3)),
           BagNode(ArrayNode(zeros(Float32, 4, 4)), [1:1, 2:2, 3:4]),
           BN,
           AN
       ) |> ProductNode
ProductNode  3 obs
  ├── ArrayNode(2×3 Array with Float32 elements)  3 obs
  ├── BagNode  3 obs
  │     ╰── ArrayNode(4×4 Array with Float32 elements)  4 obs
  ├── BagNode  3 obs
  │     ╰── ArrayNode(2×3 Array with Float32 elements)  3 obs
  ╰── ArrayNode(2×3 Array with Float32 elements)  3 obs
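In the tree above every child reports 3 observations, even though the second bag holds 4 instances. The counting rule can be sketched in plain Julia with hypothetical stand-ins: ToyBag and toy_nobs are defined only for this illustration and are not Mill.jl's types or functions. The sketch assumes that a matrix counts its columns, that a bag counts its bags rather than its instances, and that all children of a product must agree on the count.

```julia
# Hypothetical stand-in for a bag: instance data plus index ranges.
struct ToyBag
    data::Matrix{Float32}
    bags::Vector{UnitRange{Int}}
end

toy_nobs(x::AbstractMatrix) = size(x, 2)     # a matrix holds one sample per column
toy_nobs(b::ToyBag) = length(b.bags)         # a bag node holds one sample per bag

children = (
    randn(Float32, 2, 3),
    ToyBag(zeros(Float32, 4, 4), [1:1, 2:2, 3:4]),      # 4 instances in 3 bags
    ToyBag(Float32[1 2 3; 4 5 6], [1:2, 0:-1, 3:3]),    # 3 instances in 3 bags
    Float32[1 2 3; 4 5 6],
)

counts = map(toy_nobs, children)
consistent = all(==(first(counts)), counts)   # children describe the same samples
```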
Analogously, the ProductModel contains a Tuple or NamedTuple of (sub)models processing each of its children (stored in the ms field, standing for models), as well as one more (sub)model m:
julia> ms = tuple( ArrayModel(Dense(2, 2)), BagModel(ArrayModel(Dense(4, 6)), SegmentedMean(6), Dense(6, 5)), BM, AM);
julia> m = Dense(14, 9);
julia> PM = ProductModel(ms, m)
ProductModel ↦ Dense(14 => 9)  2 arrays, 135 params, 628 bytes
  ├── ArrayModel(Dense(2 => 2))  2 arrays, 6 params, 112 bytes
  ├── BagModel ↦ SegmentedMean(6) ↦ Dense(6 => 5)  3 arrays, 41 params, 292 by ⋯
  │     ╰── ArrayModel(Dense(4 => 6))  2 arrays, 30 params, 208 bytes
  ├── BagModel ↦ SegmentedMax(5) ↦ Dense(5 => 3)  3 arrays, 23 params, 220 byt ⋯
  │     ╰── ArrayModel(Dense(2 => 5))  2 arrays, 15 params, 148 bytes
  ╰── ArrayModel(Dense(2 => 4))  2 arrays, 12 params, 136 bytes
Again, since the library is based on the property that the output of each model is an array, the product model applies the models from ms to the appropriate children and vertically concatenates the outputs, which are then processed by the model m. An example of a model processing the above sample would be:
julia> y = PM.m(vcat([PM.ms[i](PN.data[i]) for i in 1:4]...))
9×3 Matrix{Float32}:
  1.30292    1.33766     0.462887
 -2.32897   -1.89836    -2.76772
 -0.893491   0.228725   -1.17165
  0.179267   0.934965   -0.815158
 -0.93786   -0.124089   -1.8643
 -1.03552   -0.567542   -1.90516
 -0.201476  -0.602289    0.793139
 -0.501704   0.0751399  -1.98769
 -0.90056   -0.648128   -2.29961
which is equivalent to:
julia> PM(PN) == y
true
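The concatenation step also determines the input size of m: in the example above the four children produce 2, 5, 3 and 4 rows, which is why m is Dense(14, 9). The same bookkeeping can be sketched in plain Julia with made-up child outputs and a made-up final layer; none of the names below come from Mill.jl.

```julia
# Made-up outputs of three child models for the same 3 samples (one column each).
child_outputs = (
    Float32[1 2 3; 4 5 6],     # 2 rows
    Float32[7 8 9],            # 1 row
    Float32[1 1 1; 0 0 0],     # 2 rows
)

# Vertical concatenation keeps the number of columns and adds up the rows.
H = vcat(child_outputs...)
rows_expected = sum(size(o, 1) for o in child_outputs)

# The final (sub)model must accept as many inputs as H has rows.
W_final = ones(Float32, 2, rows_expected)    # assumed weights of a 5 => 2 linear layer
y = W_final * H
```

If a child model is swapped for one with a different output dimension, the input dimension of the final model has to change accordingly.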
Application of this product model can be schematically visualized as follows: