Integrating with Hyperopt.jl

Below, we show a simple example of how to use Hyperopt.jl to perform hyperparameter optimization for a JsonGrinder.jl model.

Prerequisites

We reuse a lot of code from the Recipe Ingredients example and recommend becoming familiar with it first.

The data are the same as in the Recipe Ingredients example and are loaded from ../examples/recipes/recipes.jsonl below.

We import all libraries, split the dataset into training, validation, and testing sets, prepare one-hot-encoded labels, infer a schema, define an extractor, and extract all documents:

using Mill, Flux, OneHotArrays, JSON, Statistics

using Random; Random.seed!(42);
dataset = JSON.parse.(readlines("../examples/recipes/recipes.jsonl"))
shuffle!(dataset)
jss_train, jss_val, jss_test = dataset[1:1500], dataset[1501:2000], dataset[2001:end]

y_train = getindex.(jss_train, "cuisine")
y_val = getindex.(jss_val, "cuisine")
y_test = getindex.(jss_test, "cuisine")
classes = unique(y_train)
y_train_oh = onehotbatch(y_train, classes)

sch = schema(jss_train)
delete!(sch.children, :cuisine)
delete!(sch.children, :id)

e = suggestextractor(sch)

x_train = extract(e, jss_train)
x_val = extract(e, jss_val)
x_test = extract(e, jss_test)

pred(m, x) = softmax(m(x))
accuracy(p, y) = mean(onecold(p, classes) .== y)
loss(m, x, y) = Flux.Losses.logitcrossentropy(m(x), y)
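
As a quick sanity check of these helpers (our addition, not part of the original example), we can evaluate an untrained model built in the same way as in the next section; it should score close to random guessing on the validation set. The hidden dimension 16 is an arbitrary choice:

# Untrained baseline (illustrative): reflect the schema into a model with an
# arbitrary hidden dimension and measure validation accuracy before training.
baseline_encoder = reflectinmodel(sch, e, d -> Dense(d, 16))
baseline_model = Dense(16, length(classes)) ∘ baseline_encoder
accuracy(pred(baseline_model, x_val), y_val)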

Now we define a function train_model which, given a set of hyperparameters, trains a new model. We will use the following hyperparameters:

  • epochs: number of epochs in training
  • batchsize: number of samples in a single minibatch
  • d: "inner" dimensionality of Dense layers in the models
  • layers: number of layers to use in each node in the model
  • activation: activation function

function train_model(epochs, batchsize, d, layers, activation)
    layer_builder = in_d -> Chain(
        Dense(in_d, d, activation), [Dense(d, d, activation) for _ in 1:layers-1]...
    )
    encoder = reflectinmodel(sch, e, layer_builder)
    model = Dense(d, length(classes)) ∘ encoder

    opt_state = Flux.setup(Flux.Optimise.Adam(), model);
    minibatch_iterator = Flux.DataLoader((x_train, y_train_oh); batchsize, shuffle=true);

    for i in 1:epochs
        Flux.train!(loss, model, minibatch_iterator, opt_state)
    end

    model
end

julia> train_model(2, 20, 10, 2, identity)
Dense(10 => 20) ∘ ProductModel ↦ identity

Now we can run the hyperparameter search. In this simple example, we will use RandomSampler, and in each iteration we will train a new model with train_model. The optimization criterion is the accuracy on the validation set.

using Hyperopt

julia> ho = @hyperopt for i = 50,
                   sampler = RandomSampler(),
                   epochs = [3, 5, 10],
                   batchsize = [16, 32, 64],
                   d = [16, 32, 64],
                   layers = [1, 2, 3],
                   activation = [identity, relu, tanh]
           model = train_model(epochs, batchsize, d, layers, activation)
           accuracy(pred(model, x_val), y_val)
       end
Hyperoptimizing 100%|████████████████████████████████████| Time: 0:00:47
Hyperoptimizer with
  1 length: [3, 5, 10]
  2 length: [16, 32, 64]
  3 length: [16, 32, 64]
  4 length: [1, 2, 3]
  5 length: Function[identity, NNlib.relu, tanh]
  minimum / maximum: (0.148, 0.63)
  minimizer:
   epochs batchsize         d    layers activation
       10        64        16         2      tanh

We have arrived at the following solution:

julia> printmax(ho)
epochs = 10
batchsize = 64
d = 64
layers = 3
activation = identity
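
Besides printmax, the returned Hyperoptimizer keeps every sampled hyperparameter tuple together with its score, so the whole search history can be inspected directly (a small sketch, assuming Hyperopt.jl's history and results fields):

# Pair each sampled hyperparameter tuple with its validation accuracy
# and list the five best trials (assumes the `history` and `results` fields).
trials = sort(collect(zip(ho.history, ho.results)); by=last, rev=true)
first(trials, 5)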

Finally, we test the solution on the testing data:

julia> final_model = train_model(ho.maximizer...);
julia> accuracy(pred(final_model, x_test), y_test)
0.628
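
To see where the final model struggles, the test accuracy can also be broken down per cuisine using only the helpers defined above (a small sketch; classes with no test samples would yield NaN):

# Per-class accuracy on the test set.
y_pred = onecold(pred(final_model, x_test), classes)
Dict(c => mean(y_pred[y_test .== c] .== c) for c in classes)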

This concludes a very simple example of how to integrate JsonGrinder.jl with Hyperopt.jl. Note that we could and should go further and experiment not only with the hyperparameters presented here, but also with the definition of the schema and/or the extractor, which can also have a significant impact on the results.
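
One possible direction is sketched below: rebuild the schema and extractor inside each trial so that structural choices become part of the search. The drop_id flag and the build_data helper are purely illustrative names, and train_model would have to accept the resulting schema, extractor, and extracted data as arguments instead of using the globals defined above:

# Illustrative only: make a structural choice (keeping or dropping the :id key)
# part of the hyperparameter search by rebuilding the extractor per trial.
# Repeating the extraction makes each trial slower, but broadens the search.
function build_data(drop_id::Bool)
    sch = schema(jss_train)
    delete!(sch.children, :cuisine)
    drop_id && delete!(sch.children, :id)
    e = suggestextractor(sch)
    sch, e, extract(e, jss_train), extract(e, jss_val)
end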