Integrating with Hyperopt.jl
Below, we show a simple example of how to use Hyperopt.jl to perform hyperparameter optimization for us.
We reuse a lot of code from the Recipe Ingredients example and recommend that the reader first get familiar with it.
The data are accessible here.
We import all libraries, split the dataset into training, testing, and validation sets, prepare one-hot-encoded labels, infer a Schema, define an Extractor, and extract all documents:
using Mill, Flux, OneHotArrays, JSON, Statistics
using Random; Random.seed!(42);
dataset = JSON.parse.(readlines("../examples/recipes/recipes.jsonl"))
shuffle!(dataset)
jss_train, jss_val, jss_test = dataset[1:1500], dataset[1501:2000], dataset[2001:end]
y_train = getindex.(jss_train, "cuisine")
y_val = getindex.(jss_val, "cuisine")
y_test = getindex.(jss_test, "cuisine")
classes = unique(y_train)
y_train_oh = onehotbatch(y_train, classes)
sch = schema(jss_train)
delete!(sch.children, :cuisine)
delete!(sch.children, :id)
e = suggestextractor(sch)
x_train = extract(e, jss_train)
x_val = extract(e, jss_val)
x_test = extract(e, jss_test)
pred(m, x) = softmax(m(x))
accuracy(p, y) = mean(onecold(p, classes) .== y)
loss(m, x, y) = Flux.Losses.logitcrossentropy(m(x), y)
Now we define a function train_model, which, given a set of hyperparameters, trains a new model. We will use the following hyperparameters:
- epochs: number of epochs in training
- batchsize: number of samples in a single minibatch
- d: "inner" dimensionality of Dense layers in the model
- layers: number of layers to use in each node in the model
- activation: activation function
function train_model(epochs, batchsize, d, layers, activation)
    layer_builder = in_d -> Chain(
        Dense(in_d, d, activation), [Dense(d, d, activation) for _ in 1:layers-1]...
    )
    encoder = reflectinmodel(sch, e, layer_builder)
    model = Dense(d, length(classes)) ∘ encoder
    opt_state = Flux.setup(Flux.Optimise.Adam(), model)
    minibatch_iterator = Flux.DataLoader((x_train, y_train_oh); batchsize, shuffle=true)
    for i in 1:epochs
        Flux.train!(loss, model, minibatch_iterator, opt_state)
    end
    model
end
julia> train_model(2, 20, 10, 2, identity)
Dense(10 => 20) ∘ ProductModel ↦ identity
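As a quick sanity check of our own (not part of the original example), we can bind such a small model to a variable and evaluate it on the validation set with the helpers defined earlier:
m = train_model(2, 20, 10, 2, identity)   # only two epochs, so the accuracy will be modest
accuracy(pred(m, x_val), y_val)           # validation accuracy of this briefly trained model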
Now we can run the hyperparameter search. In this simple example, we will use RandomSampler, and in each iteration we will train a new model with train_model. The optimization criterion is the accuracy on the validation set.
using Hyperopt
julia> ho = @hyperopt for i = 50,
           sampler = RandomSampler(),
           epochs = [3, 5, 10],
           batchsize = [16, 32, 64],
           d = [16, 32, 64],
           layers = [1, 2, 3],
           activation = [identity, relu, tanh]
           model = train_model(epochs, batchsize, d, layers, activation)
           accuracy(pred(model, x_val), y_val)
       end
Hyperoptimizing 100%|████████████████████████████████████| Time: 0:00:47
Hyperoptimizer with
  1 length: [3, 5, 10]
  2 length: [16, 32, 64]
  3 length: [16, 32, 64]
  4 length: [1, 2, 3]
  5 length: Function[identity, NNlib.relu, tanh]
  minimum / maximum: (0.148, 0.63)
  minimizer:
    epochs  batchsize  d   layers  activation
    10      64         16  2       tanh
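Besides the summary printed above, the whole search can be inspected programmatically. A minimal sketch of our own, assuming the Hyperoptimizer exposes the evaluated candidates and objective values through its history and results fields:
# list the five best configurations found (highest validation accuracy first);
# the `history` and `results` field names are our assumption about Hyperopt.jl's interface
best = sortperm(ho.results; rev=true)[1:5]
foreach(i -> println(round(ho.results[i]; digits=3), " => ", ho.history[i]), best)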
We have arrived at the following solution:
julia> printmax(ho)
epochs = 10
batchsize = 64
d = 64
layers = 3
activation = identity
Finally, we test the solution on the testing data:
julia> final_model = train_model(ho.maximizer...);
julia> accuracy(pred(final_model, x_test), y_test)
0.628
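For context (our own addition, not part of the original example), we can compare this number against a trivial majority-class baseline computed directly from the labels:
# accuracy of always predicting the most common cuisine from the training set;
# `argmax(f, itr)` requires Julia ≥ 1.7
majority_class = argmax(c -> count(==(c), y_train), classes)
mean(y_test .== majority_class)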
This concludes a very simple example of how to integrate JsonGrinder.jl with Hyperopt.jl. Note that we could and should go further and experiment not only with the hyperparameters presented here, but also with the definition of the schema and/or the extractor, which can also have a significant impact on the results.
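For instance, one possible direction is to make the schema and the extractor part of the search as well. Below is a minimal sketch of our own (train_model2 and its signature are not part of the original example): a variant of train_model that takes the schema, the extractor, and the already-extracted data as arguments, so that alternatives built from differently pruned schemas can be evaluated the same way:
# a sketch: like `train_model`, but the schema, extractor, and extracted data are arguments,
# so they can be varied (and searched over) together with the other hyperparameters
function train_model2(sch, e, x, y_oh, epochs, batchsize, d, layers, activation)
    layer_builder = in_d -> Chain(
        Dense(in_d, d, activation), [Dense(d, d, activation) for _ in 1:layers-1]...
    )
    encoder = reflectinmodel(sch, e, layer_builder)
    model = Dense(d, length(classes)) ∘ encoder
    opt_state = Flux.setup(Flux.Optimise.Adam(), model)
    minibatch_iterator = Flux.DataLoader((x, y_oh); batchsize, shuffle=true)
    for _ in 1:epochs
        Flux.train!(loss, model, minibatch_iterator, opt_state)
    end
    model
end
A candidate extractor built from a modified schema would then be passed in together with the data re-extracted by it, and its validation accuracy computed exactly as before.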