HierarchicalUtils.jl

JsonGrinder.jl builds on HierarchicalUtils.jl, which adds tools for printing, traversing, and searching tree-structured objects such as schemas.

using HierarchicalUtils

Let's say we have a complex schema and we want to find type instabilities.

After creating the schema as

julia> using JSON, JsonGrinder

julia> j1 = JSON.parse("""{"a": 4, "b": "birb"}""")
Dict{String, Any} with 2 entries:
  "b" => "birb"
  "a" => 4

julia> j2 = JSON.parse("""{"a": { "a": "hello", "b":[5,6]}, "b": "bird"}""")
Dict{String, Any} with 2 entries:
  "b" => "bird"
  "a" => Dict{String, Any}("b"=>Any[5, 6], "a"=>"hello")

julia> j3 = JSON.parse("""{"a": [1, 2, 3, "hi"], "b": "word"}""")
Dict{String, Any} with 2 entries:
  "b" => "word"
  "a" => Any[1, 2, 3, "hi"]

julia> sch = schema([j1, j2, j3])
[ Info: In path [a]: Instability in the schema detected. Using multiple representation.
[ Info: In path [4]: Instability in the schema detected. Using multiple representation.
[Dict]  # updated = 3
  ├── a: [MultiEntry]  # updated = 3
  │        ├── 1: [Scalar - Int64], 1 unique values  # updated = 1
  │        ├── 2: [Dict]  # updated = 1
  │        │        ├── a: [Scalar - String], 1 unique values  # updated = 1
  │        │        ╰── b: [List]  # updated = 1
  │        │                 ╰── [Scalar - Int64], 2 unique values  # updated = 2
  │        ╰── 3: [List]  # updated = 1
  │                 ╰── [MultiEntry]  # updated = 4
  │                       ├── 1: [Scalar - Int64], 3 unique values  # updated = 3
  │                       ╰── 2: [Scalar - String], 1 unique values  # updated = 1
  ╰── b: [Scalar - String], 3 unique values  # updated = 3

In a small enough schema, you can immediately see the types of all nodes, but it gets more complicated if the schema does not fit on your screen. Let's see how we can leverage HierarchicalUtils to examine the schema programmatically.

The printtree function prints a non-truncated version of the schema:

julia> printtree(sch)
[Dict]  # updated = 3
  ├── a: [MultiEntry]  # updated = 3
  │        ├── 1: [Scalar - Int64], 1 unique values  # updated = 1
  │        ├── 2: [Dict]  # updated = 1
  │        │        ├── a: [Scalar - String], 1 unique values  # updated = 1
  │        │        ╰── b: [List]  # updated = 1
  │        │                 ╰── [Scalar - Int64], 2 unique values  # updated = 2
  │        ╰── 3: [List]  # updated = 1
  │                 ╰── [MultiEntry]  # updated = 4
  │                       ├── 1: [Scalar - Int64], 3 unique values  # updated = 3
  │                       ╰── 2: [Scalar - String], 1 unique values  # updated = 1
  ╰── b: [Scalar - String], 3 unique values  # updated = 3

Calling printtree with trav=true also prints a traversal code next to each node, which enables convenient string indexing:

julia> printtree(sch, trav=true)
[Dict] [""]  # updated = 3
  ├── a: [MultiEntry] ["E"]  # updated = 3
  │        ├── 1: [Scalar - Int64], 1 unique values ["I"]  # updated = 1
  │        ├── 2: [Dict] ["M"]  # updated = 1
  │        │        ├── a: [Scalar - String], 1 unique values ["N"]  # updated = 1
  │        │        ╰── b: [List] ["O"]  # updated = 1
  │        │                 ╰── [Scalar - Int64], 2 unique values ["OU"]  # updated = 2
  │        ╰── 3: [List] ["Q"]  # updated = 1
  │                 ╰── [MultiEntry] ["S"]  # updated = 4
  │                       ├── 1: [Scalar - Int64], 3 unique values ["SU"]  # updated = 3
  │                       ╰── 2: [Scalar - String], 1 unique values ["T*"]  # updated = 1
  ╰── b: [Scalar - String], 3 unique values ["U"]  # updated = 3

This way any element in the schema is quickly accessible, which may come in handy when inspecting model parameters or when deleting, replacing, or inserting nodes in the tree (for instance when constructing adversarial samples). All tree nodes are accessible by indexing with the traversal code:

julia> sch["N"]
[Scalar - String], 1 unique values  # updated = 1

The following two approaches give the same result:

julia> sch["N"] === sch.childs[:a][2][:a]
true
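
To see which traversal code belongs to which kind of node without reading the printed tree, one option is to loop over all codes. The following is a minimal sketch, assuming the same package versions as in the transcript above; it rebuilds `sch` from the three sample documents and uses `list_traversal`, the HierarchicalUtils function that returns every traversal code of a tree.

```julia
using JSON, JsonGrinder, HierarchicalUtils

# Rebuild the schema from the three sample documents used above.
j1 = JSON.parse("""{"a": 4, "b": "birb"}""")
j2 = JSON.parse("""{"a": { "a": "hello", "b":[5,6]}, "b": "bird"}""")
j3 = JSON.parse("""{"a": [1, 2, 3, "hi"], "b": "word"}""")
sch = schema([j1, j2, j3])

# Every node of the tree has exactly one traversal code.
codes = list_traversal(sch)

# Print each code next to the type name of the node it points to.
for code in codes
    println(rpad(repr(code), 6), " => ", nameof(typeof(sch[code])))
end
```

The printed type names depend on the installed JsonGrinder version, so they are not reproduced here.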

We can even search for specific elements in the schema. Let's examine occurrences of irregularities, a.k.a. MultiEntry nodes, by running

julia> TypeIterator(JsonGrinder.MultiEntry, sch) |> collect
2-element Vector{JsonGrinder.MultiEntry}:
 MultiEntry
 MultiEntry

which tells us there are 2 MultiEntry nodes, but does not tell us where they are.

Using this

julia> filter(e->sch[e] isa JsonGrinder.MultiEntry, list_traversal(sch))
2-element Vector{String}:
 "E"
 "S"

we can see that sch["E"] and sch["S"] are indeed MultiEntry nodes, but we still don't have an easy way to see where they are in the schema.

julia> using Mill

julia> lenses = [only(code2lens(sch, e)) for e in list_traversal(sch) if sch[e] isa JsonGrinder.MultiEntry]
2-element Vector{Setfield.ComposedLens{Setfield.PropertyLens{:childs}}}:
 (@lens _.childs[:a])
 (@lens _.childs[:a].childs[3].items)

gives us lenses to access them, together with the path from the root to each of them. @lens is part of the Setfield.jl package; a lens describes an accessor into a hierarchical structure and can then be applied to it.

julia> get(sch, lenses[1])
[MultiEntry]  # updated = 3
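
For readers new to lenses, here is a small sketch of how a Setfield lens behaves on its own. It assumes only that Setfield.jl is installed; the nested `tree` below is a hypothetical NamedTuple made up for illustration and is not a JsonGrinder type.

```julia
using Setfield

# Hypothetical nested structure standing in for a schema.
tree = (childs = (a = (items = [1, 2, 3],), b = "leaf"),)

# A lens describes a path from the root to a nested value.
lens = @lens _.childs.a.items

# `get` follows the path and returns the value stored there.
items = get(tree, lens)

# `set` returns an updated copy and leaves the original untouched.
updated = set(tree, lens, [10, 20])

println(items)
println(updated.childs.a.items)
println(tree.childs.a.items)
```

Because a lens is an ordinary value, it can be stored, printed to show the path it encodes, and reused on other objects with the same shape.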
  ├── 1: [Scalar - Int64], 1 unique values  # updated = 1
  ├── 2: [Dict]  # updated = 1
  │        ├── a: [Scalar - String], 1 unique values  # updated = 1
  │        ╰── b: [List]  # updated = 1
  │                 ╰── [Scalar - Int64], 2 unique values  # updated = 2
  ╰── 3: [List]  # updated = 1
           ╰── [MultiEntry]  # updated = 4
                 ├── 1: [Scalar - Int64], 3 unique values  # updated = 3
                 ╰── 2: [Scalar - String], 1 unique values  # updated = 1

returns the first MultiEntry and

julia> get(sch, lenses[2])
[MultiEntry]  # updated = 4
  ├── 1: [Scalar - Int64], 3 unique values  # updated = 3
  ╰── 2: [Scalar - String], 1 unique values  # updated = 1

returns the second one.
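
The steps above can be combined so that each MultiEntry is reported together with both its traversal code and its lens. This is a sketch under the same assumptions as the transcript (the same JsonGrinder and Mill versions, and `code2lens` returning a single lens for each of these nodes); the variable names `multi_codes` and `located` are illustrative.

```julia
using JSON, JsonGrinder, HierarchicalUtils, Mill

# Rebuild the schema from the three sample documents used above.
j1 = JSON.parse("""{"a": 4, "b": "birb"}""")
j2 = JSON.parse("""{"a": { "a": "hello", "b":[5,6]}, "b": "bird"}""")
j3 = JSON.parse("""{"a": [1, 2, 3, "hi"], "b": "word"}""")
sch = schema([j1, j2, j3])

# Traversal codes of all nodes that are MultiEntry.
multi_codes = filter(e -> sch[e] isa JsonGrinder.MultiEntry, list_traversal(sch))

# Pair each code with the lens that leads to the same node from the root.
located = [(code = e, lens = only(code2lens(sch, e))) for e in multi_codes]

# Print one line per unstable node: its code and the path to it.
for entry in located
    println(entry.code, " => ", entry.lens)
end
```

Keeping the code and the lens side by side makes it possible to jump to a node with `sch[entry.code]` while still reading its path from the root off the lens.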

For the complete showcase of possibilities, refer to HierarchicalUtils.jl and this notebook