Schema API Reference
This section of the internal API reference covers the creation, manipulation and visualization of the schema.
Index
JsonGrinder.ArrayEntry
JsonGrinder.DictEntry
JsonGrinder.Entry
JsonGrinder.MultiEntry
Base.delete!
Base.merge
JsonGrinder.newentry
JsonGrinder.prune_json
JsonGrinder.update!
JsonGrinder.updatemaxkeys!
JsonGrinder.updatemaxlen!
Internal functions
Base.delete!
— Function
Deletes field at the specified path from the schema sch. For instance,
delete!(schema, ".field.subfield.[]", "x")
deletes the field x from schema at
schema.childs[:field].childs[:subfield].items.childs
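The mapping from a path string to nested field access can be illustrated with a minimal sketch. Everything here (ToyDict, ToyArray, toy_walk, toy_delete!) is a hypothetical stand-in written for this example and is not the JsonGrinder implementation; keys are Symbols only because the access expression above indexes childs with Symbols.

```julia
# Toy stand-ins for schema nodes; NOT the JsonGrinder type definitions.
struct ToyDict
    childs::Dict{Symbol,Any}
end
struct ToyArray
    items::Any
end

# Walk a path such as ".field.subfield.[]" and return the node it points to.
# "[]" steps into the items of an array node, any other part is a dictionary key.
function toy_walk(node, path::AbstractString)
    for part in split(path, '.'; keepempty=false)
        node = part == "[]" ? node.items : node.childs[Symbol(part)]
    end
    return node
end

# Remove `field` from the dictionary node found at `path`.
function toy_delete!(sch, path::AbstractString, field::AbstractString)
    delete!(toy_walk(sch, path).childs, Symbol(field))
    return sch
end

inner = ToyDict(Dict{Symbol,Any}(:x => 1, :y => 2))
sch = ToyDict(Dict{Symbol,Any}(:field => ToyDict(Dict{Symbol,Any}(:subfield => ToyArray(inner)))))
toy_delete!(sch, ".field.subfield.[]", "x")
```

After the call, the innermost dictionary no longer contains x, while y is untouched.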
Base.merge
— Function
Dispatch of Base.merge on JsonGrinder.JSONEntry structures. Merges multiple schemas into a single one.
merge(es::Entry...)
merge(es::DictEntry...)
merge(es::ArrayEntry...)
merge(es::MultiEntry...)
merge(es::JsonGrinder.JSONEntry...)
It can be used to distribute the calculation of a schema across multiple workers and then merge their partial results into a single schema.
Example
To calculate a schema from an array of JSONs in a distributed manner, given an array jsons, we can use
using ThreadsX
ThreadsX.mapreduce(schema, merge, Iterators.partition(jsons, length(jsons) ÷ Threads.nthreads()))
or
using ThreadTools
merge(tmap(schema, Threads.nthreads(), Iterators.partition(jsons, length(jsons) ÷ Threads.nthreads()))...)
or, to split the work into multiple jobs that are processed by multiple threads,
using ThreadTools
merge(tmap(schema, Threads.nthreads(), Iterators.partition(jsons, 1_000))...)
where we split the array into smaller arrays of 1,000 elements each and let all available threads create partial schemas.
If your data is too large to fit into RAM, the same approach also works with filenames and similar ways of processing large data.
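The reason partial results can be merged is that the statistics are additive. The following is a minimal sketch of that idea using only Base (mergewith needs Julia 1.5 or newer); toy_counts and toy_merge are hypothetical stand-ins for schema and merge, not the JsonGrinder functions.

```julia
# Hypothetical stand-in for `schema`: count how often each value occurs in a chunk.
function toy_counts(chunk)
    counts = Dict{Any,Int}()
    for v in chunk
        counts[v] = get(counts, v, 0) + 1
    end
    return counts
end

# Hypothetical stand-in for `merge` on partial schemas: add counts key by key.
toy_merge(a, b) = mergewith(+, a, b)

values_seen = ["a", "b", "a", "c", "a", "b", "a"]
# Ceiling division keeps the chunk size at least 1 for non-empty input.
chunk_size = cld(length(values_seen), 3)
partial = map(toy_counts, Iterators.partition(values_seen, chunk_size))
merged = reduce(toy_merge, partial)
```

Here merged equals the counts computed over the whole array in one pass. Note that a chunk size computed with ÷ becomes 0 when there are fewer items than workers, which Iterators.partition rejects; cld is one way to avoid that.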
JsonGrinder.prune_json
— Function
prune_json(json, schema)
Removes keys from json which are not part of the schema.
Example
julia> using JSON
julia> j1 = JSON.parse("{\"a\": 4, \"b\": {\"a\":1, \"b\": 1}}");
julia> j2 = JSON.parse("{\"a\": 4, \"b\": {\"a\":1}}");
julia> sch = JsonGrinder.schema([j1,j2])
[Dict]  # updated = 2
  ├── a: [Scalar - Int64], 1 unique values  # updated = 2
  ╰── b: [Dict]  # updated = 2
        ├── a: [Scalar - Int64], 1 unique values  # updated = 2
        ╰── b: [Scalar - Int64], 1 unique values  # updated = 1
julia> j3 = Dict("a" => 4, "b" => Dict("a"=>1), "c" => 1, "d" => 2)
Dict{String, Any} with 4 entries:
  "c" => 1
  "b" => Dict("a"=>1)
  "a" => 4
  "d" => 2
julia> JsonGrinder.prune_json(j3, sch)
Dict{String, Any} with 2 entries:
  "b" => Dict("a"=>1)
  "a" => 4
so JsonGrinder.prune_json removes the keys c and d.
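The pruning logic can be sketched without the package as a recursive filter over nested dictionaries. The schema representation below (a nested Dict of allowed keys, with nothing marking a scalar leaf) and the name toy_prune are assumptions made for this example; the real prune_json works on JsonGrinder schema entries.

```julia
# Hypothetical schema stand-in: nested Dict of allowed keys, `nothing` marks a scalar leaf.
toy_schema = Dict("a" => nothing, "b" => Dict("a" => nothing, "b" => nothing))

# Keep only keys present in the schema, recursing into nested objects.
function toy_prune(json::AbstractDict, schema::AbstractDict)
    out = Dict{String,Any}()
    for (k, v) in json
        haskey(schema, k) || continue   # drop keys the schema does not know
        sub = schema[k]
        out[k] = (v isa AbstractDict && sub isa AbstractDict) ? toy_prune(v, sub) : v
    end
    return out
end

j3 = Dict("a" => 4, "b" => Dict("a" => 1), "c" => 1, "d" => 2)
pruned = toy_prune(j3, toy_schema)
```

As in the example above, the keys c and d are dropped and the nested object under b is kept.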
JsonGrinder.newentry
— Function
newentry(v)
Creates a new entry describing the JSON according to the type of v.
JsonGrinder.update!
— Function
update!(a::Entry, v)
Updates the entry when seeing the value v.
JsonGrinder.updatemaxkeys!
— Function
updatemaxkeys!(n::Int)
Limits the maximum number of keys in the statistics of nodes in JSON. The default value is 10_000.
JsonGrinder.updatemaxlen!
— Function
updatemaxlen!(n::Int)
Limits the maximum size of string values in the statistics of nodes in JSON. The default value is 10_000.
Longer strings are trimmed, and their length and hash are appended to retain uniqueness.
This is because some strings are very long and would make the schema an order of magnitude larger than needed.
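The trimming idea can be sketched as follows. The helper toy_shorten and the separator format are assumptions for illustration only; the exact format JsonGrinder uses may differ.

```julia
# Hypothetical helper; the real format used by JsonGrinder may differ.
function toy_shorten(s::AbstractString, maxlen::Int)
    length(s) <= maxlen && return String(s)
    # first(s, n) takes n characters and is safe for multi-byte characters,
    # unlike byte indexing with s[1:n].
    return string(first(s, maxlen), "_", length(s), "_", hash(s))
end

# Two long strings sharing the same prefix stay distinguishable after trimming.
a = toy_shorten("x"^20 * "a", 10)
b = toy_shorten("x"^20 * "b", 10)
```

Both results share the trimmed prefix and the original length, and the appended hash of the full string keeps them apart.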
JsonGrinder.Entry
— Type
mutable struct Entry <: JSONEntry
    counts::Dict{Any,Int}
    updated::Int
end
Keeps statistics about scalar values of one key and also about items inside a key.
counts: counts how many times a given value appeared (at most max_keys() values are held)
updated: counts how many times the entry was updated
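How such an entry might accumulate statistics under a key limit is sketched below. ToyEntry and toy_update! are hypothetical; the field names follow the struct above, but the handling of the limit (new values are ignored once the cap is reached) is an assumption and not necessarily what update! does.

```julia
# Toy counterpart of Entry; the cap handling is an assumption.
mutable struct ToyEntry
    counts::Dict{Any,Int}
    updated::Int
end
ToyEntry() = ToyEntry(Dict{Any,Int}(), 0)

function toy_update!(e::ToyEntry, v; max_keys::Int = 3)
    e.updated += 1
    # Known values are always counted; new values are admitted only below the cap.
    if haskey(e.counts, v) || length(e.counts) < max_keys
        e.counts[v] = get(e.counts, v, 0) + 1
    end
    return e
end

e = ToyEntry()
foreach(v -> toy_update!(e, v), [1, 2, 1, 3, 4, 1])
```

With a cap of three keys, the value 4 is seen but not stored, while updated still reflects all six observations.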
JsonGrinder.ArrayEntry
— Type
mutable struct ArrayEntry <: JSONEntry
    items
    l::Dict{Int,Int}
    updated::Int
end
Keeps statistics about an array entry in JSON.
items: is of type Entry or nothing and keeps statistics about the elements of the array
l: keeps a histogram of message length
updated: counts how many times the struct was updated
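The length histogram kept in l can be illustrated with a small sketch. ToyArrayEntry and toy_update_array! are hypothetical names for this example, and the statistics about the elements themselves (the items field) are left out for brevity.

```julia
# Toy counterpart of ArrayEntry without the `items` field.
mutable struct ToyArrayEntry
    l::Dict{Int,Int}
    updated::Int
end

# Record the length of one observed array in the histogram.
function toy_update_array!(e::ToyArrayEntry, v::AbstractVector)
    n = length(v)
    e.l[n] = get(e.l, n, 0) + 1
    e.updated += 1
    return e
end

ae = ToyArrayEntry(Dict{Int,Int}(), 0)
for arr in ([1, 2], [3], [4, 5], Int[])
    toy_update_array!(ae, arr)
end
```

After four observations, the histogram maps each array length to the number of arrays of that length.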
JsonGrinder.MultiEntry
— Type
mutable struct MultiEntry <: JSONEntry
    childs::Vector{Any}
end
Support for JSON which does not adhere to a fixed type. A container for multiple types of entries observed at the same place in JSON.
JsonGrinder.DictEntry
— Type
mutable struct DictEntry <: JSONEntry
    childs::Dict{String, Any}
    updated::Int
end
Keeps statistics about an object in JSON.
childs: maintains key-value statistics of the children. All values should be JSONEntries
updated: counts how many times the struct was updated