How I Learned to Stop Worrying and Love JSON Schema
[TOC]
Intro
This post relies on a few shared assumptions, so I need to state them explicitly; otherwise, you will read things that are more or less rational but look like garbage.
- APIs are good
- Many APIs are web APIs
- Many web APIs consume and produce JSON
- JSON is good
- JSON is better if you know what will be in it
So, JSON Schema is a way to increase the number of times in your life that JSON is better in that way, therefore making you happier.
So, let's do a quick intro to JSON Schema. You can always read a much longer and surely better one, from which I stole most of these examples, at Understanding JSON Schema later (or right now; it's your time, lady, I am not the boss of you).
Schemas
So, a JSON Schema describes your data. Here is the simplest schema, which matches anything:
```json
{ }
```
Scary, huh? Here's a more restrictive one:
```json
{
  "type": "string"
}
```
That means "a thing, which is a string." So `"foo"` is valid, and `42` isn't.
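If you want to poke at these two schemas yourself, here is a quick sketch using the jsonschema Python library that this post uses later on. It assumes jsonschema 3.0 or newer, where `Draft7Validator` and its `is_valid()` method are available.

```python
from jsonschema import Draft7Validator

# The empty schema places no constraints, so every JSON value is valid.
anything = Draft7Validator({})
# This schema only accepts strings.
strings_only = Draft7Validator({"type": "string"})

for value in ["foo", 42, None, {"a": 1}]:
    print(repr(value), anything.is_valid(value), strings_only.is_valid(value))
```

Everything passes the empty schema; only `"foo"` passes the second one.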
Usually, APIs exchange JSON objects (dictionaries, for you pythonistas), so this is more like what you will see in real life:
```json
{
  "type": "object",
  "properties": {
    "street_address": { "type": "string" },
    "city": { "type": "string" },
    "state": { "type": "string" }
  },
  "required": ["street_address", "city", "state"]
}
```
That means "it's an object" that contains "street_address", "city" and "state", all of which are required.
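Here is the same address schema checked against a few objects, again sketched with jsonschema's `Draft7Validator` (assuming jsonschema 3.0+). It also shows a detail worth knowing: this schema only talks about the three listed keys, so an object with an extra key still validates.

```python
from jsonschema import Draft7Validator

address_schema = {
    "type": "object",
    "properties": {
        "street_address": {"type": "string"},
        "city": {"type": "string"},
        "state": {"type": "string"},
    },
    "required": ["street_address", "city", "state"],
}
validator = Draft7Validator(address_schema)

good = {"street_address": "foo", "city": "bar", "state": "foobar"}
wrong_type = {"street_address": "foo", "city": 42, "state": "foobar"}
extra_key = dict(good, zip_code="12345")

print(validator.is_valid(good))        # True
print(validator.is_valid(wrong_type))  # False: "city" must be a string
print(validator.is_valid(extra_key))   # True: unlisted keys are not forbidden
```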
Let's suppose that's all we need to know about schemas. Again, before you actually use them in anger, you need to go and read Understanding JSON Schema. For now, just assume there is a thing called a JSON Schema, that it can be used to define what your data is supposed to look like, and that it's written in JSON, like the examples we saw here. Cool?
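Because a schema is itself plain JSON, you can keep it in a file, load it like any other data, and even check that the schema is well formed. A small sketch, assuming jsonschema 3.0+ for `Draft7Validator.check_schema()`, which validates a schema against the Draft 7 metaschema and raises `SchemaError` if it is broken:

```python
import json

from jsonschema import Draft7Validator, SchemaError

# A schema is plain JSON, so it can be stored in a file and loaded like any data.
schema_text = '{"type": "object", "properties": {"city": {"type": "string"}}}'
schema = json.loads(schema_text)

# Passes silently: this schema is well formed.
Draft7Validator.check_schema(schema)

# "strin" is not one of the JSON Schema type names, so the schema itself is invalid.
typo_schema = {"type": "strin"}
try:
    Draft7Validator.check_schema(typo_schema)
    typo_caught = False
except SchemaError as exc:
    typo_caught = True
    print("broken schema:", exc.message)
```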
Using schemas
Of course, schemas are useless if you don't use them. You will use them as part of the "contract" your API promises to fulfill, so now you need to validate things against them. For that, in Python, we can use the jsonschema library.
It's pretty simple! Here is a "full" example.
```python
import jsonschema

schema = {
    "type": "object",
    "properties": {
        "street_address": {"type": "string"},
        "city": {"type": "string"},
        "state": {"type": "string"},
    },
    "required": ["street_address", "city", "state"],
}

# Valid data: validate() returns None and raises nothing.
jsonschema.validate({
    "street_address": "foo",
    "city": "bar",
    "state": "foobar",
}, schema)
```
If the data doesn't validate, jsonschema raises a `ValidationError`, like this:
```
>>> jsonschema.validate({
...     "street_address": "foo",
...     "city": "bar",
... }, schema)
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "jsonschema/validators.py", line 541, in validate
    cls(schema, *args, **kwargs).validate(instance)
  File "jsonschema/validators.py", line 130, in validate
    raise error
jsonschema.exceptions.ValidationError: 'state' is a required property

Failed validating 'required' in schema:
    {'properties': {'city': {'type': 'string'},
                    'state': {'type': 'string'},
                    'street_address': {'type': 'string'}},
     'required': ['street_address', 'city', 'state'],
     'type': 'object'}

On instance:
    {'city': 'bar', 'street_address': 'foo'}
```
Hey, that is a pretty nice description of what is wrong with that data. That is how you use a JSON schema. Now, where would you use it?
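`jsonschema.validate()` reports one problem at a time. If you want to handle the failure in code, or report every problem at once (handy for an API error response), a validator object can do both. A sketch, assuming jsonschema 3.0+:

```python
from jsonschema import Draft7Validator, ValidationError

schema = {
    "type": "object",
    "properties": {
        "street_address": {"type": "string"},
        "city": {"type": "string"},
        "state": {"type": "string"},
    },
    "required": ["street_address", "city", "state"],
}
validator = Draft7Validator(schema)

# "city" has the wrong type and "state" is missing: two separate problems.
bad = {"street_address": "foo", "city": 42}

# validate() raises on the first error it finds, which you can catch...
try:
    validator.validate(bad)
except ValidationError as exc:
    print("first problem:", exc.message)

# ...while iter_errors() yields every violation.
errors = sorted(validator.iter_errors(bad), key=lambda e: e.message)
for error in errors:
    # error.path locates the bad value in the instance (empty = top level);
    # error.validator is the schema keyword that failed, e.g. "required".
    print(list(error.path), error.validator, error.message)
```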
Getting value out of schemas
Schemas are useless if not used. They are worthless if you don't get value out of using them.
These are some ways they add value to your code:
- You can use them in your web app endpoints, to validate incoming data.
- You can use them in your client code, to make sure you are not sending garbage.
- You can use a fuzzer to feed data that is technically valid to your endpoint, and make sure things don't explode in interesting ways.
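For the first two bullets, here is a framework-agnostic sketch of the same schema guarding both sides of the wire. `handle_create_address` and `build_request_body` are hypothetical names, not part of any library; the idea is to plug this into whatever web framework and HTTP client you actually use. It assumes jsonschema 3.0+.

```python
from __future__ import annotations

import json

from jsonschema import Draft7Validator

ADDRESS_SCHEMA = {
    "type": "object",
    "properties": {
        "street_address": {"type": "string"},
        "city": {"type": "string"},
        "state": {"type": "string"},
    },
    "required": ["street_address", "city", "state"],
}
# Build the validator once and reuse it for every request.
ADDRESS_VALIDATOR = Draft7Validator(ADDRESS_SCHEMA)


def handle_create_address(body: bytes) -> tuple[int, dict]:
    """Hypothetical server handler: returns (HTTP status, response payload)."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, {"errors": ["request body is not valid JSON"]}
    # Collect every violation so the client can fix them all in one go.
    errors = [error.message for error in ADDRESS_VALIDATOR.iter_errors(payload)]
    if errors:
        return 400, {"errors": errors}
    return 201, {"created": payload}


def build_request_body(address: dict) -> bytes:
    """Hypothetical client helper: refuse to send data that breaks the contract."""
    ADDRESS_VALIDATOR.validate(address)  # raises ValidationError on bad data
    return json.dumps(address).encode("utf-8")
```

Both sides share one schema, so the server and the client can't quietly disagree about what an address looks like.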
But here is the most value you can extract from JSON Schema:
You can discuss the contract between components in unambiguous terms and enforce the contract once it's in place.
We are devs. We discuss via branches, and comments in code review. JSON Schema turns a vague argument about documentation into a fact-based discussion of data. And we are much, much better at doing the latter than we are at doing the former. Discuss the contracts.
Since the document describing (this part of) the contract is actually used as part of the API definition in the code, it can never fall out of sync with the code. Every change in the code that changes the contract is obvious and requires an explicit renegotiation. You can't break the API by accident, and you can't break the API and hope nobody will notice. Enforce the contracts.
Finally, you can version the contract. Use that along with API versioning and voilà, you know how to manage change! Version your contracts.
- Discuss your contracts
- Enforce your contracts
- Version your contracts
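Versioning the contract can be as simple as keeping one schema per API version and picking the validator that matches the version the client asked for. A minimal sketch, assuming jsonschema 3.0+; `CONTRACTS`, `check` and the v2 `country` field are hypothetical, invented for illustration:

```python
from jsonschema import Draft7Validator

ADDRESS_V1 = {
    "type": "object",
    "properties": {
        "street_address": {"type": "string"},
        "city": {"type": "string"},
        "state": {"type": "string"},
    },
    "required": ["street_address", "city", "state"],
}
# Hypothetical v2: adds a required "country" field. That is a breaking change,
# so it ships as a new version instead of silently replacing v1.
ADDRESS_V2 = {
    **ADDRESS_V1,
    "properties": {**ADDRESS_V1["properties"], "country": {"type": "string"}},
    "required": ADDRESS_V1["required"] + ["country"],
}

# Hypothetical registry: one validator per published API version.
CONTRACTS = {
    "v1": Draft7Validator(ADDRESS_V1),
    "v2": Draft7Validator(ADDRESS_V2),
}


def check(version, payload):
    """Validate payload against the contract for the requested API version."""
    CONTRACTS[version].validate(payload)  # raises ValidationError on mismatch
```

Old clients keep talking to v1 and keep working, while the v2 schema documents exactly what changed.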
So now you can stop worrying and love JSON Schema as well.