By Lars | March 2, 2022
Would you like to use Sequoia sq from your script? We’d like your feedback.
I’m sketching what the JSON output of sq might look like. We in the Sequoia project would like to make sure the JSON serves you well and is convenient for your code to consume. This blog post outlines the principles of how JSON output is meant to work, and has a concrete example of what it’s meant to look like. Your feedback would be very much appreciated.
Don’t break consumers
The Linux kernel has the guiding principle of “don’t break userland”. If the kernel changes how it behaves in some circumstance, and software running on top of the kernel breaks, the kernel is at fault, even if the old kernel behavior was buggy.
The Sequoia command line tool sq will produce JSON output. Other software will consume it. This makes the JSON output an application programming interface (or API), and as such it needs an interface contract so that consumers will know what they can rely on not to change in ways that break them.
I’m proposing the following principles for the JSON API of sq:
- The JSON output is always a JSON object, not a list or a scalar.
  - However, see below about line-oriented JSON.
- There is always a field sq_schema that specifies the schema version of the JSON output, as a list of three integers that specify the components of a SemVer compatible version number. We’ll update the components as follows:
  - patch: incremented if there are no semantic changes
  - minor: one or more fields were added
  - major: one or more fields were dropped
- We won’t ever change the meaning of a field in a given type of JSON object, regardless of whether it’s the outermost one or nested inside a field of an outer object: we will always rename the field instead.
  - For example, the outermost JSON object might have a field packets, whose value is a list of JSON objects, which have a field packet_type; the outermost JSON object is one type (“sq packet dump output”), the inner object is another type (“a packet”).
  - Although it’s not evident in the JSON output, each type of JSON object will be represented in the sq source code by a Rust data structure; knowing this may help thinking about types.
- We won’t re-use the name of a field in a given type of JSON object.
- Consumers shall ignore any fields they don’t use.
This approach should allow us to evolve the schema for the JSON output. Later, when we add other formats, such as YAML, we can use the same approach.
In other words, if a consumer wants a field droids, and the sq output contains a field called droids, then the consumer can be sure those are the droids they are looking for. However, the consumer can also look at the schema version to know which fields it should expect. The consumer can take the approach that’s easier for them.
The user will be able to choose which version of the schema to output: for every major version, we will keep support in sq for only the latest minor and patch version. If, say, the latest 1.x version is 1.2.3, sq will support 1.2.3, but not 1.0.0, 1.1.0, 1.2.0, 1.2.1, or 1.2.2: a consumer who understands 1.2.0 will understand 1.2.3. Likewise, a consumer who understands 1.2.0 will also understand 1.3.0, as long as they ignore any fields added after 1.2.0.
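Here’s a minimal sketch, in Python, of what that compatibility check might look like in a consumer. The sq_schema field is as proposed above; the helper and the wanted version are my own invention for the example:

import json
import sys

# The schema version this hypothetical consumer was written against.
WANT_MAJOR, WANT_MINOR = 1, 2

def schema_is_compatible(doc):
    # Under the proposed rules, output with the same major version and a
    # minor version >= the one we know is safe to read: minor bumps only
    # add fields, and patch bumps never change semantics. A different
    # major version may have dropped or renamed fields, so refuse it.
    major, minor, patch = doc["sq_schema"]
    return major == WANT_MAJOR and minor >= WANT_MINOR

doc = json.load(sys.stdin)
if not schema_is_compatible(doc):
    sys.exit(f"unsupported sq_schema version: {doc['sq_schema']}")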
The compatibility rules mean that we can add fields without breaking consumers, so there’s no need to support all minor versions of a major version. Thus, if schema version 1.0.0 has a field name, and we add a field nickname, we bump the version to 1.1.0. If we rename nickname to petname, we bump the version to 2.0.0. If we then want the petname field to be a list of pet names, we drop the petname field, add the petnames field, and bump the version to 3.0.0.
Patch level changes would be changes such as adding constraints on fields, without otherwise changing the semantics: if version 1.0.0 has a field name that is a string, and it just so happens sq never sets it to an empty string, version 1.0.1 might add the explicit constraint that name is never empty. Patch level changes must never break compatibility for consumers of sq JSON output.
sq will add an option --output-format=FORMAT, where FORMAT is json for now, but will allow for other values later. This will be a global option, i.e., not specific to a subcommand. If JSON output is requested, but the subcommand doesn’t support JSON output, the subcommand will just output what it normally outputs.
sq will also add the option --output-version=VERSION, where VERSION is a string of dotted integers (1, 1.2, or 1.2.3), and sq will output that schema version, if it knows it. If VERSION is 1.2.3, but sq only knows 1.2.2, that’s an error, and sq won’t output anything. If no --output-version is used, sq will output the latest version it supports.
The environment variables SQ_OUTPUT_FORMAT and SQ_OUTPUT_VERSION will be used if the corresponding options aren’t given by the user. This allows a consumer to avoid having to add the options to every invocation of sq.
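For example, a consumer script might set both variables once and then call sq without repeating the options. This is only a sketch: it assumes the proposed options and environment variables land as described, and that a 1.0.0 schema exists by then.

import json
import os
import subprocess

# Pin the output format and schema version once, instead of passing
# --output-format and --output-version on every sq invocation.
env = dict(os.environ, SQ_OUTPUT_FORMAT="json", SQ_OUTPUT_VERSION="1.0.0")

def sq_json(*args):
    # Run sq with the given arguments and parse its JSON output.
    result = subprocess.run(
        ["sq", *args], env=env, check=True, capture_output=True, text=True
    )
    return json.loads(result.stdout)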
sq keyring list
The first command to gain JSON support will be sq keyring list. The output will look something like this:
{
"sq_schema": [
0,
0,
0
],
"keys": [
{
"fingerprint": "16F3A23A820810ABA1ADEBBE9B75D81B3D06E8DD",
"primary_userid": "Lars Wirzenius (obnam backups) <liw@liw.fi>",
"userids": [
"Lars Wirzenius (obnam backups) <liw@liw.fi>"
]
}
]
}
The keys field will be an empty list if the input doesn’t contain any keys or certificates. The primary_userid field is chosen by sq. The userids field always contains all User IDs, including the primary one.
Note that while sq keyring list has the option --all-userids, that has no effect on JSON output. The textual output is meant for humans, who find it easier to only see the primary user ID if that’s what they care about. The JSON output is for programs, which don’t mind ignoring fields.
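To get a feel for consuming this, here’s a small sketch that prints each key’s fingerprint and user IDs from output like the example above. The filename is just a placeholder, and it assumes the proposed --output-format option:

import json
import subprocess

# "keys.pgp" is a placeholder input file for this sketch.
result = subprocess.run(
    ["sq", "--output-format=json", "keyring", "list", "keys.pgp"],
    check=True, capture_output=True, text=True,
)
doc = json.loads(result.stdout)

for key in doc["keys"]:
    print(key["fingerprint"], key["primary_userid"])
    for userid in key["userids"]:
        print("   ", userid)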
sq inspect, sq packet dump
Later on, sq will get JSON support for things like sq inspect and sq packet dump. However, having experimented with adding JSON support for those, I know it will require a fair amount of internal infrastructure change to be doable cleanly. I’d rather start small, with scaffolding to support JSON at all.
Single JSON object vs line-based JSON
For cases when sq output is very large, writing only a single JSON object can be wasteful. The consumer needs to use a special streaming parser to avoid having to construct the whole object in memory. For memory-constrained consumers this can be a serious problem.
An alternative is to use a line-based JSON approach: each output line contains one JSON object. See JSON Lines and its fork ndjson for details.
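Consuming that needs no special parser at all. A sketch of a consumer reading one object per line, assuming sq grew such a mode (which is exactly the open question here):

import json
import sys

# Each line is a complete JSON object, so memory use is proportional to
# one object at a time, not to the whole output.
for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    obj = json.loads(line)
    # ... process one object here ...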
How large a JSON object should we allow? Should we always use line-based JSON, or only when the user requests it? Would line-based JSON be significantly harder to consume? Your opinion would be welcome.
See issue #734 for a discussion about this and other aspects.
Questions
If we assume you would want to write a script or program to consume the output of sq, would the approach I’ve outlined above work for you? Would you find it convenient to use? If you use a tool such as jq, would you find it convenient to consume the sample output above? Do you expect it to be easy to get things right the first time, or would it be error prone?
Which sq commands would you most like to have JSON support for?
You can email me directly (liw@sequoia-pgp.org), drop by on the Sequoia IRC channel (#sequoia on OFTC), or leave a comment on the Sequoia issue tracker (issue #734).
Note
This work is supported by a grant from the NLnet Foundation, from the NGI Assure fund, financially supported by the European Commission.