By Neal | May 6, 2021
I was recently talking to a Thunderbird developer about API design. In the course of that conversation, I expressed concerns about RNP, the new OpenPGP implementation that Thunderbird has recently started using in place of GnuPG. That person, skeptical about my assertion that RNP’s API needs improvement, asked “Isn’t it subjective what a better API is?” I’d agree that we don’t yet have good metrics to evaluate an API. But, I disagree that we can’t judge APIs at all. In fact, I suspect, most experienced programmers know a bad API when they see it. Further, I think we can come up with some good heuristics, which I’ll try to do based on my experience working on and with GnuPG, Sequoia, and RNP. Then, I’ll take a look at RNP’s API. Unfortunately, RNP’s API is not only easy to misuse, but it’s misleading, and, as such, shouldn’t yet be used in a safety-critical context. Yet, Thunderbird is relied on by vulnerable people like journalists, activists, lawyers, and their communication partners who need this protection. For me, this means that Thunderbird should reevaluate their decision to use RNP.
Note: please also see this related mail, Let’s Use GPL Libraries in Thunderbird!, which I sent to Thunderbird’s Planning Mailing List.
What Makes a Bad API?
Prior to starting the Sequoia project with Justus and Kai, the three
of us worked together on GnuPG. In addition to hacking on gpg, we
also spoke to and collaborated with a lot of gpg’s downstream users.
People had a lot of good things to say about GnuPG.
Two criticisms of gpg’s API stood out for us. The first criticism
can be distilled down to: gpg’s API is too opinionated. For
instance, gpg has a keyring-centric approach. This means that it is
only possible to use or examine an OpenPGP certificate if it has been
imported into the keyring. But some developers only want to import a
certificate after they’ve examined it. For instance, when looking up
a certificate on a key server by fingerprint, it is possible to check
that the returned certificate is the right one, because the URL is
self-authenticating. It is possible to do this with gpg, but it
requires working around gpg’s programming model. The basic idea is
the following: create a temporary directory, add a configuration file,
tell gpg to use the alternate directory, import the certificate
there, examine the certificate, and clean up the temporary directory.
That’s the official suggestion, which Justus added based on our
conversations with gpg’s downstream users. Yes, it works. But, the
approach requires operating system-specific code, is slow, and is
error-prone.
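To make the cost of that workaround concrete, here is a rough sketch of it in Rust, using only the standard library. It is not the official recipe verbatim: it skips the configuration file step, the function names (gpg_in, make_scratch_dir, examine_cert) and the directory naming scheme are made up for illustration, and it assumes a gpg binary is on the PATH. The gpg options used (--homedir, --batch, --import, --list-keys, --with-colons) are standard ones.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Builds a gpg invocation that operates on `homedir` instead of the
/// user's real keyring. (Hypothetical helper, not part of any library.)
fn gpg_in(homedir: &Path, args: &[&str]) -> Command {
    let mut cmd = Command::new("gpg");
    cmd.arg("--homedir").arg(homedir).arg("--batch");
    cmd.args(args);
    cmd
}

/// Creates a private scratch directory. The naming scheme is illustrative;
/// creation fails if a directory with that name already exists.
fn make_scratch_dir() -> io::Result<PathBuf> {
    let dir = std::env::temp_dir().join(format!("gpg-scratch-{}", std::process::id()));
    fs::create_dir(&dir)?;
    // gpg warns about home directories that other users can access, so
    // this is where the operating system-specific code creeps in.
    #[cfg(unix)]
    {
        use std::os::unix::fs::PermissionsExt;
        fs::set_permissions(&dir, fs::Permissions::from_mode(0o700))?;
    }
    Ok(dir)
}

/// Imports `cert_file` into a throwaway keyring, returns gpg's
/// machine-readable listing of it, and removes the keyring again.
fn examine_cert(cert_file: &Path) -> io::Result<String> {
    let dir = make_scratch_dir()?;
    let result: io::Result<String> = (|| {
        let status = gpg_in(&dir, &["--import"]).arg(cert_file).status()?;
        if !status.success() {
            return Err(io::Error::new(io::ErrorKind::Other, "gpg --import failed"));
        }
        let out = gpg_in(&dir, &["--with-colons", "--list-keys"]).output()?;
        Ok(String::from_utf8_lossy(&out.stdout).into_owned())
    })();
    // Clean up even when gpg failed.
    let _ = fs::remove_dir_all(&dir);
    result
}
```

Even in this reduced form, examining one certificate costs two process launches, a directory on disk, and a platform-specific permission fix, and the caller still has to parse gpg's colon-separated output.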
The other criticism that I heard repeatedly is that using gpg
requires a lot of arcane knowledge to avoid misusing it. Or, put
differently, one has to be extremely careful when using gpg’s API
to not inadvertently introduce a vulnerability.
To better understand this second concern, consider the EFAIL
vulnerabilities. The basic problem is around gpg’s decryption API:
when decrypting a message, gpg emits the plaintext even if the input
has been corrupted. gpg does return an error in that case, but some
programs display the corrupted plaintext anyway. Because, why not?
Surely showing part of the message is better than nothing, right?
Well, the EFAIL vulnerabilities demonstrate how an attacker can use
this to insert a web bug into an encrypted message, and when the
user views the message, the web bug exfiltrates the message. Ouch.
So, who’s responsible for the bug? The GnuPG developers
argued that the applications used gpg wrong:
MUAs are advised to consider the DECRYPTION_FAILED status code and not to show the data or at least use a proper way to display the possible corrupted mail without creating an oracle and to inform the user that the mail is fishy.
gpg signaled an error; the applications didn’t adhere to the API
contract. I have to agree with the GnuPG developers, and add: gpg’s
interface was (and remains) a disaster waiting to happen, because it
doesn’t guide the user to do the right thing. On the contrary, the
easy, seemingly helpful thing is the wrong thing to do. And, this
type of API is unfortunately common in GnuPG.
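One way to picture an interface that guides the caller is a decryption function that only hands over plaintext once the integrity check has passed. The sketch below is purely illustrative: the "cipher" is a toy (XOR with a one-byte key plus a one-byte checksum), is not real cryptography, and is not how gpg or any OpenPGP library works. All of the names are hypothetical. What matters is the shape of the return type.

```rust
/// Why a decryption attempt produced no plaintext.
#[derive(Debug, PartialEq)]
enum DecryptError {
    Truncated,
    IntegrityCheckFailed,
}

/// Toy "encryption": XOR every byte with `key`, then append a one-byte
/// checksum of the plaintext. NOT real cryptography; it only models an
/// integrity-protected message.
fn encrypt(plaintext: &[u8], key: u8) -> Vec<u8> {
    let tag = plaintext.iter().fold(0u8, |acc, b| acc.wrapping_add(*b));
    let mut out: Vec<u8> = plaintext.iter().map(|b| b ^ key).collect();
    out.push(tag);
    out
}

/// Returns the plaintext only if the checksum matches. On failure the
/// caller gets an error and nothing else.
fn decrypt_verified(ciphertext: &[u8], key: u8) -> Result<Vec<u8>, DecryptError> {
    let (body, tag) = match ciphertext.split_last() {
        Some((tag, body)) => (body, *tag),
        None => return Err(DecryptError::Truncated),
    };
    let plaintext: Vec<u8> = body.iter().map(|b| b ^ key).collect();
    let checksum = plaintext.iter().fold(0u8, |acc, b| acc.wrapping_add(*b));
    if checksum != tag {
        // The corrupted plaintext is dropped here, so an application
        // cannot display it by accident.
        return Err(DecryptError::IntegrityCheckFailed);
    }
    Ok(plaintext)
}
```

With this shape, "show the partial message anyway" is not the easy path: the error variant carries no data, so an application would have to go out of its way to get at the corrupted bytes.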
What Makes a Good API?
These two realizations, that gpg’s API is too opinionated and that it
is hard to use right, were formative for me. When we started the
Sequoia project, we agreed that we wanted to avoid making similar
mistakes. Based on these observations, we adopted two tests that we
continue to use to guide the development of Sequoia’s API. First,
in addition to any high-level API, there should be a low-level API
that is unopinionated in the sense that it doesn’t prevent the user
from doing anything legitimate. Simultaneously, an API should guide
the user to do the right (opinionated) thing by making the right thing
the easy and obvious thing to do.
To realize these two, slightly conflicting goals of enabling everything, but preventing mistakes, we leaned on two tools in particular: types, and examples. Types make it hard to use an object in an inappropriate way by formalizing the API contract at compile time, and even forcing particular transformations. And, examples—code snippets—will be copied. So, good examples will not only teach users how to use a function correctly, but strongly influence how they use it.
Types
I want to present an example of how we use types in Sequoia to help us make a good API. To understand the example, a tiny bit of background knowledge about OpenPGP is useful.
There are several fundamental data types in OpenPGP. Three are:
Certificates, components such as keys and User IDs, and Binding
Signatures. The root of a certificate is the primary key, which fully
determines a certificate’s fingerprint (fingerprint = Hash(primary key)).
A certificate usually includes components like subkeys and
User IDs. OpenPGP binds a component to the certificate using a
so-called binding signature. Making the fingerprint just the hash of
the primary key and using signatures to bind the components to the
primary key means that it is possible to add additional components
later. Binding signatures also include properties. This makes it
possible to change a component, e.g., to extend a subkey’s expiration.
A consequence of this is that there can be multiple valid signatures
associated with a given component. Binding signatures are not only
fundamental, but also an integral part of OpenPGP’s security.
Because there can be multiple valid binding signatures, we need a way to choose the right one. As a first approximation, the right signature is the latest, non-expired, non-revoked, valid signature, which was not created in the future. But what is a valid signature? In Sequoia, the signature does not only need to check out mathematically, it needs to be consistent with a policy. For instance, due to its compromised collision resistance, we only allow SHA-1 in a very limited set of circumstances. (Paul Schaub, who works on PGPainless, recently wrote about these complexities in detail.) Forcing the user of the API to keep all of these concerns in mind invites vulnerabilities. In Sequoia, the easy way to get the expiration time is the safe way. Consider this code, which does the right thing:
let p = &StandardPolicy::new();
let cert = Cert::from_str(CERT)?;
for k in cert.with_policy(p, None)?.keys().subkeys() {
    println!("Key {}: expiry: {}",
             k.fingerprint(),
             if let Some(t) = k.key_expiration_time() {
                 DateTime::<Utc>::from(t).to_rfc3339()
             } else {
                 "never".into()
             });
}
cert is a certificate. We start by applying a policy to it.
(Policies are user definable, but normally the StandardPolicy is
not only sufficient, but most appropriate.) This effectively creates
a view of the certificate where only components with a valid binding
signature are visible. Importantly, it also modifies and exposes a
number of new methods. The keys method, for instance, has been
modified to return a ValidKeyAmalgamation instead of a
KeyAmalgamation. (It’s an amalgamation, because it includes not
only the Key, but also any associated signatures; some people
thought Katamari would have been a better name. ¯\_(ツ)_/¯) A
ValidKeyAmalgamation has a valid binding signature according to the
above criteria. And, it exposes methods like key_expiration_time,
which only make sense on a valid key! Also note:
key_expiration_time’s return type is ergonomic. Instead of
returning the raw value, key_expiration_time returns a
SystemTime, which is safe and easy to work with.
Consistent with our first principle of enabling everything, a developer could still access the individual signatures and examine the subpackets to get the key’s expiration time from a different binding signature. But, compared with the right way to get the key’s expiration time using Sequoia’s API, they would have to go out of their way to do it differently. In our opinion that’s a good API.
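The pattern itself is independent of Sequoia and can be sketched with the standard library alone. The following is a simplified model, not Sequoia’s implementation: the types Key, BindingSig, and ValidKey, their fields, and the integer timestamps are all invented for illustration, and revocation handling is reduced to a flag. It shows the two ingredients discussed above: picking the latest binding signature that is valid, not revoked, not expired, and not from the future, and exposing the expiration accessor only on the type that proves such a signature exists.

```rust
/// A binding signature, reduced to the fields the selection rule needs.
/// (Hypothetical model; times are seconds since the epoch.)
#[derive(Clone, Debug)]
struct BindingSig {
    created: u64,              // signature creation time
    expires: Option<u64>,      // signature expiration time, if any
    revoked: bool,
    passes_policy: bool,       // mathematically correct and accepted by the policy
    key_validity: Option<u32>, // key expiration offset in seconds; None = never
}

/// A key together with every binding signature that was ever issued for it.
struct Key {
    created: u32,
    sigs: Vec<BindingSig>,
}

/// A key plus the one binding signature that is authoritative right now.
/// Values of this type can only be obtained through `Key::with_policy`.
struct ValidKey<'a> {
    key: &'a Key,
    sig: &'a BindingSig,
}

impl Key {
    /// Selects the latest signature that passes the policy, is not revoked,
    /// was not created in the future, and has not expired at `now`.
    fn with_policy(&self, now: u64) -> Option<ValidKey<'_>> {
        self.sigs
            .iter()
            .filter(|s| s.passes_policy && !s.revoked)
            .filter(|s| s.created <= now)
            .filter(|s| s.expires.map_or(true, |e| now < e))
            .max_by_key(|s| s.created)
            .map(|sig| ValidKey { key: self, sig })
    }
}

impl ValidKey<'_> {
    /// Absolute key expiration time, or None if the key never expires.
    /// This method does not exist on `Key`, so it cannot be called on a
    /// key without a valid binding signature.
    fn key_expiration_time(&self) -> Option<u64> {
        self.sig
            .key_validity
            .map(|offset| u64::from(self.key.created) + u64::from(offset))
    }
}
```

Because key_expiration_time is only defined on ValidKey, forgetting the validity check is a compile error rather than a silent default.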
Examples
We released v1.0 of the Sequoia library in December of 2020. Nine
months prior to that, we were feature complete and ready to release.
But, we waited. We spent the following nine months adding
documentation and examples to the public API. Take a look at the
documentation for the Cert data structure to see an example of the
results. As described in the blog post, we didn’t quite manage to
provide an example for every function, but we did get pretty far.
And, as a side effect of writing the examples, we identified some
rough spots, which we polished.
Since the release, we’ve had contact with a number of developers who have integrated Sequoia into their code. A common refrain is how helpful the documentation and examples are. And, we can confirm: even though it is our own code, we reference the documentation almost every day, and copy our own examples. It’s just easier. And, since the examples show how to correctly use the function, why redo the work from scratch?
RNP’s API
RNP is a young OpenPGP implementation developed primarily by Ribose. About two years ago, Thunderbird decided to integrate Enigmail into Thunderbird and simultaneously replace GnuPG with RNP. That Thunderbird has selected RNP is not only an endorsement of RNP, but it means that RNP became perhaps the most used OpenPGP implementation for mail encryption.
A critique can easily be interpreted as being negative. I want to be absolutely clear that I think the work that Ribose is doing is good and important, and I am thankful that they are investing time and resources into a new OpenPGP implementation. The OpenPGP ecosystem desperately needs more diversity. But, that is not an excuse to use an immature product in a safety-critical context.
Safety-Critical Infrastructure
Unfortunately, RNP is not yet at a point where I think it can be safely deployed. Enigmail was used not only by people worried about their privacy, but also by journalists, activists, and lawyers who are worried about their safety and the safety of their communication partners. In a 2017 interview, Benjamin Ismaïl, the head of the Asia-Pacific office at Reporters without Borders, said:
We primarily use GPG to freely communicate with our sources. The information they give us about human rights and the violations that they are subjected to are sensitive information, and it is necessary for them to protect their conversations.
Interview with Benjamin Ismaïl from Reporters without Borders
As such, it is essential that Thunderbird continue to provide these users with the safest experience possible even during this transition period.
RNP and Subkey Binding Signatures
When talking about how we use types in Sequoia to make it harder to
misuse the API, I showed how to get a key’s expiration time in a few
lines of code. I want to start by showing how someone who isn’t an
OpenPGP or RNP expert might implement the same functionality using
RNP. The following code iterates over a certificate’s (key) subkeys
and prints each subkey’s expiration time. Recall: the expiration time
is stored on the subkey’s binding signature, and a value of 0 means
the key does not expire.
int i;
for (i = 0; i < sk_count; i ++) {
    /* Look up the i-th subkey of the certificate. */
    rnp_key_handle_t sk;
    err = rnp_key_get_subkey_at(key, i, &sk);
    if (err) {
        printf("rnp_key_get_subkey_at(%d): %x\n", i, err);
        return 1;
    }

    /* Ask for the expiration time without any validity check. */
    uint32_t expiration_time;
    err = rnp_key_get_expiration(sk, &expiration_time);
    if (err) {
        printf("#%d (%s). rnp_key_get_expiration: %x\n",
               i + 1, desc[i], err);
    } else {
        printf("#%d (%s) expires %"PRIu32" seconds after key's creation time.\n",
               i + 1, desc[i],
               expiration_time);
    }
}
I tested this code against a certificate with five subkeys. The first subkey has a valid binding signature, and doesn’t expire; the second has a valid binding signature, and expires in the future; the third has a valid binding signature, and is already expired; the fourth has an invalid binding signature, which says that the subkey expires in the future; and, the fifth does not have a binding signature at all. Here’s the output:
#1 (doesn't expire) expires 0 seconds after key's creation time.
#2 (expires) expires 94670781 seconds after key's creation time.
#3 (expired) expires 86400 seconds after key's creation time.
#4 (invalid sig) expires 0 seconds after key's creation time.
#5 (no sig) expires 0 seconds after key's creation time.
The first thing to notice is that the call to rnp_key_get_expiration
succeeds whether the subkey has a valid binding signature, has an
invalid binding signature, or even doesn’t have a binding signature at
all! Reading the documentation, this behavior is a bit surprising.
It says:
Get the key's expiration time in seconds.
Note: 0 means that the key doesn't expire.
Since the key’s expiration time is stored on the binding signature, I,
an OpenPGP expert, understand this to mean that the call to
rnp_key_get_expiration would only succeed if the subkey has a valid
binding signature. Instead, it appears that if there is no valid
binding signature, the function simply defaults to 0, which, given
the note, the user of the API would justifiably interpret as meaning
the key doesn’t expire.
To improve this code, it is necessary to first check whether the key
has a valid binding signature. Some functions to do this were
recently added to RNP to address CVE-2021-23991. In particular, the
RNP developers added the function rnp_key_is_valid to return
whether a key is valid. This addition is an improvement, but it
requires the developer to opt in to these safety-critical checks, not
opt out, as they would if they were using Sequoia. Since safety
checks are non-functional, they are easy to forget: the code appears
to work even if the safety check is forgotten. And since knowing what
to check requires expert knowledge, they will be forgotten.
The following code includes the safety check and skips any keys that
rnp_key_is_valid considers to be invalid:
int i;
for (i = 0; i < sk_count; i ++) {
    /* Look up the i-th subkey of the certificate. */
    rnp_key_handle_t sk;
    err = rnp_key_get_subkey_at(key, i, &sk);
    if (err) {
        printf("rnp_key_get_subkey_at(%d): %x\n", i, err);
        return 1;
    }

    /* Opt-in safety check: skip anything RNP considers invalid. */
    bool is_valid = false;
    err = rnp_key_is_valid(sk, &is_valid);
    if (err) {
        printf("rnp_key_is_valid: %x\n", err);
        return 1;
    }
    if (! is_valid) {
        printf("#%d (%s) is invalid, skipping.\n",
               i + 1, desc[i]);
        continue;
    }

    uint32_t expiration_time;
    err = rnp_key_get_expiration(sk, &expiration_time);
    if (err) {
        printf("#%d (%s). rnp_key_get_expiration: %x\n",
               i + 1, desc[i], err);
    } else {
        printf("#%d (%s) expires %"PRIu32" seconds after key's creation time.\n",
               i + 1, desc[i],
               expiration_time);
    }
}
The output is:
#1 (doesn't expire) expires 0 seconds after key's creation time.
#2 (expires) expires 94670781 seconds after key's creation time.
#3 (expired) is invalid, skipping.
#4 (invalid sig) is invalid, skipping.
#5 (no sig) is invalid, skipping.
The code correctly skips the two keys that don’t have a valid binding signature, but it also skips the expired key, which is probably not what we want, although the documentation does warn us that this function “checks … expiration times”.
Although there are cases where we don’t want to use a key or
certificate if it is expired, sometimes we do. For instance, if a
user forgets to extend a subkey’s expiration time, they should be able
to see that the subkey is expired when examining the certificate, and
be able to extend the expiration. Although gpg --list-keys doesn’t
show expired keys, when editing a certificate, it does show subkeys
that are expired so the user can extend their expiry:
$ gpg --edit-key 93D3A2B8DF67CE4B674999B807A5D8589F2492F9
Secret key is available.

sec  ed25519/07A5D8589F2492F9
     created: 2021-04-26  expires: 2024-04-26  usage: C
     trust: unknown       validity: unknown
ssb  ed25519/1E2F512A0FE99515
     created: 2021-04-27  expires: never       usage: S
ssb  cv25519/8CDDC2BC5EEB61A3
     created: 2021-04-26  expires: 2024-04-26  usage: E
ssb  ed25519/142D550E6E6DF02E
     created: 2021-04-26  expired: 2021-04-27  usage: S
[ unknown] (1). Alice <alice@example.org>
There are other situations where an expired key shouldn’t be considered invalid. For instance, let’s say Alice sends Bob a signed message: “I will pay you 100 Euros in a year,” and the signing key expires in six months. When the year is over, does Alice owe Bob the money on the basis of the signature? I’d say yes. The signature was valid when it was made. The fact that the key expired is irrelevant. Of course, once a key has expired, signatures made after the expiration should be treated as invalid. Likewise a message should not be encrypted with an expired key.
In short, whether a key should be considered valid is highly dependent
on the context. rnp_key_is_valid is better than nothing, but,
despite its name, it isn’t sufficiently nuanced to generally determine
whether a key is valid.
The same commit introduced a second function, rnp_key_valid_till.
This function returns “the timestamp till which the key can be
considered as valid… If the key was never valid then a zero value
will be [returned].” We can use this function to determine whether a
key was ever valid by checking whether this function returns a
non-zero value:
int i;
for (i = 0; i < sk_count; i ++) {
    /* Look up the i-th subkey of the certificate. */
    rnp_key_handle_t sk;
    err = rnp_key_get_subkey_at(key, i, &sk);
    if (err) {
        printf("rnp_key_get_subkey_at(%d): %x\n", i, err);
        return 1;
    }

    /* A result of 0 means the key was never valid. */
    uint32_t valid_till;
    err = rnp_key_valid_till(sk, &valid_till);
    if (err) {
        printf("rnp_key_valid_till: %x\n", err);
        return 1;
    }
    printf("#%d (%s) valid till %"PRIu32" seconds after epoch; ",
           i + 1, desc[i], valid_till);
    if (valid_till == 0) {
        printf("invalid, skipping.\n");
        continue;
    }

    uint32_t expiration_time;
    err = rnp_key_get_expiration(sk, &expiration_time);
    if (err) {
        printf("rnp_key_get_expiration: %x\n", err);
    } else {
        printf("expires %"PRIu32" seconds after key's creation time.\n",
               expiration_time);
    }
}
The results are:
#1 (doesn't expire) valid till 1714111110 seconds after epoch; expires 0 seconds after key's creation time.
#2 (expires) valid till 1714111110 seconds after epoch; expires 94670781 seconds after key's creation time.
#3 (expired) valid till 1619527593 seconds after epoch; expires 86400 seconds after key's creation time.
#4 (invalid sig) valid till 0 seconds after epoch; invalid, skipping.
#5 (no sig) valid till 0 seconds after epoch; invalid, skipping.
Now we get the results that we want! We correctly print the expiration time for the first three subkeys, and indicate that the last two subkeys are invalid.
But, let’s take a closer look at rnp_key_valid_till. First, in
OpenPGP, a key’s expiration time is stored as an unsigned 32-bit
offset from the key’s unsigned 32-bit creation time. Since the sum
of the two may not fit in 32 bits, the function should have used a
wider type or at least checked for overflow. (I reported the issue
and it has now been fixed.)
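The arithmetic behind that concern is easy to show. The following sketch is not RNP’s code; the function names are hypothetical and it only demonstrates what happens when a 32-bit creation time and a 32-bit offset are added in three different ways.

```rust
/// Adds the offset in 32 bits and silently wraps around on overflow.
fn expiration_wrapping(created: u32, offset: u32) -> u32 {
    created.wrapping_add(offset)
}

/// Adds the offset in 32 bits, but reports overflow instead of wrapping.
fn expiration_checked(created: u32, offset: u32) -> Option<u32> {
    created.checked_add(offset)
}

/// Widens both operands first, so the sum always fits.
fn expiration_wide(created: u32, offset: u32) -> u64 {
    u64::from(created) + u64::from(offset)
}
```

For a key created at 1619395200 seconds after the epoch with the maximum offset of 4294967295, the wide sum is 5914362495, the checked variant returns None, and the wrapping variant yields 1619395199, a time one second before the key was created.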
But ignoring that nit, the function remains strange. In OpenPGP a key can be valid during multiple periods. For instance, imagine that a key expires on July 1st and the user only extends the key’s expiration time on July 10th. For the time between July 1st and July 10th, the key was not valid, and any signatures generated during that time should be treated as invalid. So, what should this function return for such a key? More importantly, how should a user of that API interpret the result? And, when is it even appropriate to use this API? (Yes, I asked.)
In Sequoia, we take a different approach. Instead of returning when a
key is valid, we reverse the question: a user of the API can ask
whether this key is valid at time t. In our experience, this is what
all of the cases that we’ve encountered actually need.
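A point-in-time query also copes naturally with keys that were valid during several disjoint periods. Below is a minimal model of the idea, not Sequoia’s implementation: the Binding type, its fields, and valid_at are made up for this example, signatures are assumed to be already verified, and times are plain integers (the test values can be read as day numbers).

```rust
/// A verified binding signature, reduced to two timestamps.
/// (Hypothetical model, not a real library type.)
struct Binding {
    created: u64,                // when the signature was made
    key_expires_at: Option<u64>, // absolute key expiration it asserts; None = never
}

/// Is the key valid at time `t`? The answer comes from the latest binding
/// signature that already existed at `t`; later signatures are ignored.
fn valid_at(key_created: u64, bindings: &[Binding], t: u64) -> bool {
    if t < key_created {
        return false;
    }
    bindings
        .iter()
        .filter(|b| b.created <= t)
        .max_by_key(|b| b.created)
        .map_or(false, |b| b.key_expires_at.map_or(true, |e| t < e))
}
```

Take a key created on day 0 with one binding that expires it on day 181 (July 1st) and a second binding, made on day 190 (July 10th), that extends it to day 546. Then valid_at answers true for day 100, false for day 185, true for day 200, and false for day 600. No single "valid till" timestamp can express that gap.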
I didn’t cherry pick this particular issue with RNP’s API. It’s just an issue that I’ve been thinking about recently. While reimplementing RNP’s API to create an alternative OpenPGP backend for Thunderbird, we encountered many similar issues.
Conclusion
The mistakes that the RNP developers have made are understandable and forgivable. OpenPGP, like many other protocols, is complicated. But, we can’t significantly simplify it if we want to keep its flexible and robust PKI, and not just have a file encryption tool.
Nevertheless, RNP’s API is dangerous. And, Thunderbird is used in security-critical contexts. In an interview in 2017, Michal ‘Rysiek’ Wozniak from the Organized Crime and Corruption Reporting Project (OCCRP) made clear that lives are on the line:
I do strongly believe that had we not been using GnuPG all of this time, many of our sources and many of our journalists, would be in danger or in jail.
Interview with Michal ‘Rysiek’ Wozniak, Organized Crime and Corruption Reporting Project
What are the consequences for Thunderbird? I see three options. First, Thunderbird could switch back to Enigmail. One might think that porting Enigmail to Thunderbird 78 would be hard, but I’ve heard from multiple Thunderbird developers that this would be technically feasible with manageable effort. But, one of the reasons that Thunderbird wanted to switch away from Enigmail is the huge amount of time the Enigmail developers spent helping users correctly install and configure GnuPG. So, this way is not ideal.
Second, Thunderbird could switch to a different OpenPGP
implementation. These days, there are a bunch to choose from.
Personally, I think that Thunderbird should switch to Sequoia. Of
course, I work on Sequoia, so I’m biased. But, it’s not about
money: I’m paid by a foundation, and on the open market I would
probably earn twice as much as I’m earning now. For me, it’s about
protecting the users. But, beyond Sequoia’s API and implementation
advantages, it has another advantage for Thunderbird: we already did
the implementation work. A few weeks ago, we released the Octopus,
an alternative OpenPGP backend for Thunderbird. It not only has
feature parity with RNP, but includes a number of oft-requested
features like gpg integration, some security fixes, and a number of
non-functional improvements.
Third, Thunderbird could get out of the OpenPGP business. I don’t want this solution. But, as I’ve said several times, I’m worried about the safety of some of Thunderbird’s most vulnerable users, and I think not providing any OpenPGP support might be safer than the status quo.