By Neal | July 8, 2019

The problem seems basically unfixable, and oh god, of course the reason involves unmaintained academic code written in OCaml. pic.twitter.com/aScg3zns1C
— Matthew Green (@matthew_d_green) June 29, 2019

That's just how it works. You don't get special treatment because you're a nice, volunteer-run project with a legacy codebase from the 90s. If you're unable to fix your issues then these things will happen.
— hanno (@hanno) July 1, 2019

Background

A bit more than a week ago, someone added hundreds of thousands of signatures to rjh’s and dkg’s OpenPGP keys, and uploaded them to the SKS key server network ¹. This vandalism is annoying. Now whenever someone downloads these keys from an SKS key server, they have to download tens of megabytes of data instead of tens of kilobytes.

Unfortunately, this certificate flooding attack can’t be prevented without rearchitecting SKS ². SKS delivers third-party certificates with the signed key, which allows anyone to flood anyone else’s key by adding lots of third-party certifications to it. Because this attack exploits an architectural flaw, it is basically impossible to fix SKS in a backwards compatible way. This partially explains why this problem hasn’t been fixed even though the OpenPGP community has known about it for years and an attack was published in 2013.

That’s the bad news. The horrible news is that these flooded keys incapacitate GnuPG. GnuPG not only chokes when trying to work with them, but if a flooded key is present in the user’s key ring, it slows down operations using other keys by orders of magnitude. And the slowdown is not only noticeable, but, as dkg reports, it is causing users to question whether their setup is broken:

[F]rom several conversations i’ve had over the last 24 hours, i know personally at least a half-dozen different people who […] have lost hours of work, being stymied by the failing tools, some of that time spent confused and anxious and frustrated. Some of them thought they might have lost access to their encrypted e-mail messages entirely. Others were struggling to wrestle a suddenly non-responsive machine back into order.

The Reaction

Unfortunately, the reaction from GnuPG’s unofficial crisis communicator, rjh, has been to blame the attacker for hurting the OpenPGP ecosystem. Now, we—Justus, Kai and I—agree that the attack was irresponsible, but the OpenPGP ecosystem isn’t a shared apple press in the village square. OpenPGP is a security standard whose aim is to protect its users from malicious state actors. The tooling ought to be resilient to such attacks. And, if not, the community should take responsibility for their failings. rjh’s response was the opposite.

We’re writing this blog post to tell you that we feel responsible to OpenPGP’s users. And we didn’t just start feeling responsible when these attacks happened, but we, with several others, have been working to improve OpenPGP tooling for some time now.

So, when the usual prominent voices in the broader community reiterate their hope that PGP dies, we want to reassure you that there are people working to improve the OpenPGP ecosystem. Yes, the tooling needs to improve. But no, we don’t think we have to throw out OpenPGP in favor of unstandardized, centralized tools. And yes, we are confident that we can build resilient, and federated encryption tools on top of OpenPGP.

Doing Something About It

Nearly two years ago, Justus, Kai and I, with the financial support of the p≡p Foundation, started the Sequoia project. Sequoia’s main technical goal is not to reimplement GnuPG, but to rethink the entire OpenPGP ecosystem, from low-level technical issues like how to create a safe and easy-to-use API for applications to work with OpenPGP data, to high-level people problems like key certification and trust models. But more than technical goals, we have social goals. We want to build an inclusive community around our project.

It’s easy to look at a project as an outsider and say: I can do it better. In my experience, this stance is often hubris arising from half-baked knowledge. But, that’s not how we started the Sequoia project. Justus, Kai and I each worked on GnuPG for over two years as employees of g10code. During that time, we gained experience with the GnuPG code base, and also interacted with many GnuPG users and learned both how GnuPG satisfied their needs and how it didn’t. For various reasons, two years ago, we felt that we could more effectively work outside of the GnuPG project, which is how the Sequoia project started.

In Sequoia, our initial technical focus has been on creating a secure, resilient, and usable OpenPGP implementation and API. We concluded that evolving GnuPG (like NeoPG decided to do) would be too hard. And, we wanted to move away from C to a memory-safe language like Rust.

Along the way, we’ve started several side projects to evaluate the usability of Sequoia’s API. One of those projects was Hagrid, a key server. Thanks primarily to Vincent Breitmoser, a co-maintainer of Open Keychain and K-9 Mail, we’ve grown Hagrid from an experiment to a real tool, which, like Sequoia, doesn’t simply clone the functionality of an existing program, but rethinks its architecture. Today, Hagrid is deployed at keys.openpgp.org.

Sequoia

Although we were aware that keys could be flooded, we didn’t explicitly prepare Sequoia for the certificate flooding attacks that recently occurred. As we built Sequoia, we simply tried to make it resilient to abuse.

Given this, I was pleasantly surprised when I measured how long it takes Sequoia to process rjh’s flooded key. Importing it with its nearly 150,000 signatures into Sequoia’s store took 5.1 seconds on my Lenovo x250 laptop, and encrypting to it took 1.1 seconds.

Relatively speaking, that’s slow. Sequoia takes over an order of magnitude longer to work with rjh’s flooded key than a normal key: importing a normal key takes about 100 ms as does encrypting to it.

Seen absolutely, the slowdown isn’t a disaster. First, yes, importing and using flooded keys causes a minor inconvenience for users, but it’s not one that is likely to cause users to doubt the security of their system. It’s a hiccup. But, second, unlike with GnuPG, importing a large key into Sequoia doesn’t measurably impact operations using other keys. Consider:

$ ls -lh /tmp/rjh.gpg
-rw-r--r-- 1 us us 60M Jul  1 10:42 /tmp/rjh.gpg
$ time /tmp/sequoia-build/release/sq store import rjh /tmp/rjh.gpg

real    0m5.121s
user    0m4.278s
sys     0m0.695s
$ time bash -c 'echo foo | /tmp/sequoia-build/release/sq encrypt -r rjh > /dev/null'

real    0m1.146s
user    0m0.772s
sys     0m0.394s
$ time /tmp/sequoia-build/release/sq store import neal ~/neal.asc

real    0m0.113s
user    0m0.020s
sys     0m0.090s
$ time bash -c 'echo foo | /tmp/sequoia-build/release/sq encrypt -r neal > /dev/null'

real    0m0.109s
user    0m0.028s
sys     0m0.084s
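
To put these timings in relative terms, the short Python sketch below computes the slowdown factors from the `real` values in the transcript. The variable and function names are ours; only the four measurements come from the run above.

```python
# Wall-clock ("real") times in seconds, copied from the transcript above.
timings = {
    "import": {"flooded": 5.121, "normal": 0.113},
    "encrypt": {"flooded": 1.146, "normal": 0.109},
}

def slowdown(op):
    """Return how many times slower the flooded key is for an operation."""
    t = timings[op]
    return t["flooded"] / t["normal"]

for op in timings:
    print(f"{op}: {slowdown(op):.1f}x slower with the flooded key")
```

On these numbers, importing the flooded key is roughly 45 times slower than importing a normal key, and encrypting to it is roughly 10 times slower.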

Hagrid, an SKS Replacement

Once we decided to make Hagrid an SKS replacement, we chose to initially concentrate on two issues: performance, and UX.

Performance

These days, using the SKS key server network to look up a key typically takes seconds. And, it is not unusual for the lookup to just time out. Improving this is a matter of basic software engineering (and has been done before by, for instance, Hockeypuck). As such, there is nothing special to report here.

Usability

The bigger problem has to do with SKS’s usability. Many SKS users incorrectly assume that the key servers are an authenticated directory similar to a telephone book. In reality, they are just an append-only log with a bit of structure. Fixing problems like these requires rethinking SKS’s design.

To improve SKS’s usability, we decided to make Hagrid better match users’ expectations that key servers are curated. To do this, we settled on making Hagrid a verifying key server similar to the PGP Global Directory and the Mailvelope key server. Ideally, we’d prefer that key servers, even a verifying key server, not be used for key discovery, because it centralizes trust. But, so far, alternatives like p≡p, Autocrypt, and WKD have not seen wide deployment. Hence, until there are widely deployed alternatives, we think it is essential to continue to offer a similar service to OpenPGP users.

Even when these technologies are deployed, key servers will remain useful: although key servers are abused for key discovery, their primary usefulness is as a mechanism for obtaining certificate updates like key revocations, new subkeys, etc. From a security or an operational perspective, keeping keys up to date is essential. To cater to this use case, Hagrid, unlike existing verifying key servers, serves non-Personally Identifying Information (PII) even for keys that don’t have any verified User IDs. This is possible, because OpenPGP implementations locate these updates using a key’s fingerprint, and not an email address.
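
The difference between looking up a key by fingerprint and by email address can be sketched in a few lines of Python. This is a toy model, not Hagrid’s actual code or API: the `Cert` and `KeyServer` classes and their method names are hypothetical, and real certificates carry far more structure.

```python
from dataclasses import dataclass, field

@dataclass
class Cert:
    fingerprint: str
    key_material: str                 # stands in for keys, subkeys, revocations
    user_ids: list = field(default_factory=list)  # PII, e.g. email addresses

class KeyServer:
    """Toy verifying key server (hypothetical; not Hagrid's real API)."""

    def __init__(self):
        self.certs = {}               # fingerprint -> Cert as uploaded
        self.verified = {}            # email -> fingerprint

    def upload(self, cert):
        self.certs[cert.fingerprint] = cert

    def verify(self, email, fingerprint):
        # Assumed to be called only after the owner confirmed the address.
        if email in self.certs[fingerprint].user_ids:
            self.verified[email] = fingerprint

    def _published(self, cert):
        # Only verified User IDs are ever handed out.
        uids = [u for u in cert.user_ids
                if self.verified.get(u) == cert.fingerprint]
        return Cert(cert.fingerprint, cert.key_material, uids)

    def by_fingerprint(self, fingerprint):
        # Updates are served even when no User ID is verified.
        cert = self.certs.get(fingerprint)
        return self._published(cert) if cert else None

    def by_email(self, email):
        # Discovery works only for verified addresses.
        fpr = self.verified.get(email)
        return self._published(self.certs[fpr]) if fpr else None
```

In this model, a client that already knows a fingerprint can always fetch the non-PII part of a certificate, while a lookup by email address returns nothing until the owner has verified that address.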

These architectural changes are disruptive to the API and current workflows. Although we think our ideas are good, we want user feedback. In particular, we want to know whether users actually want a service like Hagrid. As such, we decided to release Hagrid even though it doesn’t yet support federation nor does it yet distribute third-party certificates.

Federation

Implementing federation in Hagrid is complicated, because we don’t want to synchronize User IDs with arbitrary peers: a central tenet of Hagrid is to give a key’s owner control over the Personally Identifying Information (PII). If anyone could peer, then they could, for instance, ignore deletion requests. Currently, we are considering two approaches to federation.

First, we could use a closed federation model. This allows us to not only synchronize PII, but also ensures that all peers provide a minimum quality of service. Also, because peers are trusted, it is easier to implement.

Alternatively, we could only federate non-PII data, and have a more open federation model. This approach makes sense from a performance perspective since we expect most load to come from applications periodically refreshing keys, and not from key discovery, which is relatively infrequent. This approach also makes sense from a trust perspective: it is more important that certificate updates be decentralized than the authenticated directory. Decentralizing certificate updates makes it harder for a single server to withhold, say, a revocation certificate. The authenticated directory, on the other hand, is inherently centralized, because some third-party defines what it means for a User ID to be verified. One issue with this approach is that since any peer can enumerate all keys, it can query the authenticated directory to figure out what keys have been validated. It’s unclear to what degree this is a real problem.

Third Party Certificates

When designing Hagrid, we explicitly decided not to return third-party signatures along with the signed key to prevent certificate flooding attacks. We are currently considering two main alternatives.

The first approach is to have the signee explicitly acknowledge any third-party signatures. Then an adversary can’t flood a certificate, because the victim can simply ignore the signatures. This requires improved tooling, and further complicates already complicated workflows. But, arguably, the only people who are actually signing keys today would be capable of understanding this, and incorporating it into their workflow.
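
A rough Python sketch of this first approach follows. It is an illustration only: the `Certification` type and the `acknowledged` set are hypothetical stand-ins, and the sketch says nothing about how an acknowledgment would actually be encoded in OpenPGP.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Certification:
    issuer: str   # fingerprint of the signer
    signee: str   # fingerprint of the signed key

def served_certifications(all_certs, acknowledged):
    """Return only the third-party certifications the signee acknowledged.

    `all_certs` is everything that was uploaded for a key; `acknowledged`
    is the set the key's owner explicitly approved (hypothetical model).
    """
    return [c for c in all_certs if c in acknowledged]

# An attacker uploads 100,000 certifications; the owner acknowledges one.
friend = Certification(issuer="FRIEND", signee="VICTIM")
flood = [Certification(issuer=f"ATTACKER{i}", signee="VICTIM")
         for i in range(100_000)]
uploaded = flood + [friend]

served = served_certifications(uploaded, acknowledged={friend})
print(len(uploaded), "uploaded,", len(served), "served")
```

However many certifications the attacker uploads, the server hands out only those the owner approved, so the size of the served key stays under the owner’s control.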

Alternatively, instead of distributing signatures with the signee’s key, they could optionally be distributed with the signer’s key. This mitigates the certificate flooding problem, because no one would look up the attacker’s key. A downside to this approach is that it reveals a user’s trusted introducers to the key server operator, because normally only signatures issued by trusted introducers are interesting.
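
This second approach can be sketched as follows. Again, this is a toy model under our own assumptions: `SignerIndexedStore` and `certifications_for` are hypothetical names, and fingerprints are reduced to plain strings.

```python
from collections import defaultdict

class SignerIndexedStore:
    """Toy store that files each certification under its issuer (hypothetical)."""

    def __init__(self):
        # issuer fingerprint -> list of signee fingerprints
        self.by_issuer = defaultdict(list)

    def add_certification(self, issuer, signee):
        self.by_issuer[issuer].append(signee)

    def fetch_issuer(self, issuer):
        # Returned only when someone looks up the *signer's* key.
        return list(self.by_issuer.get(issuer, []))

def certifications_for(store, signee, trusted_introducers):
    """Collect the trusted introducers that certified `signee`."""
    return [i for i in trusted_introducers
            if signee in store.fetch_issuer(i)]

store = SignerIndexedStore()
store.add_certification("ALICE", "VICTIM")
for n in range(100_000):                     # the flood
    store.add_certification(f"ATTACKER{n}", "VICTIM")

print(certifications_for(store, "VICTIM", ["ALICE", "BOB"]))
```

The flood never reaches the client, because the client only fetches the keys of the introducers it trusts. The sketch also shows the downside: the queries for `ALICE` and `BOB` tell the server who those trusted introducers are.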

A Work in Progress

In the hopefully near future, Hagrid will add support for both federation and the distribution of third-party certificates. This isn’t a secret. It’s in Hagrid’s FAQ. Hopefully, this post has made clear that we aren’t trying to centralize OpenPGP or kill the web of trust; we are just trying to be thoughtful in our approach.

Maintainability

As a final note, Hagrid was initially developed by the Sequoia team. Vincent became active in the design discussions early on and organized the infrastructure to host a Hagrid instance on keys.openpgp.org. This past February, he began actively contributing code, and has since become Hagrid’s main developer. We are extremely happy that Vincent has taken over the project, and we view it as a success for Sequoia’s API that he was able to do so so quickly.

Other Improvements

We aren’t the only ones who are working to improve the OpenPGP ecosystem. Other efforts include p≡p and Autocrypt. (Full disclosure: the Sequoia team is financed by the p≡p Foundation.)

These projects have a number of outstanding people contributing to them who are trying hard to avoid more of the same usability difficulties that have plagued the OpenPGP ecosystem for decades.

Conclusions

The OpenPGP ecosystem has, and has had for several years, a latent tooling crisis. The Sequoia project is trying to improve the ecosystem. And, we feel that the current crisis validates our approach: neither Sequoia nor Hagrid is adversely impacted by certificate flooding. But, we are not the only ones quietly working to improve OpenPGP tooling.

Unfortunately, prominent members of the OpenPGP community have tended to attack the messengers. We think a solution-oriented approach is better.

We are convinced that OpenPGP is worth maintaining and evolving. Not because OpenPGP has existed for so long, but because the standard is on the whole a good one, and we believe it is important and good to have an option for standardized and decentralized end-to-end encryption in the larger ecosystem of privacy enhancing tools.

Thanks to Heiko for discussing and improving this text.

¹ Update 2023: The domain sks-keyservers.net is no longer available.
² Update 2023: The sources for SKS are no longer available on BitBucket, but are still available on GitHub.

Certificate Flooding, SKS and GnuPG Issues, and the Sequoia Project