In this section, we present the background needed to understand what prompted our work: the key migration process undergone by the Debian project, which called for a fuller analysis of the keyring itself.
Throughout this section, we present a set of Research Questions (RQs) that guide the discussion that follows.
2.1 Trust models in public key cryptography
Besides encryption and signing, public key cryptography supports several models for identity assessment, called trust models. The most widespread is the Public Key Infrastructure (PKI) model, a hierarchical model based on predetermined roots of trust and strictly vertical relationships (certificates) from Certification Authorities (CAs) to individuals. This model is best known as the basis for the SSL and TLS protocols, which provide, among other things, secure communication between Web browsers and servers using the HTTPS protocol.
As we have presented in [9], the Debian project, being geographically distributed and lacking an organizational hierarchy, bases its trust management on the Web of Trust (WoT) model, with an extra step we have termed curatorship. The WoT model has been an integral part of OpenPGP since its inception [12]. In this model, there is no formal distinction between nodes in the trust network: any node can both issue and receive certificates (or, as they are usually called in the WoT model, signatures) to and from any other node. Trust between any two nodes that need to assert it is established by following a trust path that, ideally, links them in the desired direction and within a defined tolerable distance [11]. This leads to the first research question this work attempts to answer:
RQ1 As Debian is such a long-lived project, how does its trust model endure over time? Does aging qualitatively challenge it?
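The trust-path idea behind the WoT model can be illustrated with a small breadth-first search. The following is a minimal sketch, assuming the WoT is stored as a mapping from each key to the set of keys it has signed (directed edges signer → signee) and that the "tolerable distance" is a plain maximum number of hops; the function name `trust_path` and the key names are hypothetical, and real OpenPGP implementations also weigh per-key trust levels, which this sketch ignores.

```python
from collections import deque

def trust_path(signatures, source, target, max_hops):
    """Return the shortest signature path from source to target,
    or None if no path exists within max_hops signatures."""
    # signatures maps each key to the set of keys it has signed,
    # i.e. directed edges signer -> signee.
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        if len(path) - 1 >= max_hops:   # tolerable distance reached
            continue
        for signee in signatures.get(node, ()):
            if signee not in seen:
                seen.add(signee)
                queue.append(path + [signee])
    return None

# Hypothetical keys: alice signed bob, bob signed carol, carol signed dave.
wot = {"alice": {"bob"}, "bob": {"carol"}, "carol": {"dave"}}
print(trust_path(wot, "alice", "dave", max_hops=3))  # ['alice', 'bob', 'carol', 'dave']
print(trust_path(wot, "alice", "dave", max_hops=2))  # None: too far
print(trust_path(wot, "dave", "alice", max_hops=5))  # None: wrong direction
```

Because breadth-first search explores paths in order of length, the first path reaching the target is also the shortest one, and the direction of each signature matters.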
Besides the aforementioned work, several other works have studied the information that can be gathered from the total keyring in the SKS keyserver network [13]. The work presented in this paper is restricted to a small subset thereof: as of December 2016, the SKS network held over 4 million keys, while the active Debian keyrings held only around 1500.
2.2 Cryptographic strength
Public key cryptography works by generating pairs of mathematically related values (typically built on very large prime numbers). The strength of a key pair derives from the relation between these values: it rests on mathematical problems hard enough that attacking them by brute force is infeasible.
Over the years since the public invention and publication of public key cryptography [14], several algorithms for finding and relating such numbers have been incorporated into the Digital Signature Standard [15]; currently, the most widely used are RSA (based on the integer factorization problem [16]) and DSA (based on the discrete logarithm problem [17]).
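To make the relation between the values concrete, the following toy RSA example (with deliberately tiny primes, and therefore completely insecure) shows how the public and private exponents are tied together through the factors of the modulus. It is an illustrative sketch only: real RSA signatures operate on a hash of the message with padding, which is omitted here, and the function names are hypothetical.

```python
# Toy RSA key pair: far too small to be secure, for illustration only.
p, q = 61, 53                 # two (tiny) primes, kept secret
n = p * q                     # public modulus: 3233
phi = (p - 1) * (q - 1)       # Euler's totient of n: 3120
e = 17                        # public exponent, coprime with phi
d = pow(e, -1, phi)           # private exponent (modular inverse, Python 3.8+): 2753

def sign(message: int) -> int:
    """Sign an integer message (< n) with the private exponent d."""
    return pow(message, d, n)

def verify(message: int, signature: int) -> bool:
    """Check a signature using only the public key (n, e)."""
    return pow(signature, e, n) == message

sig = sign(65)
print(verify(65, sig))   # True
print(verify(66, sig))   # False
```

Anyone who can factor `n` back into `p` and `q` can recompute `d`; the scheme's strength therefore depends on factoring being infeasible for the modulus size in use.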
The strength of these schemes is directly related to the size of the numbers they build on. Back in the 1990s, when Internet connectivity boomed and these schemes first became widely used [12], key sizes of 384 to 1024 bits were deemed sufficient; longer keys demanded computing resources beyond what was practical at the time.
Of course, computers constantly become more powerful; cryptographic problems that were practically unsolvable 10 or 20 years ago are now within the reach of even small organizations ([10], p. 11). Cryptographic keys used for the RSA and DSA algorithms should now be at least 2048 bits long, with 4096 bits becoming the norm.
By 2009 (when the need to migrate to stronger keys was first widely discussed within the Debian project), 1024-bit keys made up close to 90% of the total keyring. The upcoming need for migration was repeatedly discussed and, given the threat of an attack becoming feasible for a medium-sized organization ([10], pp. 30, 32), by July 2014 a hard cutoff of January 2015 was set for expiring keys shorter than 2048 bits, allowing a six-month period for key migration. We published an analysis of that migration process [9], which prompted the present work.
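A key-length audit like the one behind the 2048-bit cutoff can be sketched on top of GnuPG's machine-readable colon listing (for instance `gpg --no-default-keyring --keyring <keyring file> --with-colons --list-keys`). This is a minimal sketch, not the project's actual tooling: it assumes the documented colon format, in which field 1 is the record type, field 3 the key length in bits, field 4 the algorithm and field 5 the key ID, and it only inspects primary (`pub`) records. The function names and the sample records are hypothetical and made up.

```python
# Fields of a GnuPG colon-listing record (1-based in GnuPG's docs):
# 1 record type, 2 validity, 3 key length in bits, 4 algorithm, 5 key ID.
def weak_primary_keys(colon_listing, min_bits=2048):
    """Return (key_id, bits) for every primary key shorter than min_bits."""
    weak = []
    for line in colon_listing.splitlines():
        fields = line.split(":")
        if fields[0] != "pub":           # skip uid, sub, fpr, ... records
            continue
        bits, key_id = int(fields[2]), fields[4]
        if bits < min_bits:
            weak.append((key_id, bits))
    return weak

def weak_share(colon_listing, min_bits=2048):
    """Fraction of primary keys that fall below the cutoff."""
    total = sum(1 for line in colon_listing.splitlines() if line.startswith("pub:"))
    return len(weak_primary_keys(colon_listing, min_bits)) / total if total else 0.0

# Made-up sample records (algorithm 17 = DSA, 1 = RSA, 16 = Elgamal), NOT real keys.
SAMPLE = """\
pub:-:1024:17:00000000AAAAAAAA:1230768000:::-:
uid:-::::1230768000::0000::Alice <alice@example.org>:
sub:-:2048:16:00000000BBBBBBBB:1230768000:::::
pub:-:1024:1:00000000CCCCCCCC:1230768000:::-:
pub:-:4096:1:00000000DDDDDDDD:1400000000:::-:
pub:-:2048:1:00000000EEEEEEEE:1400000000:::-:
"""
print(weak_primary_keys(SAMPLE))  # [('00000000AAAAAAAA', 1024), ('00000000CCCCCCCC', 1024)]
print(weak_share(SAMPLE))         # 0.5
```

Note that a 2048-bit key is not flagged, matching a cutoff defined as "shorter than 2048 bits".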
2.2.1 Cryptographic certificates in the Debian project
Not many free software projects started in the 1990s are still active today, but those that are tend to be very large and important. One such case is Debian; as mentioned in Section 1, the project was founded in 1993. Although the vast majority of its developers did not join until many years later, as we will explain in Section 5, many developers have been active for over a decade.
As Debian is a globally distributed project, where any project member is trusted to perform unsupervised uploads that will ultimately be installed and executed on millions of computers worldwide, the level of trust needed in a member's identity clearly surpasses what a traditional username-password pair offers. Since the project's early days, Debian Developers have used cryptographic signatures as their means of authentication to project services ([18], pp. 18–20).
Moreover, key signing parties (KSPs) ([18], p. 11) have been a long-standing tradition and are acknowledged as social-tie-building events at developer conferences and gatherings. In a KSP, each participant verifies the other participants' identities in order to later produce a cryptographic certificate, or signature, for each verified identity, thus strengthening the WoT (further studied in Section 3).
Exchanging key signatures can be a challenging experience for newcomers to a community, as shown by the exploration and proposal in [19]. Even within a community as tech-savvy as Debian, we feel it is important to understand how useful and how effective KSPs are. Hence the following research question:
RQ2 What is the actual effectiveness of KSPs for Debian? Are they worth fostering and keeping, or should an alternative trust-building model be sought?
A long-time, socially active developer’s key can often be signed by hundreds of people, and the more signing activity a given key has, the more central it becomes to the WoT (it becomes a trust hub).
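How KSPs feed the WoT, and how frequent attendance turns a key into a trust hub, can be sketched with an idealized model in which every participant signs every other participant's key (n participants yield n(n-1) signatures) and centrality is approximated by in-degree, i.e. the number of distinct keys that signed a given key. Both simplifications are assumptions of this sketch, not the measures used in this work; the function and participant names are hypothetical.

```python
from collections import Counter
from itertools import permutations

def key_signing_party(signatures, participants):
    """Idealized KSP: every participant signs every other participant's key."""
    for signer, signee in permutations(participants, 2):
        signatures.setdefault(signer, set()).add(signee)

def trust_hubs(signatures, top=3):
    """Rank keys by how many distinct keys have signed them (in-degree)."""
    in_degree = Counter(signee for signed in signatures.values() for signee in signed)
    return in_degree.most_common(top)

wot = {}
key_signing_party(wot, ["ana", "ben", "cai", "dee"])   # 4 * 3 = 12 signatures
key_signing_party(wot, ["ana", "eve", "fay"])          # ana attends a second KSP
print(sum(len(s) for s in wot.values()))               # 18
print(trust_hubs(wot, top=1))                          # [('ana', 5)]
```

Storing signatures as sets means a repeated signature between the same pair of keys is counted only once, so attending more KSPs with new people, rather than re-signing the same keys, is what raises a key's in-degree.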
While the pace of key migration did increase strongly after July 2014, full project participation was effectively cut off for 252 developers, that is, about a fourth of the project. Two and a half years later, 167 keys marked as removed had still not been acted upon. We analyze this process in [9]; for the present work, suffice it to say that analyzing this migration process was instrumental to the analysis presented here.
Our hypothesis is that, even considering the global dispersion of the project, the removed keys mostly belong to people who had already drifted away from their project engagement and were inactive. Section 4 discusses how this can be understood (and even predicted) from the WoT, even when analyzing it years before the migration took place. Social practice in Debian makes it hard to determine when a developer is no longer active: although there is a formal process for following up on seemingly inactive developers [20, 21], the large amount of human work it requires has so far kept it from reaching sufficient coverage.
A process enhancement, which automates a good part of the needed follow-up and provides a simple interface for inactive developers to signal that they are effectively inactive, was recently enabled [22]. This change is too recent to be accurately reported on, but during the first month after its implementation it led 20 developers to acknowledge that they are no longer active in the project. Sixteen of them had 1024-bit keys, which means they had already been inactive in most substantive project activities for at least two and a half years.
This process raises yet another question. Challenges brought about by advances regarding cryptographic strength, as well as shifts in priorities or time availability in the lives of project members, will most likely continue to create fluctuations in each person's interactions with the project. Can anything be learnt from past behaviour to help the project cope with future fluctuations? Hence:
RQ3 From the data gathered, processed and presented as part of this work, what insights into the future behaviour of the keyrings can be found via statistical means?
2.3 Threats to validity
This article is based exclusively on the Debian Curated Web of Trust (CWoT); it does not relate to or cover any other project's keyring. This is mainly because, to the best of our knowledge, no other project implements a CWoT in a similar fashion. As explained in Section 2.2.1, the practice of exchanging key signatures is strongest in the Debian project; it does exist in other free software communities, but not with the same strength exhibited in Debian. By sheer size alone, the footprint of @debian.org mail addresses in the SKS network is larger than that of most countries [13].
As for other groups that could be comparable, we could identify similar communities in several free software projects (such as Tor, Fedora and OpenBSD). However, these communities do not use a keyring as an integral part of their infrastructure: they have no curation process, and access is not granted based on whether an individual presents a key that belongs to a given keyring.
It should be noted that the authors have started discussions with a well-recognized free software development project that may adopt curation and privilege-granting processes similar to Debian's. As we do not want to commit them, we have chosen not to name them.
We have been asked about our analysis of the keyring blobs described in Section 4. Graphically interpreting a graph such as the ones that prompted this study (Fig. 3) might not be meaningful, since the shape of the blob itself could be an artifact of a specific node ordering. To address this question, we tried reversing and randomizing the order of the nodes in the Graphviz source files, and found that our observations held. We also switched the rendering engine to the JavaScript-based vis.js, and found the results to be stable. However, the analysis is still visual; we have not performed any numerical analysis that can confirm our hypothesis.
As for Fig. 4, a similar question arises regarding overplotting: are the colors we see a faithful representation, or is there hidden information underneath? Moreover, is the color choice correct? As we mention in Section 4, some colors are more visible than others. To address this question, we also compared the resulting plots with plots drawn with the edges presented in a different order and with different colors; the results are consistent with what we present.
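The robustness checks described above (reordering nodes and edges, and reassigning edge colors) can be reproduced by generating several Graphviz DOT variants of the same signature graph and rendering each one. The following is a minimal sketch, not the scripts used for this work: it assumes the graph is available as a list of (signer, signee) pairs and colors edges by signer; `reorder`, `to_dot` and the sample keys are hypothetical names.

```python
import random

def reorder(items, order, rng):
    """Return items in their original, reversed or shuffled order."""
    items = list(items)
    if order == "reversed":
        items.reverse()
    elif order == "shuffled":
        rng.shuffle(items)
    return items

def to_dot(edges, order="original", palette=None, seed=0):
    """Emit Graphviz DOT source for signer -> signee edges.

    The statement order of nodes and edges follows `order`; if a palette
    is given, each signer's outgoing edges get a color, with the
    signer-to-color assignment permuted by the seed.
    """
    rng = random.Random(seed)
    edges = list(edges)
    nodes = list(dict.fromkeys(n for edge in edges for n in edge))
    colors = {}
    if palette:
        shuffled = list(palette)
        rng.shuffle(shuffled)
        signers = sorted({signer for signer, _ in edges})
        colors = {s: shuffled[i % len(shuffled)] for i, s in enumerate(signers)}
    lines = ["digraph wot {"]
    lines += [f'  "{n}";' for n in reorder(nodes, order, rng)]
    for signer, signee in reorder(edges, order, rng):
        attr = f' [color="{colors[signer]}"]' if signer in colors else ""
        lines.append(f'  "{signer}" -> "{signee}"{attr};')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical signatures; write one file per variant and render each with
# e.g. `dot -Tsvg variant.dot -o variant.svg` to compare the resulting shapes.
edges = [("ana", "ben"), ("ben", "cai"), ("cai", "ana"), ("dee", "ana")]
for order in ("original", "reversed", "shuffled"):
    print(to_dot(edges, order=order, palette=["red", "blue", "green"], seed=1))
```

All variants describe exactly the same graph; only the statement order (which can influence layout and which edges are drawn on top) and the color assignment change, so differing renderings would point to ordering or overplotting artifacts rather than structure.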