Saturday, December 24, 2011

DVCS and change authenticity

In the world of version control, distributed version control systems such as Git and Mercurial are all the rage.

These systems are indeed extremely powerful, but they all suffer from a fundamental issue, which is how the various nodes in the distributed system can establish the necessary trust to verify authenticity of push and pull requests.

(Disclosure: at my day job, we make a version control system, which has a centralized architecture and a wholly different trust and authentication mechanism. So I'm more than just an interested observer here.)

Now, this issue has been known and discussed for quite some time, but it has acquired greater urgency this fall after a fairly significant compromise of the main Linux kernel systems. As Jonathan Corbet notes in that article

We are past the time where kernel developers are all able to identify each other. Locking down kernel.org to the inner core of the development community would not be a good thing; the site is there for the whole community. That means there needs to be a way to deal with mundane issues like lost credentials without actually knowing the people involved.

The emerging proposal to deal with this problem includes several new features in Git:

I suspect that this problem is a deep and hard and fundamental one. It seems to me that the DVCS infrastructure is building a fairly complex mechanism: here's how Linus will use this technology to ensure the integrity of the Linux kernel, as described by Junio Hamano (the lead Git developer):

To make the whole merge fabric more trustworthy, the integration made by his lieutenants by pulling from their sub-lieutenants need to be made verifyable the same way, which would (1) make the number of signed tags even larger and (2) make it more likely somebody in the foodchain gets lazy and refuses to push out the signed tags after he or she used them for their own verification.

But reading this description, I'm instantly reminded of a very relevant observation made by Moxie Marlinspike in the context of the near-complete-collapse of the SSL Certificate Authority chain of trust this spring:

Unfortunately the DNSSEC trust relationships depend on sketchy organizations and governments, just like the current CA system.

Worse, far from providing increased trust agility, DNSSEC-based systems actually provide reduced trust agility. As unrealistic as it might be, I or a browser vendor do at least have the option of removing VeriSign from the trusted CA database, even if it would break authenticity with some large percentage of sites. With DNSSEC, there is no action that I or a browser vendor could take which would change the fact that VeriSign controls the .com TLD.

If we sign up to trust these people, we're expecting them to willfully behave forever, without any incentives at all to keep them from misbehaving. The closer you look at this process, the more reminiscent it becomes. Sites create certificates, those certificates are signed by some marginal third party, and then clients have to accept those signatures without ever having the option to choose or revise who we trust. Sound familiar?

I'm not saying I have the answer; indeed, the very smartest programmers on the planet are struggling intensely with this problem. It's a very hard problem. As the researchers at the EFF recently noted:

As currently implemented, the Web's security protocols may be good enough to protect against attackers with limited time and motivation, but they are inadequate for a world in which geopolitical and business contests are increasingly being played out through attacks against the security of computer systems.

Returning to the world of DVCS systems, for a moment, I've just felt, all along, that the fundamental weakness of DVCS systems was going to turn out to be their weak authenticity guarantees; indeed, this is the core reason that organizations like Apache have been very reluctant to open their infrastructure up to DVCS-style source control, even given all its other advantages.

And it seems like the people who are trying to repair the Certificate Authority technology are also skeptical that a 100% distributed solution can be effective; as Adam Langley says:

We are also sacrificing decentralisation to make things easy on the server. As I've previously argued, decentralisation isn't all it's cracked up to be in most cases because 99.99% of people will never change any default settings, so we haven't given up much. Our design does imply a central set of trusted logs which is universally agreed. This saves the server from possibly having to fetch additional audit proofs at runtime, something which requires server code changes and possible network changes.

And the EFF's Sovereign Keys proposal has a similar semi-centralization aspect:

Master copies of the append-only data structure are kept on machines called "timeline servers". There is a small number, around 10-20, of these. The level of trust that must be placed in them is very low, because the Sovereign Key protocol is able to cryptographically verify the important functions they perform. Sovereign Keys are preserved so long as at least one server has remained good. For scalability, verification, and privacy purposes, lots of copies of the entire append-only timeline structure are stored on machines called "mirrors".

With the new Git technology, as I understand it, the user who accepts a pull request from a remote repository now faces a new challenge:

The integrator will see the following in the editor when recording such a merge:
  • The one-liner merge title (e.g 'Merge tag rusty-for-linus of git://.../rusty.git/');
  • The message in the tag object (either annotated or signed). This is where the contributor tells the integrator what the purpose of the work contained in the history is, and helps the integrator describe the merge better;
  • The output of GPG verification of the signed tag object being merged. This is primarily to help the integrator validate the tag before he or she concludes the pull by making a commit, and is prefixed by '#', so that it will be stripped away when the message is actually recorded; and
  • The usual "merge summary log", if 'merge.log' is enabled.

This will be a challenging task to require of all developers in this chain of trust. Is it feasible? One thing for sure, the Git team are to be commended for facing this problem head on, for openly discussing it, and for trying to push the problem forward. It is exciting to watch them struggle with the issues, and I've learned an immense amount from reading their discussions.

So I think it will be very interesting to see how the Git team fares with this problem, as they, too, have some wonderfully talented people at work on the problems.

No comments:

Post a Comment