LADC Tutorials 2022

Agreement in Distributed Systems (Prof. Achour Mostefaoui)

In sequential computing, the notion of universality is represented by a Turing machine capable of mechanically computing anything that is computable. Read/write registers, the basic objects of a Turing machine, are thus universal objects in sequential computing. In the context of distributed systems, each process has only a partial knowledge of the many parameters involved in the problem to be solved. From an operational point of view this means that the processes of a distributed computation need to exchange information, and agree in some way or another, in order to cooperate to a common goal. We know, since 1985 and the famous FLP impossibility result, that the consensus problem has no deterministic solution in an asynchronous message-passing distributed system where even one process might fail by crashing (extended to shared-memory systems). This impossibility is not due to the computing power of the individual processes, but rather to the difficulty of coordination between the different processes that compose the distributed system. Coordination and agreement problems are thus at the heart of computability in distributed systems. Said another way, a central issue of distributed computing consists in coping with the uncertainty created by asynchrony and failures. The agreement power of objects is at the heart of computability in distributed systems. Consensus is proved universal in the sense that, any object having a sequential specification has a wait-free implementation using only read/write registers and consensus objects. Moreover, some special instructions, such as CAS, are universal as well. In message-passing systems, one needs to either assume strong timing properties or to design non-deterministic solutions.