Fundamentals of fault-tolerant distributed computing in asynchronous environments (paper)

1999, ACM Computing Surveys, cited by 347

Topics: Distributed systems and fault tolerance; Optimization and Search Problems; Parallel Computing and Optimization Techniques

Abstract

Fault tolerance in distributed computing is a wide area with a significant body of literature that is vastly diverse in methodology and terminology. This paper aims at structuring the area and thus guiding readers into this interesting field. We use a formal approach to define important terms like fault, fault tolerance, and redundancy. This leads to four distinct forms of fault tolerance and to two main phases in achieving them: detection and correction. We show that this can help to reveal inherently fundamental structures that contribute to understanding and unifying methods and terminology. By doing this, we survey many existing methodologies and discuss their relations. The underlying system model is the close-to-reality asynchronous message-passing model of distributed computing.

Author

Felix Gärtner
