Public Notes on histre
Principles of Distributed Computing - DISCO
disco.ethz.ch
The Part-Time Parliament - Leslie Lamport
research.microsoft.com
Viewstamped Replication Revisited - Barbara Liskov and James Cowling
pmg.csail.mit.edu
Abstract
This paper presents an updated version of Viewstamped Replication, a replication technique that handles failures in which nodes crash. It describes how client requests are handled, how the group reorganizes when a replica fails, and how a failed replica is able to rejoin the group. The paper also describes a number of important optimizations and presents a protocol for handling reconfigurations that can change both the group membership and the number of failures the group is able to handle.
#paxos #distributed-systems #toread #pub
Paxos Made Live - An Engineering Perspective
www.cs.utexas.edu
Abstract
We describe our experience in building a fault-tolerant database using the Paxos consensus algorithm.
Despite the existing literature in the field, building such a database proved to be non-trivial. We describe
selected algorithmic and engineering problems encountered, and the solutions we found for them. Our
measurements indicate that we have built a competitive system.
#distributed-systems #paxos #toread #pub
The reactive manifesto
www.reactivemanifesto.org
A Survey of Rollback-Recovery Protocols in Message-Passing Systems
www.cs.utexas.edu
PacificA: Replication in Log-Based Distributed Storage Systems - Microsoft Research
research.microsoft.com
Large-scale distributed storage systems have gained popularity for storing and processing ever increasing amount of data. Replication mechanisms are often key to achieving high availability and high throughput in such systems. Research on fundamental problems such as consensus has laid out a solid foundation for replication protocols. Yet, both the architectural design and engineering issues of practical replication mechanisms remain an art. This paper describes our experience in designing and implementing replication for commonly used log-based storage systems. We advocate a general replication framework that is simple, practical, and strongly consistent. We show that the framework is flexible enough to accommodate a variety of different design choices that we explore. Using a prototype system called PacificA, we implemented three different replication strategies, all using the same replication framework. The paper reports detailed performance evaluation results, especially on system behavior during failure, reconciliation, and recovery.
#distributed-systems #toread #pub
The Primary-Backup Approach
citeseerx.ist.psu.edu
Espresso: LinkedIn's Distributed Data Serving Platform (Paper)
www.slideshare.net
Zab: High-performance broadcast for primary-backup systems
www.stanford.edu
In Search of an Understandable Consensus Algorithm
ramcloud.stanford.edu
Raft lecture (Raft user study) - YouTube
www.youtube.com
MillWheel: Fault-Tolerant Stream Processing at Internet Scale
research.google.com
Abstract: MillWheel is a framework for building low-latency data-processing applications that is widely used at Google. Users specify a directed computation graph and application code for individual nodes, and the system manages persistent state and the continuous flow of records, all within the envelope of the framework's fault-tolerance guarantees. This paper describes MillWheel's programming model as well as its implementation. The case study of a continuous anomaly detector in use at Google serves to motivate how many of MillWheel's features are used. MillWheel's programming model provides a notion of logical time, making it simple to write time-based aggregations. MillWheel was designed from the outset with fault tolerance and scalability in mind. In practice, we find that MillWheel's unique combination of scalability, fault tolerance, and a versatile programming model lends itself to a wide variety of problems at Google.
#distributed-systems #stream-processing #toread #pub
Naiad: A Timely Dataflow System - Microsoft Research
research.microsoft.com
Naiad is a distributed system for executing data parallel, cyclic dataflow programs. It offers the high throughput of batch processors, the low latency of stream processors, and the ability to perform iterative and incremental computations. Although existing systems offer some of these features, applications that require all three have relied on multiple platforms, at the expense of efficiency, maintainability, and simplicity. Naiad resolves the complexities of combining these features in one framework.
A new computational model, timely dataflow, underlies Naiad and captures opportunities for parallelism across a wide class of algorithms. This model enriches dataflow computation with timestamps that represent logical points in the computation and provide the basis for an efficient, lightweight coordination mechanism.
We show that many powerful high-level programming models can be built on Naiad’s low-level primitives, enabling such diverse tasks as streaming data analysis, iterative machine learning, and interactive graph mining. Naiad outperforms specialized systems in their target application domains, and its unique features enable the development of new high-performance applications.
#distributed-systems #stream-processing #toread #pub
linkedin/databus
github.com
Source-agnostic distributed change data capture system.
#distributed-systems #stream-processing #toread #pub
Apache BookKeeper - BookKeeper Home
zookeeper.apache.org
The Apache BookKeeper subproject of ZooKeeper is made up of a distributed logging service called BookKeeper and a distributed publish/subscribe system built on top of BookKeeper called Hedwig.
#distributed-systems #logging #pub
HedWig - Apache BookKeeper - Apache Software Foundation
cwiki.apache.org
Streaming MapReduce with Summingbird | Twitter Blogs
blog.twitter.com
Samza
samza.incubator.apache.org
www.cs.berkeley.edu/~brewer/cs262/Aries.pdf
www.cs.berkeley.edu
Amazon Dynamo Paper
www.allthingsdistributed.com