Public Notes by scott Tagged #pub

Using Paxos to Build a Scalable, Consistent, and Highly Available Datastore - LinkedIn arxiv.org

Paxos Made Live - An Engineering Perspective www.cs.utexas.edu

Abstract We describe our experience in building a fault-tolerant data-base using the Paxos consensus algorithm. Despite the existing literature in the ﬁeld, building such a database proved to be non-trivial. We describe selected algorithmic and engineering problems encountered, and the solutions we found for them. Our measurements indicate that we have built a competitive system. #distributed-systems #paxos #toread #pub

The reactive manifesto www.reactivemanifesto.org

A Survey of Rollback-Recovery Protocols in Message-Passing Systems www.cs.utexas.edu

PacificA: Replication in Log-Based Distributed Storage Systems - Microsoft Research research.microsoft.com

Large-scale distributed storage systems have gained popularity for storing and processing ever increasing amount of data. Replication mechanisms are often key to achieving high availability and high throughput in such systems. Research on fundamental problems such as consensus has laid out a solid foundation for replication protocols. Yet, both the architectural design and engineering issues of practical replication mechanisms remain an art. This paper describes our experience in designing and implementing replication for commonly used log-based storage systems. We advocate a general replication framework that is simple, practical, and strongly consistent. We show that the framework is flexible enough to accommodate a variety of different design choices that we explore. Using a prototype system called PacificA, we implemented three different replication strategies, all using the same replication framework. The paper reports detailed performance evaluation results, especially on system behavior during failure, reconciliation, and recovery. #distributed-systems #toread #pub

The Primary-Backup Approach citeseerx.ist.psu.edu

Espresso: LinkedIn's Distributed Data Serving Platform (Paper) www.slideshare.net

Zab: High-performance broadcast for primary-backup systems www.stanford.edu

In Search of an Understandable Consensus Algorithm ramcloud.stanford.edu

Raft lecture (Raft user study) - YouTube www.youtube.com

Vive la Diff'erence: Paxos vs. Viewstamped Replication vs. Zab arxiv.org

High-Availability Algorithms for Distributed Stream Processing cs.brown.edu

Discretized Streams: An Efﬁcient and Fault-Tolerant Model for Stream Processing on Large Clusters www.cs.berkeley.edu

MillWheel: Fault-Tolerant Stream Processing at Internet Scale research.google.com

Abstract: MillWheel is a framework for building low-latency data-processing applications that is widely used at Google. Users specify a directed computation graph and application code for individual nodes, and the system manages persistent state and the continuous flow of records, all within the envelope of the framework's fault-tolerance guarantees. This paper describes MillWheel's programming model as well as its implementation. The case study of a continuous anomaly detector in use at Google serves to motivate how many of MillWheel's features are used. MillWheel's programming model provides a notion of logical time, making it simple to write time-based aggregations. MillWheel was designed from the outset with fault tolerance and scalability in mind. In practice, we find that MillWheel's unique combination of scalability, fault tolerance, and a versatile programming model lends itself to a wide variety of problems at Google. #distributed-systems #stream-processing #toread #pub

Naiad: A Timely Dataflow System - Microsoft Research research.microsoft.com

Naiad is a distributed system for executing data parallel, cyclic dataflow programs. It offers the high throughput of batch processors, the low latency of stream processors, and the ability to perform iterative and incremental computations. Although existing systems offer some of these features, applications that require all three have relied on multiple platforms, at the expense of efficiency, maintainability, and simplicity. Naiad resolves the complexities of combining these features in one framework. A new computational model, timely dataflow, underlies Naiad and captures opportunities for parallelism across a wide class of algorithms. This model enriches dataflow computation with timestamps that represent logical points in the computation and provide the basis for an efficient, lightweight coordination mechanism. We show that many powerful high-level programming models can be built on Naiad’s low-level primitives, enabling such diverse tasks as streaming data analysis, iterative machine learning, and interactive graph mining. Naiad outperforms specialized systems in their target application domains, and its unique features enable the development of new high-performance applications. #distributed-systems #stream-processing #toread #pub

linkedin/databus github.com

Source-agnostic distributed change data capture system. #distributed-systems #stream-processing #toread #pub

Apache BookKeeper - BookKeeper Home zookeeper.apache.org

The Apache BookKeeper subproject of ZooKeeper is made up of a distributed logging service called BookKeeper and a distributed publish/subscribe system built on top of BookKeeper called Hedwig. #distributed-systems #logging #pub