Description:

Keeping a distributed system running requires constant maintenance to ensure failures don't interfere with its reliability and availability. Using periodic scheduling and replication, site reliability engineers can minimize the effect failures have on a system's performance. One way to automate this process is to use the system daemon cron.

In this course, you'll explore how to use cron for task scheduling; the purpose, components, and operators involved in cron jobs; and the format and characters of cron syntax. You'll outline how cron works with distributed periodic scheduling and idempotency, and how it behaves in large-scale deployments.

Next, you'll review the Paxos distributed consensus algorithm, best practices for its use, and how it applies to distributed replication. Lastly, you'll practice scheduling a cron job and using cron syntax generators.

Target Audience:

Duration: 00:58

Description:

Anticipating failures that will affect your company's systems is a crucial duty of a site reliability engineer. These failures are especially significant when they affect distributed systems, which is why efficient algorithms and strategies are essential to minimizing their likelihood.

In this course, you'll explore critical state management and the CAP theorem, identifying how each concept relates to distributed systems. Next, you'll examine several distributed system management algorithms and strategies, including deterministic and nondeterministic algorithms, distributed system models, and Byzantine faults. You'll then outline how each of these benefits distributed system management.

You'll also investigate the Multi-Paxos message flow protocol and how it works with distributed systems. Finally, you'll describe what's involved in deploying and monitoring a consensus-based system to increase distributed system performance.

Target Audience:

Duration: 01:14