Description:

Once SRE metrics have been identified, site reliability engineers (SREs) must know how to perform fault analysis on a system, classify defects, and monitor and report data. In this course, you'll explore the tools and best practices for carrying out these procedures.

You'll begin by identifying various fault analysis methods and tools. You'll then classify software defects and bugs with a focus on severity and priority.

Next, you'll investigate strategies for monitoring APIs and explore some tools used for this task. You'll then examine in detail several tools for collecting, analyzing, and reporting metric data using a customizable dashboard, including those that comprise the ELK Stack - Elasticsearch, Logstash, and Kibana. Finally, you'll explore the data collection tool Beats and use cases for Elasticsearch notifications.

Target Audience:

Duration: 01:19

Description:

To improve the chances of creating, monitoring, and maintaining a successful software development project, site reliability engineers and all team members must be aware of which metrics to measure. They also need a working knowledge of both automated and manual testing methods. In this course, you’ll learn how to select and manage SRE metrics and how various testing methods work.

You’ll begin by learning which metrics need to be measured for project management, software development, and APIs - examining in detail CI/CD, cloud API, and software project metrics, to name a few. Next, you’ll compare manual and automated testing methods and the goals of each.

Lastly, you’ll investigate automated testing frameworks and platforms, test cases and types, and best practices and pitfalls to consider.

Target Audience:

Duration: 01:25