Overview

The Site Reliability Engineering (SRE) Fundamentals™ certification is designed to impart, test, and validate knowledge of SRE vocabulary, principles, and practices.

The certification helps engineers understand the foundations of Site Reliability Engineering, such as SLOs, monitoring, alerting, toil, load, risk, and simplifying reliability.

Exam Requirements

  • Attend a face-to-face or virtual course taught by a Certified Site Reliability Engineering (SRE) Fundamentals™ trainer
  • Complete 16 hours of live online or in-person training with a Certified Site Reliability Engineering (SRE) Fundamentals™ trainer
  • After successfully completing the course, accept the License Agreement to take the 45-question Site Reliability Engineering (SRE) Fundamentals™ test
  • To pass the test, answer at least 32 of the 45 questions correctly within the 60-minute time limit
  • Maintain your Site Reliability Engineering (SRE) Fundamentals™ certification by renewing it annually

Modules

Module 1: Introducing SRE

  • DevOps
  • SRE
  • SRE Terminologies
  • Toil
  • Types of Toil
  • Module Quiz
  • Video: What is Site Reliability Engineering

Module 2: Service Level Objectives

  • Service Level Objectives
  • SLO Data Components and Metrics
  • Measuring and evaluating Service Level Objectives (SLOs)
  • Steps for measuring and evaluating SLOs
  • Service Level Objectives challenges
  • SLO Best Practices
  • Use Case: Batch Scheduling Data Flow Graphs with SLO
  • Module Quiz

Module 3: Service Level Indicators

  • Service Level Indicators
  • SLIs vs. SLOs vs. SLAs
  • Identifying SLIs
  • Defining Programmatic SLIs
  • Video: SLIs, SLOs, SLAs
  • Module Quiz

Module 4: Error Budgets

  • What is an error budget?
  • Why do you need an error budget?
  • Benefits of error budgeting
  • Error Budget Policies
  • Positive Error Budget
  • Case Study: SRE Practice
  • Module Quiz

Module 5: Reduce Toil

  • What is operations toil?
  • Why Toil Matters
  • Why toil must be reduced
  • How to Calculate Toil
  • Strategies for reducing operations toil
  • Use Case: Reducing Toil From Alerting
  • Module Quiz

Module 6: Chaos Engineering

  • Chaos Engineering
  • Need for Chaos Engineering
  • Benefits of Chaos Engineering
  • Chaos Engineering and Testing
  • Chaos Engineering and DevOps
  • How Chaos Engineering works
  • Chaos Engineering Experiments
  • What is Chaos Monkey
  • Use Cases: Chaos Engineering
  • Video: Chaos Engineering
  • Module Quiz

Module 7: Managing Risk

  • Risk Management
  • Unplanned Downtime
  • Identify Risk in Services
  • Use Case: Kubernetes Common Failure Modes
  • Module Quiz

Target Audience

  • Anyone starting or leading a move towards increased reliability
  • Anyone interested in modern IT leadership and organizational change approaches
  • Business Managers
  • Business Stakeholders
  • Change Agents
  • Consultants
  • DevOps Practitioners
  • IT Directors
  • IT Managers
  • IT Team Leaders
  • Product Owners
  • Scrum Masters
  • Software Engineers
  • Site Reliability Engineers
  • System Integrators
  • Tool Providers