On Your Path To Site Reliability Engineering

Cristiano Cunha
Tuesday, 21st June 2022

Learning Outcomes

  • Identify approaches for adopting site reliability engineering
Description

    Site Reliability Engineering is, by definition, an engineering approach to IT operations. Site reliability engineers (SREs) are development-focused engineers who solve operational, scale, and reliability problems. Knowing that SREs are vital to supporting the DevOps change, and being an SDET yourself, how can you apply what you already know from your engineering approach to testing to this scenario? Using the context of your own company, sit down and think about what making such a change could mean.

    Instructions:

    Define context

    Sit down with your teammates (or alone) and describe the context of a company that would benefit from the creation of an SRE team. Use aspects of your own company to bring some realism to this activity, or be creative and include problems you would like to discuss and get suggested solutions for.

    If you prefer you can use the following example:

    “In this company, you have an infrastructure/operations team that is responsible for everything happening in infrastructure and in production. This team is drowning in tickets and resolves issues using manual actions. They use some scripts, but these are spread across various machines with no versioning. They are also on call and support production 24x7.”

    Generate plan

    Reflect on the situation described and discuss it with your teammates. What different approaches will you take to implement such a change? Define 3 to 5 points that you and your team think are the most important to address. For each point, explain how to implement it and what outcome you expect.

    Starting to use a source control tool

    • How
      • Online training on source control.
      • Sharing sessions on how we can save scripts in source control.
      • Make sure all scripts are now “downloaded” from source control and that all contributions are made there (no more local scripts).
      • Ensemble programming so everyone can see how it should be used.
    • Outcome
      • No more scripts on local machines.
      • Code starts to flow into the source control tool and a process starts to be designed for sharing and contribution.

    Share

    Choose 2-3 volunteers to describe their context and their plan for making the change, then open a Q&A to discuss the approach.

    Wrap-up:

    Understanding what an SRE is and the set of responsibilities they are accountable for will help you decide whether to consider a role in this area. You will also have an overview of the challenges to expect when making such a change, and of possible solutions to try (or adapt) in order to invest in the change while moving forward.

    Resources