adopt, how to get started, and how to
get support. They also provide a consistent user experience and facilitate
product adoption.
An About page helps SREs and product development engineers understand what the product or tool is, what
it does, and whether they should use it.
A concepts guide or glossary defines
all the terms unique to the product.
Defining terms helps maintain consistency across the docs and the UI, API, or CLI
(command-line interface) elements.
The goal of a quickstart guide is to
get engineers up and running with a
minimum of delay. It is helpful to new
users who want to give the product a try.
Codelabs. Engineers can use these
tutorials—combining explanation,
example code, and code exercises—
to get up to speed with the product.
Codelabs can also provide in-depth
scenarios that walk engineers step
by step through a series of key tasks.
These tutorials are typically longer
than quickstart guides. They can cover more than one product or tool if
those products or tools interact.
How-to guide. This type of document is for users who need to know
how to accomplish a specific goal with
the product. How-tos help users complete important specific tasks, and they
are generally procedure based.
The FAQ page answers common
questions, covers caveats that users
should be aware of, and points users to
reference documents and other pages
on the site for more information.
The support page identifies how
engineers can get help when they are
stuck on something. It also includes an
escalation flow, troubleshooting information,
links to groups, dashboards, and SLOs, and
on-call information.
API reference. This guide provides
descriptions of functions, classes, and
methods, typically with minimal narrative or reader guidance. This documentation is usually generated from code
comments and sometimes written by
tech writers.
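As a minimal sketch of how reference text can be generated from code comments, the Python example below reads a function's docstring and signature with the standard library's `inspect` module and assembles a plain-text reference entry. The function `restart_task`, its parameters, and the helper `reference_entry` are hypothetical names chosen for illustration; they do not belong to any product discussed here.

```python
import inspect


def restart_task(task_id: str, grace_seconds: int = 30) -> bool:
    """Restart a running task.

    Args:
        task_id: Identifier of the task to restart.
        grace_seconds: Time to wait for a clean shutdown before forcing it.

    Returns:
        True if the restart was requested, False if the task ID is empty.
    """
    # Hypothetical stand-in logic; a real tool would call its scheduler here.
    return bool(task_id)


def reference_entry(func) -> str:
    """Build a plain-text reference entry from a signature and docstring."""
    signature = f"{func.__name__}{inspect.signature(func)}"
    doc = inspect.getdoc(func) or "(undocumented)"
    return f"{signature}\n{doc}"


if __name__ == "__main__":
    print(reference_entry(restart_task))
```

An entry produced this way carries little narrative, which is why a tech writer may still add guidance around the generated descriptions.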
Developer guide. Engineers use this
guide to find out how to program to a
product’s APIs. Such guides are necessary when SREs create products that
expose APIs to developers, enabling
creation of composite tools that call
each other’s APIs to accomplish more
complex tasks.
Documents for Reporting Service State
Here, we describe the documents that
SRE teams produce to communicate
the state of the services they support.
Quarterly service review. Information
about the state of the service comes in
two forms: a quarterly report reviewed
by the SRE lead and shared with the SRE
organization, and a presentation to the
product development lead and team.
The goal of the quarterly report (and
presentation) is to provide a “State of the
Service” review, including details about
performance, sustainability, risks, and
overall production health.
SRE leads are interested in quarterly
reports because they provide visibility
into the following:
˲ Burden of support (on-call, tickets,
postmortems). SRE leads know that when
the burden of support exceeds 50% of the
SRE team’s resources, they must respond
and change the priorities of their teams.
The goal is to give early warning if this
starts to trend in the wrong direction.
˲ Performance of the SLA. SRE leads
typically want to know whether the SLA is being missed or whether the ecosystem has an
unhealthy component that puts the
product’s clients in jeopardy.
˲ Risks. SRE leads want to know which
risks the SREs believe threaten their ability to deliver against the goals of the products
and the business.
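The first two checks in this list can be sketched in Python as follows. This is an illustration under stated assumptions, not a prescribed reporting method: the `QuarterSummary` fields, the function names, and the sample figures in the usage note are hypothetical. Only the 50% burden threshold comes from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QuarterSummary:
    """Hypothetical inputs for one quarter of an SRE team's report."""
    support_hours: float          # on-call, tickets, and postmortems combined
    total_hours: float            # all hours the team had available
    measured_availability: float  # e.g. 0.9991 for 99.91%
    sla_availability: float       # assumed contractual target, e.g. 0.999


BURDEN_LIMIT = 0.50  # threshold from the text: 50% of the team's resources


def burden_ratio(quarter: QuarterSummary) -> float:
    """Return the fraction of team time spent on support work."""
    if quarter.total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return quarter.support_hours / quarter.total_hours


def report_flags(current: QuarterSummary, previous: QuarterSummary) -> List[str]:
    """List the conditions an SRE lead would want surfaced in the report."""
    flags = []
    ratio = burden_ratio(current)
    if ratio > BURDEN_LIMIT:
        flags.append("burden of support exceeds 50%")
    elif ratio > burden_ratio(previous):
        # Early warning: still under the limit but moving the wrong way.
        flags.append("burden of support is trending up")
    if current.measured_availability < current.sla_availability:
        flags.append("SLA missed")
    return flags
```

For example, comparing a quarter with 550 support hours out of 1,000 and 99.8% availability against a 99.9% target would raise both the burden flag and the SLA flag. Risks are left out because they are a judgment call by the SREs rather than a computed value.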
Quarterly reports also provide opportunities for the SRE team to:
˲ Highlight the benefit SRE is providing to the product development team,
as well as the work of the SRE team.
˲ Request prioritization for resolving problems hindering the SRE team
(sustainability).
˲ Request feedback on the SRE
team’s focus and priorities.
˲ Highlight broader contributions
the team is making.
Production best practices review.
With this review, SRE teams are better
able to adopt production best practices and reach a very stable state where
they spend little time on operations.
SRE teams prepare for these reviews
by providing details such as team
website and charter, on-call health
details, projects vs. interrupts, SLOs,
and capacity planning.
Playbooks contain instructions for verification, troubleshooting, and escalation for each alert generated from network-monitoring processes.
The best practices review helps
the SRE team calibrate itself against
the rest of the SRE organization and