ed, they created a system that, under
the hood, was updating the YAML
files in their VCS.
GitOps lowers the cost of creating
self-service IT systems, enabling self-service operations where previously
they could not be justified. It improves
the ability to operate the system safely,
permitting regular users to make big
changes. Safety improves as more tests
are added. Security audits become easier as every change is tracked.
GitOps isn’t perfect. Not every IT
system or service can be configured
this way, and it requires a commitment to writing user-facing documentation that may take some getting
used to. It also requires higher security controls around your VCS, as it is
now a production attack vector.
When GitOps becomes embedded
in the technical culture of the company, however, new systems are built with
GitOps in mind, and we move one step
closer to a world where operations are
collaborative, shared, and safe.
This article benefited from feedback
and suggestions from Alice Goldfuss,
SRE, GitHub Inc.; Elizabeth K. Joseph,
developer advocate, Mesosphere;
Chris Hunt, SRE, Stack Overflow Inc.;
Eric Shamow, lead platform engineer,
StanCorp; Jonathan Kratter, SRE, Uber
Technologies LLC; and Mark Henderson, SRE, Stack Overflow Inc.
Thomas A. Limoncelli is the SRE manager at Stack
Overflow Inc. in New York City. His books include The
Practice of System and Network Administration, The
Practice of Cloud System Administration, and Time
Management for System Administrators. He blogs at
EverythingSysadmin.com and tweets @YesThatTom.
Copyright held by owner/author.
Publication rights licensed to ACM. $15.00.
tems to BCC me on any PR updates.
As a result, I have become omnipresent and rarely have to nag for status
updates. When I do find myself micromanaging a project, it is usually
for a project that has not subscribed
to the GitOps way of doing things.
Another GitOps management trick
that is helpful when dealing with low
performers: reviewing the person’s
history of PRs can be enlightening
and used as evidence to show the employee when you are coaching him or
her on specific problems.
The biggest benefits, however, are derived from GitOps’ ability to improve
the ROI for automation. The truth is
that we do not have time to automate
all requests, and not all requests have
sufficient ROI to make automation
worthwhile. GitOps reduces the I, making it easier to achieve the R.
Traditionally, self-service IT systems involve creating a Web-based
portal that permits users to perform
well-defined, transactional requests
without the involvement of a human
approver. Such systems are difficult
to create, however, requiring Web UI
design skills that are often beyond
those of a typical system administrator. The workflow is sufficiently
complex that advanced user experience (UX) research and testing would
be required to create a system that is
less confusing than just opening a
ticket. Sadly, most companies do not
have UX research staff, and those that
do will not allocate them to internal
Such systems are brittle, as they
are tightly coupled to the system they
control. I have seen portals that were
abandoned when the underlying system changed because less effort was
required to return to manual updates
than to update the portal. This is often
not because the change was complex,
but because the knowledge of how to
update the portal left with the person
who originally created it. Often portals are crafted by people different
from those responsible for the systems
GitOps lowers the bar for creating
self-service systems since the UI is the
existing PR system that the company
My advice for getting started is to use
the existing VCS, PR, and CI systems
in place at your organization. People are familiar with them already,
which reduces the learning curve.
They often have many nice features
such as a way to manage the queue of
PRs waiting for approval, integration
with ticket systems, and so on. I’m
fond of systems that can announce
the arrival of new PRs in my team’s
chat room. You can even make PR approvals as easy as sending a message
to the chatbot.
Look for opportunities to use GitOps.
DNS is an obvious place to start, as are
VM creation, container maintenance
and orchestration, firewall rules, website updates, blog posts, email aliases
and mailing lists, and just about any
virtual infrastructure or one with a configuration file or API.
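To make the DNS case concrete, here is a minimal sketch of the kind of check a CI system might run on every PR before a change is allowed to merge. The record layout (the name, type, and value fields) and the validate_records function are assumptions made for illustration; they are not taken from any particular DNS tool.

```python
import ipaddress


def validate_records(records):
    """Return a list of problems; an empty list means the PR may merge.

    `records` is a hypothetical, already-parsed list of dicts such as
    {"name": "www", "type": "A", "value": "192.0.2.10"}.
    """
    problems = []
    seen = set()
    for rec in records:
        name, rtype, value = rec.get("name"), rec.get("type"), rec.get("value")
        if not name or not rtype or not value:
            problems.append(f"incomplete record: {rec!r}")
            continue
        if rtype == "A":
            # A records must carry a syntactically valid IPv4 address.
            try:
                ipaddress.IPv4Address(value)
            except ValueError:
                problems.append(f"{name}: {value!r} is not a valid IPv4 address")
        key = (name, rtype, value)
        if key in seen:
            problems.append(f"{name}: duplicate {rtype} record {value!r}")
        seen.add(key)
    return problems
```

A CI job would fail the build whenever the returned list is non-empty, so the person who opened the PR sees the problem before a human reviewer has to.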
GitOps is pervasive in our industry.
At OpenStack the entire infrastructure
is controlled via GitOps, including the
GitOps infrastructure itself. Employees at GitHub Inc. report the use of GitOps is pervasive, and even nontechnical employees are trained on how to
The popular open source dashboarding application Grafana is
moving toward a GitOps workflow for
dashboards. Recognizing how much
business logic is encoded in these
dashboards, Grafana now offers
the option to provision dashboards
from JSON dashboard definitions
dropped into a specific location on
the file system. With a GitOps approach, dashboard creators can use
whatever tooling and process makes
sense for their particular workflow—
including the use of PRs on the Git
repo storing the dashboards.
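As one possible illustration, a repository of dashboard definitions could run a small check like the following on each PR. This is a sketch under stated assumptions: the check_dashboards function is hypothetical, and the only thing it relies on is that each definition is a JSON object with a top-level title and, optionally, a uid that should not be reused across files.

```python
import json
from pathlib import Path


def check_dashboards(directory):
    """Return problems found in the *.json dashboard files under `directory`."""
    problems = []
    uids = {}  # uid -> name of the first file that used it
    for path in sorted(Path(directory).glob("*.json")):
        try:
            dashboard = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as err:
            problems.append(f"{path.name}: invalid JSON ({err.msg})")
            continue
        if not isinstance(dashboard, dict) or not dashboard.get("title"):
            problems.append(f"{path.name}: missing title")
            continue
        uid = dashboard.get("uid")
        if isinstance(uid, str):
            if uid in uids:
                problems.append(
                    f"{path.name}: uid {uid!r} already used by {uids[uid]}"
                )
            else:
                uids[uid] = path.name
    return problems
```

The check is deliberately shallow. Its purpose is to catch files that would fail to load or collide with one another, leaving questions of content to the human reviewer of the PR.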
One company maintained an inventory of where equipment was positioned in computer racks around the
globe as a set of YAML (YAML Ain’t
Markup Language) files. Technicians
kept the files up to date via PRs that
triggered automatic calculations to
detect overloaded power systems and
validate other system constraints.
The CI system also generated HTML
pages with diagrams of the racks.
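The details of that company's checks are not given here, but a validation of this kind might look something like the sketch below. The field names (height_units, power_budget_watts, bottom_unit, watts) and the check_rack function are assumptions made for illustration; the input is the dictionary a YAML loader would produce from one rack file.

```python
def check_rack(rack):
    """Return constraint violations for one parsed rack file (hypothetical schema)."""
    problems = []
    used_units = {}  # rack unit number -> name of the device occupying it
    total_watts = 0
    for dev in rack["devices"]:
        total_watts += dev["watts"]
        first = dev["bottom_unit"]
        for unit in range(first, first + dev["height_units"]):
            if unit < 1 or unit > rack["height_units"]:
                problems.append(f"{dev['name']}: unit {unit} is outside the rack")
            elif unit in used_units:
                problems.append(
                    f"{dev['name']}: unit {unit} already holds {used_units[unit]}"
                )
            else:
                used_units[unit] = dev["name"]
    # Power check: the sum of device draw must fit within the rack's budget.
    if total_watts > rack["power_budget_watts"]:
        problems.append(
            f"{rack['name']}: {total_watts} W exceeds budget of "
            f"{rack['power_budget_watts']} W"
        )
    return problems
```

Because the check runs on the PR rather than after the merge, a technician learns that a rack would be overloaded before the equipment is installed, not after.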
When a Web-based GUI was request-