Engineers (SREs), and the fleet security managers—were identified and
surveyed to determine their typical
workflows. Using this information,
the migration team wrote implementation-agnostic user journeys. To perform effective gap analysis, and to
reduce bias during the design phase,
the team made a conscious effort to
describe user journeys in a purely functional fashion.
Production milestone definition.
Based on the survey responses and
usage patterns collected, the team
grouped customer user journeys (by
both technology area and user type)
and prioritized them into bands of
features labeled alpha, beta, and general availability.
Workstream definition. In parallel to milestone definition, the team
grouped requirements into seven
streams of related work such as networking and provisioning. Each workstream was assigned a technical lead,
a project lead, and a skeleton staff.
Each team was virtual, recruited from
across reporting lines as needed to
address the work domain. The flexibility provided by this form of organization and associated matrix management turned out to be essential as the
Engineering prototyping, gap analysis, and design proposals. Once
formed, each workstream examined
the critical user journeys in their domains and researched the feasibility
of implementing these stories on GCP.
To do so, the team performed a gap
analysis for each user journey by reading GCP product documentation and
running “fail-fast” prototyping sprints.
Throughout this process, possible
implementations were collected and
rated according to complexity, feasibility, and (most importantly) how easily
a customer external to Google could
implement this solution.
Whenever the migration team arrived
at a “Google-only” solution, it
filed a feature request to the GCP team
requesting a solution that would work
for customers outside of Google as
well, especially if another enterprise
customer would be interested in such
functionality. In this way, the team
sought to “act like a customer” in an
effort to make the platform enterprise-ready.
Where the GCP product teams
could not deliver a feature in time for
a release milestone, the migration team
implemented bridging solutions that favored
solutions the public could use (for example,
Forseti Security) above Google-
Workstream work breakdown and
staffing. With design proposals in
place and implementation directions
decided, the team created detailed
work plans for each workstream in a
central project management tool. The
work was organized by customer user
journey, and tasks were broken down
by objective, key results, and quarter.
Drilling down to this level of detail provided enough information to estimate
the staffing required for each workstream, to understand interdependencies between streams, and to fine-tune
the organization as needed.
Technical Implementation Details
Once planning was complete, the team
was ready to begin implementing the
technical details of the migration.
This section describes the three main
buckets of work.
Background: Networking and BeyondCorp. Many of the networking
challenges of running a desktop service
on Google Compute Engine (GCE) were
at least partially solved by the BeyondCorp program (https://cloud.google.com/beyondcorp/).
In a BeyondCorp
model, access controls are based on
known information about a given device and user rather than on the location in a privileged network. When
network trust no longer factors into
access-control decisions, many services become readily accessible from
outside of the corporate network—
usually via a corporate laptop, but now
also from appropriately managed and
inventoried hosts on GCE.
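The access model described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Google's implementation: the `Device` and `Request` types, their fields, and the `is_allowed` function are hypothetical names introduced here. The point it shows is that the source address of a request plays no part in the decision.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Device:
    """Hypothetical inventory record for a device (fields are assumptions)."""
    device_id: str
    inventoried: bool      # present in the device inventory
    managed: bool          # under fleet management
    has_valid_cert: bool   # presents a valid client certificate


@dataclass(frozen=True)
class Request:
    """A request as seen by a hypothetical access proxy."""
    user: str
    device: Device
    source_ip: str         # recorded, but deliberately unused in the decision


def is_allowed(request: Request, authorized_users: Set[str]) -> bool:
    """Decide access from device and user facts only, never from network location."""
    device = request.device
    device_trusted = device.inventoried and device.managed and device.has_valid_cert
    user_trusted = request.user in authorized_users
    return device_trusted and user_trusted
```

Under this sketch, a managed corporate laptop and a managed, inventoried GCE host are treated identically, whatever address each one connects from.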
Enterprises that leverage traditional
virtual private networks (VPNs) for remote access to applications will have a
different networking experience when
moving desktops or other services. A
typical strategy is to set up a Cloud VPN
vpn/overview) in a cloud project and
peer with on-premises equipment to
bridge the networks together.
Host authentication and authorization.
Device authentication is usually
performed using client certificates
deployed on the host. When a user
receives a physical machine (or even
a virtual machine on privileged corporate
networks), the user can initially log
in and request a certificate because the
corporate network retains the level of
privilege needed to sync login policies.
Extending this level of network privilege
to public cloud IP ranges is undesirable
for security reasons.
To bridge this gap, Google developed a (now-public) API to verify the
identity of instances (https://cloud.
verifying-instance-identity). The API
uses JWTs (JSON Web Tokens; JSON is JavaScript Object Notation) to prove that an
instance belongs to a preauthorized
Google Cloud project. When an instance first boots, one of the JWTs provided by that API can be exchanged for
a client certificate used to prove device
identity, unblocking nearly all of the
normal communication paths (including
syncing login policies for user authorization).
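As a rough illustration of the check a certificate issuer might run before exchanging a token for a client certificate, the sketch below decodes a JWT payload and tests whether the instance's project is on an allow list. Several things here are assumptions rather than details from the article: the function names, the audience value, and the claim layout (`google.compute_engine.project_id`, as we understand the full-format instance identity token to provide). Most importantly, this sketch does not verify the token's signature. A real issuer must first validate the signature against Google's public certificates using a JWT library before trusting any claim.

```python
import base64
import json
import time


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a three-part JWT")
    payload_b64 = parts[1]
    # JWT segments are base64url-encoded with padding stripped; restore it.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def instance_is_preauthorized(claims, allowed_projects, expected_audience, now=None):
    """Check audience, expiry, and project membership on already-verified claims."""
    now = time.time() if now is None else now
    if claims.get("aud") != expected_audience:
        return False  # token was minted for a different service
    if claims.get("exp", 0) <= now:
        return False  # token has expired
    # Assumed claim layout for a full-format instance identity token.
    project_id = claims.get("google", {}).get("compute_engine", {}).get("project_id")
    return project_id in allowed_projects
```

Only after both the signature check (omitted here) and these claim checks pass would the issuer hand back a client certificate.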
Once the client certificate is in
place, Google applications can be accessed via the BeyondCorp/identity-aware proxy as if they were any other
Internet-facing service. In order for
cloud desktops to reach the proxies
(and other Internet endpoints), the
team set up network address translation (NAT) gateways (https://cloud.google.com/compute/docs/vpc/special-configurations)
to forward traffic
from the instances to targets outside
of the cloud project. In combination,
these approaches allow users to access
internal resources and the public Internet seamlessly, without requiring each
instance to have a publicly routed
Provisioning. The first step in designing a provisioning scheme was to
map out everything necessary to deliver
an end product that met users’ needs.
Compute Engine instances needed
levels of trust, security, manageability,
and performance for users to perform
their jobs—developing, testing, building, and releasing code—as normal.
Working from these requirements, the
team used the following specific principles to guide the rest of the design.
Users should interact with a cloud
desktop similarly to how they interact
with hosts on the corporate network.
Users should be able to use their