5. The other service responds.
6. The application data cache (curated by the back-end processing) is accessed.
7. Cached reference data is returned to the service for use by its front-end app.
8. The response is issued to the caller.
SLAs and request depth. Requests pounding on the services serviced by the SaaS front end will have an SLA. A typical SLA may be a "300ms response for 99.9% of the requests, assuming a traffic rate of 500 requests per second."
It is common practice when building services to measure an SLA with a percentile (for example, 99.9%) rather than an average. Averages are much easier to engineer and deploy, but will lead to user dissatisfaction because the outlying cases are typically very annoying.
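The difference is easy to see numerically. The sketch below uses a made-up latency sample (the numbers and the nearest-rank percentile helper are illustrative, not from the article):

```python
import math

# Hypothetical latency sample: 998 fast requests and 2 slow outliers (ms).
latencies = [20] * 998 + [2000] * 2

def percentile(samples, p):
    """Nearest-rank percentile: smallest value covering p% of samples."""
    ranked = sorted(samples)
    rank = math.ceil(p / 100 * len(ranked))
    return ranked[rank - 1]

mean_ms = sum(latencies) / len(latencies)   # about 24 ms -- looks healthy
p999_ms = percentile(latencies, 99.9)       # 2000 ms -- the tail users feel
```

An SLA judged on the average would pass comfortably here, while one judged at the 99.9th percentile exposes exactly the outlying requests that annoy users.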
SLA Pressure at the Bottom
To implement a systemwide SLA with a composite call graph, there is a lot of pressure on the bottom of the stack. Because the time is factored into the caller's SLA, deeper stacks mean more pressure on the SLAs.
In many systems, the lowest-level services (such as the session-state manager and the reference-data caches) may have SLAs of 5ms–10ms 99.9% of the time. Figure 4 shows how composite call graphs can get very complex and put a lot of SLA pressure down the stack. When an SLA is slipping, one answer is to reduce the utilization of the servers providing the service. This can be done by adding more servers to the server pool and spreading the work thinner.
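The squeeze on the bottom of the stack can be sketched with a simple budget calculation (the 300ms top-level SLA matches the example above; the 60ms per-layer cost is an assumed figure for illustration):

```python
# Hypothetical SLA budget: assume a 300 ms top-level SLA and that each
# layer of the call stack spends 60 ms on its own work and network hops
# before calling the next service down. The remaining budget tightens
# at every level.
top_level_sla_ms = 300
per_layer_cost_ms = 60    # assumed, for illustration

remaining = top_level_sla_ms
budgets = []
for depth in range(1, 5):
    remaining -= per_layer_cost_ms
    budgets.append((depth, remaining))
# budgets == [(1, 240), (2, 180), (3, 120), (4, 60)]
```

Four layers down, only 60ms of the budget survives, which is why the lowest-level services end up with single-digit-millisecond SLAs.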
Suppose each user-facing or externally facing service has an SLA. Also, assume the system plumbing can track the calling pattern and knows which internal services are called by the externally facing services. This means the plumbing can know the SLA requirements of the nested internal services and track the demands on the services deep in the stack. Given the prioritized needs and the SLAs of various externally facing services, the plumbing can increase the number of servers allocated to important services and borrow or steal from less important ones.
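One minimal way such plumbing could reallocate a fixed server pool is a proportional split over measured, priority-weighted demand. This sketch is an assumption about how it might work, not the article's mechanism; the service names and weights are made up:

```python
# Hypothetical sketch of the "plumbing" reallocating a fixed server pool:
# weight each internal service by its measured demand (and priority),
# then hand out servers proportionally, so busy, important services grow
# and less important ones shrink.
def allocate(pool_size, weighted_demand):
    total = sum(weighted_demand.values())
    return {svc: max(1, round(pool_size * load / total))
            for svc, load in weighted_demand.items()}

plan = allocate(20, {"session-state": 500, "ref-cache": 300, "pricing": 200})
# plan == {"session-state": 10, "ref-cache": 6, "pricing": 4}
```

Adding servers this way spreads work thinner, which (per the queuing-theory sidebar) lowers utilization and pulls response times back toward the minimum.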
Accessing data and state. When a request lands on a service, it initially has no state other than what arrived with the request. It can fetch the session state and/or cached reference data.
The session state provides information from previous interactions that this service had over the session. It is fetched at the beginning of a request and then stored back with additional information as the request is completing.

Figure 4. Composite call graphs.
Most SaaS applications use application-specific information that is prepared in the background and cached for use by the front end. Product catalog, price list, geographical information, sales quotas, and prescription drug interactions are examples of reference data. Cached reference data is accessed by key. Using the key, the services within the front end can read the data. From the front end, this data is read-only. The back-end portion of the application generates changes to (or new versions of) the reference data. An example of read-only cached reference data can be seen on the Amazon.com retail site. Look at any product page for the ASIN (Amazon Standard Identification Number), a 10-character identifier usually beginning with "0" or "B." This unique identifier is the key for all the product description you see displayed, including images.
Managing scalable and reliable
state. The session state is keyed by a
session-state ID. This ID comes in on
the request and is used to fetch the
state from the session-state manager.
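The fetch-at-start, store-back-at-end lifecycle of session state might look like this minimal sketch (the manager class, the cart example, and the IDs are assumptions for illustration):

```python
# Hypothetical session-state manager keyed by session-state ID: state is
# fetched at the start of a request and stored back, augmented, at the end.
class SessionStateManager:
    def __init__(self):
        self._store = {}

    def fetch(self, session_id):
        return dict(self._store.get(session_id, {}))

    def store(self, session_id, state):
        self._store[session_id] = dict(state)

def handle_request(manager, session_id, item):
    state = manager.fetch(session_id)           # fetch at request start
    state.setdefault("cart", []).append(item)   # this interaction's work
    manager.store(session_id, state)            # store back at the end
    return state

mgr = SessionStateManager()
handle_request(mgr, "session-42", "book")
handle_request(mgr, "session-42", "pen")
# mgr.fetch("session-42") == {"cart": ["book", "pen"]}
```

Because the state round-trips on every request, the session-state manager sits at the very bottom of the call graph, under the tightest SLAs.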
A Quick Refresher on Simple Queuing Theory
The expected response time is dependent on both the minimum response time (the response time on an empty system) and the utilization of the system. Indeed, the equation is:

Response Time = Minimum Response Time / (1 – Utilization)
This makes intuitive sense. If the system is 50% busy, then the work must be done in the slack, so it takes twice the minimum time. If the system is 90% busy, then the work must get done in the 10% slack and takes 10 times the minimum time.
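The sidebar's formula can be checked numerically; the 10ms minimum below is a hypothetical figure:

```python
# Expected response time = minimum response time / (1 - utilization).
def expected_response(minimum_ms, utilization):
    return minimum_ms / (1.0 - utilization)

half_busy = expected_response(10.0, 0.50)   # 20 ms: twice the minimum
very_busy = expected_response(10.0, 0.90)   # about 100 ms: ten times it
```

This is the arithmetic behind reducing utilization to rescue a slipping SLA: adding servers moves each one down this curve.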
A composite call graph in a SaaS front end can get very deep: to meet a systemwide SLA, each service deeper in the call stack must meet an ever tighter SLA. The services at the bottom of the call stack can be under enormous pressure to meet very tight SLA constraints.