systems with a large number of processors has fundamentally changed the
scenario. Given the large number of
resources available, threads no longer
compete just for processor time, but
also for shared hardware resources.
This scheduling model fails to recognize the sharing aspects of today’s
processors, allowing for some performance anomalies that are sometimes
difficult to address. Consider, for example, a high-priority thread competing over a specific resource with a set
of “hungry” lower-priority threads.
In this case, it would be desirable to
extend the implementation of priorities to include priority over shared resources. The operating system could
then choose to move the lower-priority
threads away from where the higher-priority one is running, or to move the
higher-priority thread to a place where
resources are less contended.
This extension presents a new
method through which developers
and system administrators can specify
which components of an application
should be more or less provisioned. It
is a dynamic, unobtrusive mechanism
that provides the necessary information for the operating system to provision threads more effectively, reducing
contention over shared resources and
taking advantage of the new hardware
features discussed previously. Furthermore, the new behavior is likely to benefit users who already identify threads
in their applications with different levels of importance (an important aspect
of this work, for practical reasons).
Additionally, several other aspects
of priorities play to our advantage.
Since the proposed “spatial” semantics will determine how many resources are assigned to threads, it is critical that this mechanism be restricted to
users with the appropriate privileges—
already a standard aspect of priorities
in all Unix operating systems. Priorities can also be applied at different levels: at the process, thread, or function
level, allowing optimizations at a very
fine granularity.
Load Balancing and Priorities
Load balancing is perhaps one of the
most classic concepts in scheduling.
Modulo implementation details, the
basic idea is to equalize work across execution units in an attempt to have an
even distribution of utilization across
the system. This basic assumption is
correct, but the traditional implementation of load balancing does not perform well in heterogeneous scenarios
unless the scheduler is capable of
identifying the different requirements
of each thread and the importance of
each thread within the application.
A few years ago the Solaris scheduler was extended to implement load
balancing across shared hardware
components in an effort to reduce resource contention. We had discovered
that simply spreading the load across
all logical CPUs was not enough: it
was also necessary to load-balance
across groups of processors that share
performance-relevant components.
To implement this policy, Solaris established the Processor Group abstraction. It identifies and represents
shared resources in a hierarchical
fashion, with groups that represent
the most-shared components (the pipe to
memory, for example) at the top and
groups that represent the least-shared
ones at the bottom (such as the execution
pipeline). The accompanying figure
illustrates the processor group topology for two processors, the
SPARC T4 and the Intel Xeon,
showing each hardware component and
the CPUs it contains.
Each processor group maintains
a measure of its capacity and utilization, defined respectively as the number of CPUs
it contains and the number of threads running in that
group. The scheduler uses this information
when deciding where to place a software thread, favoring groups where the
utilization:capacity ratio would allow it
to make the most progress.
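
The placement policy just described can be illustrated with a small sketch. This is not the Solaris implementation: the `ProcessorGroup` class, its fields, and the `place_thread` function are hypothetical names chosen for illustration, and the greedy top-down walk is only one plausible way to favor groups with a low utilization:capacity ratio. The sketch assumes every leaf group owns at least one CPU.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProcessorGroup:
    """Hypothetical model of one node in a processor-group hierarchy."""
    name: str                                  # shared component it represents
    children: List["ProcessorGroup"] = field(default_factory=list)
    cpus: int = 0                              # CPUs owned directly (leaf groups)
    running: int = 0                           # threads placed directly here

    @property
    def capacity(self) -> int:
        # Capacity: number of CPUs contained in this group and its subgroups.
        return self.cpus + sum(c.capacity for c in self.children)

    @property
    def utilization(self) -> int:
        # Utilization: number of running threads in this group and its subgroups.
        return self.running + sum(c.utilization for c in self.children)

    def load(self) -> float:
        # Utilization:capacity ratio; assumes capacity > 0.
        return self.utilization / self.capacity


def place_thread(root: ProcessorGroup) -> ProcessorGroup:
    """Descend from the most-shared group to a least-shared one,
    choosing the least-loaded child at each level (ties go to the first)."""
    group = root
    while group.children:
        group = min(group.children, key=ProcessorGroup.load)
    group.running += 1
    return group


if __name__ == "__main__":
    # Illustrative topology: one memory pipe shared by two cores of two CPUs each.
    root = ProcessorGroup("memory pipe", children=[
        ProcessorGroup("core0", cpus=2),
        ProcessorGroup("core1", cpus=2),
    ])
    print([place_thread(root).name for _ in range(3)])
    # prints ['core0', 'core1', 'core0']
```

In this toy topology the second thread lands on the idle core rather than next to the first thread, which is the kind of spreading across shared components that the text describes.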
Performance-Critical Threads
The processor group abstraction and
the associated load-balancing mechanism for multicore, multithreaded
processors successfully reduced contention at each level of the topology by
spreading the load equally among the
various components in the system. That
alone, however, did not account for the
different characteristics and resource
requirements of each thread in a heterogeneous application or workload.
To address this issue Solaris recently
extended its load-balancing mechanism so that a thread’s notion of utiliza-