Object-Relational Mapping
Exposing the ORM Cache
approach, however, can produce anomalous application behavior, unexpected results, or outright bugs. User
forums are littered with evidence of developers suffering
the consequences of such failures of understanding.
Caching can be one of the most technically sophisticated
components of an ORM implementation, and it thus
represents a critical balance point for any application
that uses that implementation. Failure to recognize it
as a potential fulcrum may leave an application teetering toward, or falling onto, the side of poor performance and incorrect semantics. In this article, therefore, we discuss topics
relevant to caching in ORM systems, and we expose some
of the details that implementations must be concerned
with and that application developers should be aware of.
OBJECTS AND IDENTITY
First and foremost, developers must acknowledge the
nature of objects and how they are used in object-oriented languages. In practice, very rarely does an object
exist in isolation from other objects. An application
reference to an object is really an indirect reference to
an entire graph of objects rather than to a single solitary
object. The consequences of such a realization are far-reaching and form the basis for many of the difficulties
associated with caching in ORM.
When a read operation is performed, the runtime must
account for the possibility that the read may also fault
in objects referenced by the requested object. Of course,
this sequence may continue recursively, causing a whole
multitude of objects to be read from the database, each
individually requested as needed and in succession (a
phenomenon dubbed ripple loading²). Developers can prevent this from happening through one of many backstop
measures, such as declaring, either statically or dynamically, whether specific relationships should be traversed
and loaded. There are other approaches to avoiding
multiple successive trips to the database, but a discussion
of these is outside the scope of this article.
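To make ripple loading concrete, here is a minimal Python sketch, not any particular ORM's API. `FakeDatabase`, `Department`, `load_department`, and its `eager` parameter are all hypothetical names; a database round trip is modeled as one call to `select`, and the `eager` set stands in for the static or dynamic relationship declarations described above.

```python
class FakeDatabase:
    """Hypothetical stand-in for a database that counts round trips."""

    def __init__(self, tables):
        self.tables = tables
        self.round_trips = 0

    def select(self, table, keys):
        # One call models one query round trip, however many keys it asks for.
        self.round_trips += 1
        return [self.tables[table][k] for k in keys]


class Department:
    def __init__(self, db, row, employees=None):
        self._db = db
        self.name = row["name"]
        self._employee_ids = row["employee_ids"]
        self._employees = employees  # None means "not loaded yet"

    @property
    def employees(self):
        if self._employees is None:
            # Lazy path: each referenced employee is faulted in on its own
            # trip to the database, i.e., ripple loading.
            self._employees = [self._db.select("emp", [k])[0]
                               for k in self._employee_ids]
        return self._employees


def load_department(db, dept_id, eager=frozenset()):
    (row,) = db.select("dept", [dept_id])
    employees = None
    if "employees" in eager:
        # Eager path: the relationship was declared up front, so all of its
        # targets are fetched in a single batched query.
        employees = db.select("emp", row["employee_ids"])
    return Department(db, row, employees)


TABLES = {
    "dept": {1: {"name": "R&D", "employee_ids": [10, 11, 12]}},
    "emp": {10: {"name": "Ann"}, 11: {"name": "Bo"}, 12: {"name": "Cy"}},
}

lazy_db = FakeDatabase(TABLES)
lazy_names = [e["name"] for e in load_department(lazy_db, 1).employees]
print(lazy_db.round_trips)   # 4: one for the department, one per employee

eager_db = FakeDatabase(TABLES)
eager_names = [e["name"] for e in load_department(eager_db, 1, {"employees"}).employees]
print(eager_db.round_trips)  # 2: the department, then one batched query
```

With three employees the lazy path costs four trips; the count grows with every element of the relationship, whereas the declared (eager) path stays at two.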
An object graph, by definition, implies that there may
be multiple paths leading to the same object. In some
cases these multiple relationships may be from a single
object, but in most cases they are from different objects.
In the course of loading the object graph, these relationships must end up pointing to one and the same object,
not two distinct memory imprints that happen to have
the same state. Failure to maintain object identity will
lead to the persistent state of the object being duplicated
in multiple instances, each one containing a point-in-time view of the entity state. This will inevitably lead to
inconsistent state and incorrect program behavior.
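The failure mode can be reproduced in a few lines. In this hypothetical Python sketch, `Address`, `ADDRESS_ROWS`, and `naive_load_address` are illustrative names, and the loader deliberately does no identity tracking, so reaching the same row along two paths yields two instances.

```python
class Address:
    def __init__(self, key, city):
        self.key = key
        self.city = city


ADDRESS_ROWS = {7: {"city": "Ottawa"}}


def naive_load_address(key):
    # No identity tracking: every load builds a brand-new instance.
    return Address(key, ADDRESS_ROWS[key]["city"])


# Two different objects in the graph (say, a customer and one of its orders)
# both reference address 7, so the loader reaches it along two paths.
via_customer = naive_load_address(7)
via_order = naive_load_address(7)

via_customer.city = "Toronto"      # an update made through one path
print(via_customer is via_order)   # False: two instances for one entity
print(via_order.city)              # Ottawa: a stale point-in-time view
```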
Maintaining the identity of objects in a graph means
that the loader must keep track of each object and its
identity. The nature of the solution meshes neatly with
the job that a cache already has to do, so it is not surprising that the task is often delegated to the cache.
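A minimal sketch of such an identity map, assuming one loader whose lifetime bounds the map. `IdentityMapLoader` and its keying by (entity type, primary key) are illustrative choices, not a specific product's design. Because the map is consulted before the database, it doubles as a cache: the second lookup costs no read.

```python
class Address:
    def __init__(self, key, city):
        self.key = key
        self.city = city


class IdentityMapLoader:
    """Hypothetical loader that keeps one live instance per entity identity."""

    def __init__(self, rows):
        self.rows = rows
        self.reads = 0
        self._identity_map = {}  # (entity type, primary key) -> instance

    def load_address(self, key):
        identity = ("Address", key)
        existing = self._identity_map.get(identity)
        if existing is not None:
            return existing       # same identity: hand back the same object
        self.reads += 1           # only a miss goes to the "database"
        obj = Address(key, self.rows[key]["city"])
        self._identity_map[identity] = obj
        return obj


loader = IdentityMapLoader({7: {"city": "Ottawa"}})
via_customer = loader.load_address(7)
via_order = loader.load_address(7)
via_customer.city = "Toronto"
print(via_customer is via_order, via_order.city, loader.reads)  # True Toronto 1
```

The plain `dict` holds strong references, so entries live as long as the loader. A loader could instead use `weakref.WeakValueDictionary` so that instances no longer referenced by the application can be reclaimed; which policy fits depends on how long the scope lives.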
CACHING LEVELS
An application manages different visibility scopes during
its execution. For single-user scopes an isolated cache
is appropriate, but for global contexts a shared cache,
sometimes referred to as an L2 (level 2) cache, offers
the same state to all requesters. Each type of cache is
tailored to its own purpose and may function or perform
slightly differently from the other.
There may even be duplication of state spanning the two
caches, particularly in light of isolation requirements.
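The layering of an isolated cache over a shared one can be sketched as follows, assuming the shared (L2) cache stores plain attribute state and each session builds its own instances from it. `Database`, `SharedCache`, `Session`, and `Account` are hypothetical. Copying state on the way out of the shared cache is one way, among others, to keep one user's uncommitted edits invisible to another, and it produces exactly the kind of duplicated state across the two caches described above.

```python
class Database:
    """Hypothetical backing store that counts reads."""

    def __init__(self, rows):
        self.rows = rows
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return dict(self.rows[key])


class SharedCache:
    """Hypothetical L2 cache shared by all sessions; stores state, not objects."""

    def __init__(self, db):
        self.db = db
        self._state = {}

    def get_state(self, key):
        if key not in self._state:
            self._state[key] = self.db.read(key)
        # Hand out a copy so no session can mutate the shared state directly.
        return dict(self._state[key])


class Account:
    def __init__(self, key, balance):
        self.key = key
        self.balance = balance


class Session:
    """Hypothetical single-user scope with its own isolated cache."""

    def __init__(self, shared):
        self.shared = shared
        self._local = {}  # key -> Account instance private to this session

    def find(self, key):
        if key not in self._local:
            state = self.shared.get_state(key)
            self._local[key] = Account(key, state["balance"])
        return self._local[key]


db = Database({1: {"balance": 100}})
l2 = SharedCache(db)
s1, s2 = Session(l2), Session(l2)
a1, a2 = s1.find(1), s2.find(1)
a1.balance -= 30                       # uncommitted change, session 1 only
print(a1 is a2, a2.balance, db.reads)  # False 100 1
```

Both sessions are served from a single database read, yet each holds its own instance, so session 2 does not see session 1's pending change.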
TRANSACTIONAL CACHE
Transactions clearly play a major role in any system,
including the cache. In fact, the transactional cache is
dedicated specifically to the transaction, and its inhabitants are strictly transactional objects. Being associated
with the transaction implies that the cache must provide
the correct isolation and consistency of its objects (the
“correct” isolation is described in more detail in a later
section). Assumptions about the type of transaction are
particularly relevant because of the differences among
them. Some are thread-bound, while others allow multithreading; some are tied to a single database connection,
while others may access multiple resources.
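A transactional cache can be sketched as a set of working copies private to one transaction, with the shared cache updated only at commit. This is a single-threaded, single-resource simplification: `Transaction` and `SharedCache` are hypothetical, entities are plain dicts, and a real commit would also write to the database and detect conflicting concurrent updates, both of which the sketch omits.

```python
import copy


class SharedCache:
    """Hypothetical shared cache of committed state: key -> attribute dict."""

    def __init__(self, committed):
        self.committed = committed


class Transaction:
    """Hypothetical transactional cache holding working copies for one txn."""

    def __init__(self, shared):
        self.shared = shared
        self._working = {}    # key -> working copy (the transactional objects)
        self._originals = {}  # key -> snapshot taken when the object joined

    def get(self, key):
        if key not in self._working:
            original = self.shared.committed[key]
            self._originals[key] = copy.deepcopy(original)
            self._working[key] = copy.deepcopy(original)
        return self._working[key]

    def change_summary(self):
        # Compare each working copy with its snapshot to find modified attributes.
        changes = {}
        for key, working in self._working.items():
            diff = {attr: val for attr, val in working.items()
                    if self._originals[key].get(attr) != val}
            if diff:
                changes[key] = diff
        return changes

    def commit(self):
        # Publish only the changed attributes to the shared cache.
        changes = self.change_summary()
        for key, diff in changes.items():
            self.shared.committed[key].update(diff)
        self._working.clear()
        self._originals.clear()
        return changes


shared = SharedCache({1: {"balance": 100, "owner": "Ann"}})
txn = Transaction(shared)
acct = txn.get(1)
acct["balance"] = 70
before_commit = shared.committed[1]["balance"]
summary = txn.commit()
print(before_commit, summary, shared.committed[1]["balance"])
# 100 {1: {'balance': 70}} 70
```

Until commit, the modification exists only in the transaction's working copy; the diff against the snapshot is what gets published.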
The presence of an object in the transactional cache
means, by definition, that it is transactional. There is an
if-and-only-if relationship between the two, such that
when a transactional object is modified, its modified state
must be reflected within the transactional cache. Furthermore, the state of the transactional cache represents the
total change summary of the transaction from the ORM
perspective and must of necessity follow the life cycle