merely to point out the potential problem with too much focus on functionality at the expense of a solid architectural foundation.) Because it looked
like performance and scalability were
going to be major concerns, the architecture team began working on some
model components and drivers to investigate the design.
We did some research into the
incoming message rate and the
mix of transaction types. We
also sampled timings from the functional “processors” that had already
been built. Then, using the same messaging infrastructure as the existing
dispatcher, we built a component that
would simulate the incoming message
dispatcher. Some of the messaging
technology was new to the company.
At one end of the dispatcher we had
drivers to simulate inbound messages.
On the other end we simulated the performance of the functional processors
(FPs) using pseudo-random numbers
clustered around the sampled timings. By design, there was nothing in
the modeled components or drivers
related to the functional processing in
the system.
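A model of this kind can be sketched in a few lines. The following is a minimal illustration, not the model we built: it assumes Poisson arrivals from the inbound driver, FP service times drawn from a normal distribution around a sampled mean, and a fixed pool of identical FPs fed in arrival order. The function name `simulate` and all of its parameters are hypothetical.

```python
import heapq
import random


def simulate(arrival_rate, mean_service, jitter, workers, n_messages, seed=1):
    """Simulate a dispatcher feeding `workers` functional processors (FPs).

    Returns (mean latency in seconds, completed messages per second).
    All distributions here are assumptions made for illustration.
    """
    rng = random.Random(seed)
    free_at = [0.0] * workers  # time at which each simulated FP becomes idle
    heapq.heapify(free_at)
    clock = 0.0
    total_latency = 0.0
    last_finish = 0.0
    for _ in range(n_messages):
        # Inbound driver: exponential gaps give a Poisson arrival stream.
        clock += rng.expovariate(arrival_rate)
        # Simulated FP: pseudo-random time clustered around the sampled timing.
        service = max(0.001, rng.gauss(mean_service, jitter))
        # The message waits until the earliest-free FP can take it.
        start = max(clock, heapq.heappop(free_at))
        finish = start + service
        heapq.heappush(free_at, finish)
        total_latency += finish - clock
        last_finish = max(last_finish, finish)
    return total_latency / n_messages, n_messages / last_finish
```

Nothing in the sketch knows what an FP actually does, which mirrors the design choice above: only arrival rates and timings drive the result. Running it at a low and then a high `arrival_rate` with the same FP timings shows latency staying near the service time in the first case and growing with the backlog in the second.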
Once the model was fully functional, we were able to play with various
parameters related to the incoming
message rates and simulated FP timings. We then began to weight the FP
times according to processing cost
variations in the mix of incoming
message types. Prior to this modeling
effort, the design had (wrongly) assumed that the most important performance aspect was the latency of the
individual transactions. Several seconds of latency was acceptable to all
concerned. After all, it would be quite
some time before this slave would become the system of record and drive
transactions the other way.
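The difference between latency and throughput can be made concrete with a back-of-the-envelope calculation. The sketch below weights per-type FP times by the message mix and compares the offered load with the capacity of the FP pool. The message types, shares, timings, arrival rate, and worker count are invented for illustration and are not figures from the project.

```python
def weighted_service_time(mix):
    """mix maps message type -> (share of traffic, mean FP seconds)."""
    total_share = sum(share for share, _ in mix.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(share * secs for share, secs in mix.values())


def utilization(arrival_rate, mean_service, workers):
    """Offered load divided by capacity; above 1.0 the backlog grows without bound."""
    return arrival_rate * mean_service / workers


# Hypothetical mix: every message type finishes in under a second,
# so per-transaction latency looks comfortable in isolation.
mix = {
    "small": (0.70, 0.02),
    "medium": (0.25, 0.10),
    "large": (0.05, 0.80),
}
mean_s = weighted_service_time(mix)
rho = utilization(arrival_rate=120, mean_service=mean_s, workers=8)
print(f"weighted mean FP time: {mean_s:.3f} s, utilization: {rho:.2f}")
```

With these made-up numbers each transaction is fast, yet utilization exceeds 1.0, so the queue never drains. That is the shape of the problem described above: acceptable latency per message does not imply adequate throughput.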
The modeling results were not encouraging. The latency was going to be
a challenge, but the overall throughput
requirements were going to bury the
system. We started exploring ways to
address the performance problems.
The system was already targeted for the
fastest hardware available for the chosen platform, so that option was out.
We delayed looking into improving the
performance of the individual functional processors; that was deemed
to be more costly because of the number that had already been written. We
thought our chances of quick success
could increase with a focus on the common infrastructure pieces.
We worked on new dispatching
algorithms, but that did not result in
enough improvement. We looked at
optimizing the messaging infrastructure but still fell short. We then began
to benchmark some other message
formats and infrastructures, and the
results were mildly encouraging. We
examined the existing programs to see
how easy it was going to be to alter the
messaging formats and technology.
The programs were too dependent on
the message structure for it to be altered within a reasonable timeframe.
Given the still-poor results, we
needed to examine the functional algorithms and the database access.
We took a few of the midrange and
lengthier running processors and inserted some logging to obtain split
times of the various steps. Many of the
functional algorithms were relatively
expensive because of the required
complexity for the mapping and restructuring of the data. The database
operations seemed to take longer than
we logically thought they should. (Over
time an architect should develop a
sense for a performance budget based
on an abstract view of similar functionality where he or she has previously maximized performance.)
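Split-time logging of this sort needs very little machinery. The sketch below is one hypothetical way to do it, not the instrumentation we used: a small `SplitTimer` helper accumulates elapsed time per named step so the steps of a processor can be ranked by cost. The class name, the step names, and the placeholder workloads are all assumptions.

```python
import time
from contextlib import contextmanager


class SplitTimer:
    """Accumulates elapsed seconds per named step of a processor."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock  # injectable so the timer can be tested
        self.splits = {}

    @contextmanager
    def step(self, name):
        start = self._clock()
        try:
            yield
        finally:
            elapsed = self._clock() - start
            self.splits[name] = self.splits.get(name, 0.0) + elapsed

    def report(self):
        """Return (step, seconds) pairs, most expensive first."""
        return sorted(self.splits.items(), key=lambda kv: kv[1], reverse=True)


timer = SplitTimer()
with timer.step("map_and_restructure"):
    sum(i * i for i in range(10_000))  # placeholder for the mapping work
with timer.step("database"):
    sum(range(10_000))  # placeholder for the database calls
for name, seconds in timer.report():
    print(f"{name}: {seconds:.6f} s")
```

Comparing each split against a rough budget for that kind of step is what flags a candidate for closer inspection, as the database operations were here.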
We then examined the logical database model. The design did not follow a pattern that would perform well for the
types of programs in the system. The
SQL from a few of the algorithms was
extracted and placed in stand-alone
model components. The idea was to
see which types of performance increases were possible. Some increases
came from changing some of the SQL
statements, which were taking excessive time because the chosen partitioning scheme meant that reading core
tables typically involved scanning all
partitions. As our simulated database
size grew, this became punitive to scalability. The primary problem, however,
was not the extended length of time for
individual statements but the sheer
number of calls. This was a result of
taking normalization too far. There
were numerous tables with indexes
on columns that changed frequently.
Additionally, multicolumn keys were