CROSBY: The key point is that you
don’t have the luxury of being asked
when a VM moves; you are told. The argument that Lin (Nease) makes is that
we would never move a thing to a LAN
segment that is not protected. People
usually don’t understand the infrastructure at that level of detail. When the IT
guy sees a load not being adequately serviced and sees spare capacity, the service gets moved so the load is adequately resourced. End of story: it will move to
the edge. You are not asked if the move
is OK, you are told about it after it happens. The challenge is to incorporate
the constraints that Lin mentions in the
automation logic that governs how,
when, and where workloads may execute.
This in turn requires substantial management change in IT processes.
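
One way to picture the constraint Crosby describes is as a filter that the placement automation runs before any move. The sketch below is illustrative only: the `Host` and `Workload` types, the service names, and the host names are all hypothetical, and a real orchestration system would track far more state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    free_cpus: int
    segment_services: frozenset  # services present on this host's LAN segment (assumed model)


@dataclass(frozen=True)
class Workload:
    name: str
    cpus: int
    required_services: frozenset  # protections the workload must never run without


def eligible_hosts(workload, hosts):
    """Keep only hosts with spare capacity AND every required service on their segment."""
    return [
        h for h in hosts
        if h.free_cpus >= workload.cpus
        and workload.required_services <= h.segment_services
    ]


# Hypothetical inventory: rack2-b has the most spare capacity but an unprotected segment.
hosts = [
    Host("rack1-a", 8, frozenset({"firewall", "ids"})),
    Host("rack2-b", 16, frozenset()),
    Host("rack3-c", 2, frozenset({"firewall", "ids"})),
]
vm = Workload("payroll-db", 4, frozenset({"firewall", "ids"}))

print([h.name for h in eligible_hosts(vm, hosts)])  # ['rack1-a']
```

A capacity-only scheduler would pick rack2-b here; the service check is what keeps the workload off the unprotected segment.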
TAVAKOLI: You can have a proxy at the
edge that speaks to all of the functionality available in that segment.
CROSBY: The last-hop switch is right
there on the server, and that’s the best
place to have all of those functions.
Moving current in-network functions to
the edge (for example, onto the server) gives
us a way to ensure that services are present on all servers, and when a VM executes on a particular server, its specific
policies can be enforced on that server.
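
As a rough illustration of policy enforced at the edge, the sketch below models each server's virtual switch as holding the policies of the VMs it currently hosts, and moves a VM's policy along with the VM. The `EdgeSwitch` class, the `migrate` function, and the port-based policy are assumptions made for the example, not the interface of any real virtual switch.

```python
class EdgeSwitch:
    """Toy model of a per-server virtual switch (hypothetical, for illustration)."""

    def __init__(self, server):
        self.server = server
        self.policies = {}  # VM name -> set of destination ports the VM may use

    def allows(self, vm, port):
        # A VM with no policy on this switch is denied by default.
        return port in self.policies.get(vm, set())


def migrate(vm, src, dst):
    """Move a VM's policy with it, so enforcement follows the workload."""
    dst.policies[vm] = src.policies.pop(vm)


a, b = EdgeSwitch("server-a"), EdgeSwitch("server-b")
a.policies["web-1"] = {80, 443}

migrate("web-1", a, b)
print(b.allows("web-1", 443))  # True: the policy arrived with the VM
print(a.allows("web-1", 443))  # False: nothing is left behind on the old server
```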
CASADO: This conversation gets
much clearer if we all realize there are
two networks here: a physical network
and one or more logical networks.
These networks are decoupled. If you
do service interposition, then you
have to do it at the logical level. Otherwise you are servicing the wrong network. Where you can interpose into
a logical network, at some point in
the future these services will become
distributed. When they become a service, they’re part of a logical network
and they can be logically sized, partitioned, and so on.
Today, services are tethered to a
physical box because of the sales cycle,
because someone comes in and says,
“I’ve got a very expensive box and I need
to have high margins to exist.” As soon
as you decouple these things, you have
to put them into the logical topology
or they don’t work. Once you do that,
you’re untethered.
NEASE: But once you distribute
them, you have to make sure that you
haven’t created 25 things to manage
instead of one.
CASADO: You already have the model
of slicing, so you already have virtualization; thus, nothing changes in
complexity. You have the exact same
complexity model, the exact same
management model.
NEASE: No, if I can get problems from
more than one place, something has
changed. Think of virtual switching as
a distributed policy enforcement point.
It is not true, however, that distributed
stuff is equal in cost to centralized
stuff. If distributed stuff involves more
than one way that a problem could occur, then it will cost more.
CASADO: It would have to be distributed on the box. If you’re going to inject it to one or more logical topologies,
then you will have the same amount of
complexity. You’ve got logically isolated components, which are in different
default domains.
If people want the dynamics and
cost structure of the cloud, they should
either not invest in anything now and
wait a little while; or invest in a scale-out commodity and make it look
like Amazon. If they do not take one
of these two paths, then they will be
locked into a vertically integrated stack
and the world will pass them by.
CROSBY: The mandate to IT is to virtualize. It’s the only way you get back
the value inherent in Moore’s Law.
You’re buying a server that has incredible capacity—about 120 VMs per server—that includes a hypervisor-based
virtual switch. You typically have more
than one server, and that virtual switch
is the last-hop point that touches your
packets. The virtual switch allows systems administrators to be in charge
of an environment that can move
workloads on the fly from A to B and
requires the network to ensure that
packets show up where they can be
consumed by the right VMs.
NEASE: The people who will be left
out in the cold are the folks in IT who
have built their careers tuning switches. As the edge moves into the server
where enforcement is significantly
improved, there will be new interfaces
that we’ve not yet seen. It will not be a
world of discover, learn, and snoop; it
will be a world of know and cause.
CROSBY: The challenge for networking vendors is to define their point
of presence at the edge. They need to
show what they are doing on that last-hop
switch and how they participate
in the value chain. Cisco, for example,
via its Nexus 1000V virtual switch, is
already staking a claim at the edge and
protecting its customers’ investments
in skill sets and management tools.
BEELER: If I manage the network
within an enterprise and I’m told we
just virtualized all our servers and are
going to be moving VMs around the
network to the best host platforms,
then as network manager, since I do
not have a virtualized network, this
causes me problems. How do I address
that? How do I take the IDS that I have
on my network today, not of the future,
and address this problem?
CASADO: You either take advantage
of the dynamics of the cloud, which
means you can move it and you do scale
out, or you don’t. In this case you can’t
do these VM moves without breaking
the IDS. The technology simply dictates whether you can have a dynamic
infrastructure or not.
REDDY: My server utilization is less
than 10%. That number is not just CPU
utilization—memory and I/O bandwidth are also limited because there
are only two network cards on
each server box. All my applications
are very network-bandwidth intensive
and saturate the NICs (network interface cards). Moreover, we also make a
lot of I/O calls to disk to cache content.
Though I have eight cores on a box, I
can use only one, and that leaves seven
cores unutilized.
NEASE: It seems like you would benefit from an affinity architecture where
the communicating peers were in the
same socket, but that sometimes requires gutting the existing architecture
to pull off.
TAVAKOLI: From our perspective,
you want a switch without those affinity characteristics so you don’t have to
worry about “same rack, same row” to
achieve latency targets. You really want
a huge switch with latency characteristics independent of row and rack.
REDDY: It is more of a scheduling issue. It’s about getting all the data into
the right place and making best use
of the available resources. We need to
schedule a variety of resources: network, storage, computational, and
memory. No algorithms exist to optimally schedule these media to maximize utilization.
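
To make the scheduling problem concrete, here is a minimal greedy first-fit sketch that places tasks across several resource dimensions at once. It is a simple heuristic, not an optimal scheduler, and the resource names, capacities, and task demands are invented for illustration.

```python
# Resource dimensions tracked per host (names are assumptions for this example).
RESOURCES = ("cpu", "mem_gb", "net_gbps", "disk_iops")


def fits(demand, free):
    """A task fits only if it fits in every dimension."""
    return all(demand[r] <= free[r] for r in RESOURCES)


def first_fit(tasks, hosts):
    """Greedy first-fit: put each task on the first host with room in all dimensions.

    Returns the placement and the remaining free capacity per host.
    Tasks that fit nowhere are simply left out of the placement.
    """
    free = {name: dict(cap) for name, cap in hosts.items()}
    placement = {}
    for task, demand in tasks.items():
        for name in free:
            if fits(demand, free[name]):
                for r in RESOURCES:
                    free[name][r] -= demand[r]
                placement[task] = name
                break
    return placement, free


# Two identical hypothetical hosts, each with two 1-Gbps NICs.
hosts = {
    "h1": {"cpu": 8, "mem_gb": 32, "net_gbps": 2, "disk_iops": 1000},
    "h2": {"cpu": 8, "mem_gb": 32, "net_gbps": 2, "disk_iops": 1000},
}
tasks = {
    "t1": {"cpu": 1, "mem_gb": 4, "net_gbps": 2, "disk_iops": 100},  # saturates h1's NICs
    "t2": {"cpu": 1, "mem_gb": 4, "net_gbps": 1, "disk_iops": 100},  # must go elsewhere
    "t3": {"cpu": 2, "mem_gb": 8, "net_gbps": 0, "disk_iops": 200},  # needs no network
}

placement, free = first_fit(tasks, hosts)
print(placement)           # {'t1': 'h1', 't2': 'h2', 't3': 'h1'}
print(free["h1"]["cpu"])   # 5 cores still idle on h1 once its NICs are full
```

The example reproduces the situation Reddy describes: once one task saturates a host's network cards, the remaining cores can be used only by work that needs no network bandwidth.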