Sunday, April 13, 2014

Software Overlays: infinite scaling and pay as you grow cost model

Defenders of software overlays are often upset when they are challenged on scalability compared to hardware-based solutions. An example here: Scalability with NSX.

Honestly, I too think that the argument of software (running on x86) vs. ASICs isn't the most interesting one. However, I also think that dismissing any argument by saying "you don't understand the architecture of software overlays" is empty reasoning. And yet that is what usually happens, and folks (like in the example above) resort to saying that <my-overlay-of-preference> is a distributed architecture and therefore it scales.

The flaw in that reasoning is that you are considering ONE dimension of the problem: moving packets from A to B. But there are many other dimensions when it comes to any network system. There's building and maintaining network state, there's availability, there's managing latency and jitter, and many others.

I do not want to trash software overlay solutions that run on x86, by the way. I think the services they provide are better off delivered by the physical network which must exist anyway, but when it is not possible and you are in an environment with 100% virtualisation … software overlays are definitely to be considered. Even then, handling north-south traffic in and out of the overlay is a challenge that must not be overlooked. In the long run, an integrated solution will prevail in my opinion.

The key is "100%" virtualisation. Because when that is not possible, when there's going to be east-west traffic towards databases or other systems running on bare metal (and many systems run, and will continue to run, on bare metal without a hypervisor), overlays not only fall short, but also become increasingly expensive. Of course, when your business relies 80% on selling hypervisor licenses, your view of the world is somewhat different …

What software overlays don't really eliminate is the upfront capital cost of building a network infrastructure. This is a fact.

They also do not fully provide a pay as you grow model. If you want to build an infrastructure with 100 physical hosts, you need at least 200 physical network ports (assuming redundant connections, not counting management, etc …). When you want to add another 100 physical hosts, you need another 200 physical network ports, and you need to grow your network core/spine/whatever to accommodate them. This is true whether you run a software overlay using VXLAN, use plain VLANs, or anything else (by the way, VLANs are still more than sufficient for many cases, and easily automated through any modern CMP, including OpenStack Neutron).
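To put numbers on that, here is a minimal Python sketch of the port arithmetic. The function name and the `uplinks_per_host` parameter are my own illustrative choices, and it assumes two redundant data uplinks per host with management ports left out, as in the paragraph above. It says nothing about how much the core/spine must grow, since that depends on your design.

```python
def access_ports_needed(hosts: int, uplinks_per_host: int = 2) -> int:
    """Physical access ports needed for a number of hosts.

    Assumes every host has the same number of redundant data uplinks
    and ignores management ports (hypothetical simplification).
    """
    return hosts * uplinks_per_host


# Growing in steps of 100 hosts: the physical port count grows with the
# hosts, whether or not an overlay runs on top.
for hosts in (100, 200):
    print(f"{hosts} hosts need {access_ports_needed(hosts)} access ports")
```

The overlay does not appear anywhere in that calculation, which is the point: the physical port count is driven by the number of hosts, not by what you run on top of them.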

Adding VMs to, or removing them from, those 100 physical hosts is another story. If you choose to go for an overlay software model to provide connectivity on top of the physical network, and you choose to pay per VM rather than on some other basis, … well, that is a choice. Customers should do a TCO analysis and choose whatever they find most convenient, taking into account support for multiple hypervisors, etc.

What you cannot do is think that any vendor is providing you with an infinitely scalable system (or anything near it).

What you should not do either is evaluate scalability across a single (simplified) dimension. No overlay system is fully distributed. Packet forwarding (a.k.a. data plane) may be distributed to the hypervisor, but control is centralised. Sure, vendors will say "control clusters are built on a scale-out model" … but if that is the holy grail, ask yourself why you can't scale out as far as you want and are instead limited to 5 servers in the cluster … maybe 7 … maybe … There must be some level of complexity involved if you can't just "throw in more servers and scale …".

Control and network state become more complex as you move up the stack. It is one thing for L2, another for L3, yet another when you add policy. There is no holy grail there … and if you believe you found it, you are wrong; you just haven't realised it yet.

There is only one real problem: scalability.

I urge any reader to think about that sentence, not just in the context of technology, but in any other. Fighting world poverty, for instance.
