Sunday, April 6, 2014

IP Networks & Operating Systems - Virtualization is a means to an end, not the goal

I recently saw a slide stating that decoupling networking hardware from software is a must for achieving network virtualisation. This would also enable hardware independence, and provide the missing tool for the holy grail: the software defined data center.

I suppose the best thought leadership is all about making people think that a (potential) means to an end is really the objective itself. It is like someone saying "I need a coke" when in fact they are just thirsty, and a coke is merely one way of alleviating that thirst. Similarly, while the goal is IT automation, a possible means to that end is a software defined data center, and implementing a software defined data center by virtualising everything on x86 is, in turn, only one option for doing so. In this sense, many are evangelising that you need virtualisation, that because you need virtualisation you need network virtualisation, and that therefore you need to be able to run virtual networks the way you run virtual servers.

I don't argue against the benefits of server virtualisation, or, for that matter, against the need for network virtualisation in certain environments. I just find it interesting how many marketing messages create the perception that virtualisation is a goal in itself. Much in the same way that SDN messaging has been distorted over the last few years: it is no longer about separating the control and data planes and opening both of them, but rather about running both in software (even if tightly integrated) … But that is a topic for another post.

Why Server Virtualization was a Necessity 

I believe server virtualisation solved a problem that had been created by poorly designed operating systems and applications, which could not fully leverage the compute capacity available to them. The x86 architecture was also not good at providing isolation to the higher layers. In the end, you had a physical server running one OS with an application stack on top that could not make use of the full compute capacity. Servers were under-utilised for that reason.

Hypervisors therefore solved a deficiency of operating systems and applications. Applications were, for the most part, incapable of using multi-core capabilities, and operating systems were unable to provide proper isolation between running applications. That is what the hypervisor solved. And it was probably a good solution (maybe the best solution at the time), because rewriting applications is clearly much harder than instantiating many copies of the same app and load balancing across them … However, had the OS provided proper containment and isolation, and had the CPU provided performant support for it, a hypervisor would have been far less necessary: even without rewriting applications for better performance, you could still run multiple instances. In other words, if we had had Zones on Linux 8 years ago, the IT world would perhaps be somewhat different today (although in fact we had them … perhaps in the wrong hands, though).
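The scale-out alternative mentioned above — many copies of the same app behind a load balancer instead of one rewritten multi-core app — can be sketched roughly as follows. The instance names and the round-robin policy are purely illustrative:

```python
from itertools import cycle

# Hypothetical sketch: rather than rewriting a single-threaded app to
# exploit many cores, run several identical copies (say, one per core)
# and spread incoming requests across them.
instances = [f"app-instance-{i}" for i in range(4)]
rr = cycle(instances)

def dispatch(request):
    """Round-robin: each request goes to the next instance in turn."""
    return next(rr)

print([dispatch(r) for r in range(6)])
# ['app-instance-0', 'app-instance-1', 'app-instance-2',
#  'app-instance-3', 'app-instance-0', 'app-instance-1']
```

The same idea is what hypervisors (and later containers) made operationally cheap: isolation and multiplication of instances instead of smarter single instances.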
Anyway, it is clear that today, for instance, running apps on LXC is certainly more efficient from a performance standpoint than doing it on a hypervisor. It will be interesting to see how that evolves going forward.

We may need network virtualisation, but we do not need a network hypervisor

Similarly, an IP network does not natively provide proper isolation, or allow multiple users with different connectivity and security requirements to share the network. IP networks are not natively multi-tenant, nor do they have the ability to segregate traffic for various tenants or applications. They were really conceived to be the opposite. There are solutions such as MPLS VPNs or plain VRFs: in a nutshell, you virtualise the network to provide such functions. You do that at the device level, and you can scale it at the network level (again, MPLS VPN being an example of that, using IP only in the control plane and MPLS in the data plane). VPLS is another example, albeit for delivering Ethernet-like services.
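As a rough illustration of the VRF idea (a conceptual sketch, not any vendor's implementation): each tenant gets its own forwarding table, so overlapping address space between tenants is fine. The tenant names and interfaces below are made up:

```python
import ipaddress

# Hypothetical per-tenant (VRF-like) forwarding tables. Each VRF holds
# its own routes, so tenants can reuse the same address space without
# interfering with one another.
vrfs = {
    "tenant-a": {"10.0.0.0/24": "eth1", "0.0.0.0/0": "eth0"},
    "tenant-b": {"10.0.0.0/24": "eth2"},  # same prefix, different tenant
}

def lookup(vrf_name, dst_ip):
    """Longest-prefix match within a single VRF's table."""
    addr = ipaddress.ip_address(dst_ip)
    best = None
    for prefix, iface in vrfs[vrf_name].items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    return best[1] if best else None

# The same destination forwards differently depending on the tenant:
print(lookup("tenant-a", "10.0.0.5"))  # eth1
print(lookup("tenant-b", "10.0.0.5"))  # eth2
```

Whether that lookup is executed by a switch ASIC or by an x86 CPU is exactly the implementation question discussed below.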

Arguably, MPLS VPNs and/or VPLS are not the right solution for providing network isolation and multi-tenancy in a high density data center environment. So there are alternatives that achieve this using various overlay technologies. Some are looking to do this with a so-called network hypervisor, essentially running every network function on x86 as an overlay. For those supporting this approach, anything that is "hardware" bound is wrong: they would say that VPLS, MPLS VPN, VRFs, etc. are hardware solutions, and what we need are software solutions.

I believe this is not true. A VRF on a network router or switch involves software, which programs the underlying hardware to implement different forwarding tables for a particular routing domain and set of interfaces. A virtual router running as a VM and connecting logical switches is pretty much the same thing, except that its forwarding table is implemented by an x86 processor. I do not like this partial and simplistic vision of hardware vs. software solutions. There are only hardware+software solutions. The difference is whether you use hardware specialised for networking or hardware for general computing. The first is of course significantly more performant (by orders of magnitude), whilst the second provides greater flexibility. The other aspect is provisioning and configuration. Some would argue that if you run network virtualisation in software (again, meaning on x86 on top of a hypervisor) it is easier to configure and provision. But this is a matter of implementation only.

Conceptually, there is no reason why provisioning network virtualisation on specialised hardware would be any harder than doing it on general compute hardware.
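To sketch that claim: a provisioning layer can expose one interface and drive either kind of dataplane behind it. The class and method names below are hypothetical, illustrating the argument rather than any real vendor API:

```python
# Hypothetical sketch: the same provisioning call can target either a
# specialised-hardware dataplane or a general-compute one.
class Dataplane:
    def program_route(self, vrf, prefix, next_hop):
        raise NotImplementedError

class AsicDataplane(Dataplane):
    """Stands in for software that would push entries into ASIC/TCAM tables."""
    def __init__(self):
        self.tcam = []
    def program_route(self, vrf, prefix, next_hop):
        self.tcam.append((vrf, prefix, next_hop))

class X86Dataplane(Dataplane):
    """Keeps the table in ordinary memory, looked up by a general-purpose CPU."""
    def __init__(self):
        self.tables = {}
    def program_route(self, vrf, prefix, next_hop):
        self.tables.setdefault(vrf, {})[prefix] = next_hop

def provision(dataplane, vrf, prefix, next_hop):
    # The orchestration layer is identical either way; only the
    # dataplane implementation differs.
    dataplane.program_route(vrf, prefix, next_hop)

for dp in (AsicDataplane(), X86Dataplane()):
    provision(dp, "tenant-a", "10.0.0.0/24", "leaf-1")
```

The point is that ease of provisioning comes from the abstraction on top, not from which hardware sits underneath.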

You will always need a physical network … make the best out of it 

Because you always need a physical network in a data center, it is also evident that if the network infrastructure provides the right isolation and multi-tenancy with simplified provisioning and operation, it represents a more efficient way of achieving the goal of automating IT than duplicating it with an overlay on top of the physical infrastructure (much as LXC is more efficient than a hypervisor). This leads to the title of the post.

The goal is not to do virtualisation. Virtualisation is not a goal. The goal is not to do things in software vs. hardware either. 

The goal is to enable dynamic connectivity and policy for the applications that run the business supported by an IT organisation. And to do so fast, and in an automated way, in order to reduce the risk of human error. Whether you do it on specialised, sophisticated hardware or on general compute x86 processors is a matter of implementation, with merits and demerits to both approaches. Efficiency is usually achieved when software sits as close to specialised hardware as possible.


  1. Hi Juan,
    I really liked the post. You bring up some good points, a couple of which I'd disagree with, however. For instance, when you consider that (in the software approach) a network function (e.g. a firewall) is simultaneously distributed across many edge x86 hypervisor CPUs in parallel, I have argued that approach actually has orders of magnitude better performance than the classic hardware based model where the function is anchored to a single hardware box. Some simple math to support that: if you have 100 hypervisors each with 10GE, you have a 1 Terabit firewall. Show me a 1 Terabit firewall in hardware; I don't know that one exists.

    Software based networking is much different now than it was 5 years ago, largely because of the convergence of virtualization and distributed systems. To describe software networking as just putting what was once in hardware into a VM is an obsolete point of reference.


  2. Thanks for your comment, Brad. I actually think that the example you call out further proves my point. First, let me clarify that I am not a security expert at all, but many of my security colleagues tell me that the word "firewall" is heavily abused in the industry. Most people today aren't talking about firewalls anymore; they seem to be talking about NGFWs instead. But in either case, FW or NGFW, it refers to doing a whole lot more than packet filtering (be it stateful or stateless).

    As for your question, there are actually some vendors claiming 1Tbps firewalls in hardware (and have been for a couple of years already). And again, a "firewall" today integrates IPS/IDS, CGN, etc. … (albeit when you add it all together, that 1Tbps is probably more of a marketing figure …).

    But you bring up the point of using 100 servers to perform the function, compared to an appliance. Well, you can also put in 10 appliances and keep your 100 servers performing compute for applications. See? Scale-out works at all levels, not just at the hypervisor level. At the single-node level hardware is orders of magnitude ahead, and because you can also scale it out, the math remains.

    Putting the firewall at the host level isn't anything new either. It is a design choice. As you distribute more and more firewall features, the solution becomes more complicated to code. Also, how much code and sophistication do you want to put at the hypervisor level? The more you put there, the less stable the hypervisor will be. There are also operational concerns … if your firewall code runs on 1,000 nodes, updating, patching, etc. has to be done … well … on 1,000 nodes. You also have to consider that the cores and bandwidth you use for that firewall aren't free (even less so if you have to pay virtualisation licenses to run it, although that depends on the virtualisation vendor).

    "To describe software networking as just putting what was once in hardware into a VM is an obsolete point of reference."

    How is this an obsolete point of reference? Conceptually, how is doing a packet lookup to make an L3/L4 decision any different?

    In any case, the point I am making is that it doesn't matter. What matters is what applications need in terms of isolation, policy and connectivity, regardless of whether those applications live on a bare metal server, in a container or in a VM. Then, if your physical network provides what you need, it makes little sense to spend CPU cycles doing any networking at the host level.

  3. I too agree with your point that we have the technology in the network stack, and that with the right tools and protocols we have the same abilities. But I agree with Brad that it makes more sense to push all of this work down to a distributed cluster of resources, which lets us scale out larger while keeping a very good handle on costs, as it's just more compute resources.

    My take on the value of SDN is that it will enable us to deploy network services much faster, and at the same time reduce the huge initial capital outlays we make, by enabling us to truly pay as we grow, just as we do with compute. We will finally be able to accurately measure and bill for the network resources you use as well, without trying to slice up the network devices into chunks that vary depending on what features you turn on and use.

  4. Thanks for reading and for taking the time to comment, Carl. I think it all depends on the specific functions and requirements. I certainly do not think that pushing all network functions out to a software stack is necessarily more cost effective. Again, imho, what can be done on a network that you have to pay for anyway should be done there. Just as what can be done on an OS+CPU should be done there, without relying on an extra software layer. On the firewall function specifically, I do not dispute the interest in putting that function into a software stack and using a scale-out model. There is a lot of work being done to enable proper redirection for service insertion that works for bare metal, virtual and container approaches. Where and how you do that is a matter of implementation, again with merits to every approach.