Wednesday, June 29, 2011

On L2 vs. L3 vs. OpenFlow for DC Design

The L2 vs. L3 debate is old and almost religious. One good thing about OpenFlow and more general Software Defined Networking is that perhaps it ends the debate: L2 or L3? Neither (or both?).

The problem in the end is that people perceive L2 as simpler than L3. I say perceive because it really depends on where you want to face complexity. From a simple-to-establish-connectivity perspective, L2 is easy. From a scaling perspective, L3 is simple (at least, simpler).

People working with servers and applications have had traditionally minimal networking knowledge, which has lead them to rely too much on L2 connectivity and today many application and clustering technologies wont work without L2 connectivity between involved parties. Same thing can be said about virtualization technologies. Easiest thing to do to make sure you can live move a VM between two hosts is assume they can both be on the same subnet and L2 broadcast domain. 

For VM mobility, as well as for many clustering technologies, it is important to avoid IP re-addressing, and in this sense it does not help that the IP address represents both identity, and location (by belonging to a subnet which is located in a particular place). This is why LISP is so cool, because it splits the two intrinsic functions of the IP address: identity and location. 

When looking at building a datacenter, and in particular a datacenter which will support some form of multi-tenancy and potentially can be used to host virtual datacenters (i.e. private cloud type of services), how do we want to do it? Do we use L2 or L3? Or is the solution to consider OpenFlow? Tough one.

For the past two or three years there's been some level of consensus that you must have large L2 domains, and that with newer protocols such as TRILL we will be able to build very large L2 networks, hence, that was the way forward. Reality is, most MSDPs today, to the best of my knowledge, are based on L3: because it works and scales.

Reality is as well that for delivering IaaS you will need to have some form of creating overlay L2 domains on a per virtualDC basis. Sort of like delivering one or more L2 VPNs per virtualDC. Why? Because the virtualDCs will have to host traditional workloads (virtualized) and host legacy applications and such which are not cloud-aware or cloud-ready. From a networking point of view this means each virtualDC will have to have its own VLANs, subnets, and policies.

VLANs are commonly used to provide application isolation or organizational isolation. So in a DC, this means you use dedicated VLANs for, say, all your exchange servers, all your SAP servers, etc. Or you may use different VLANs for different areas of the company, which then may share various applications, or you combine both and you give a range of vlans to each organization/tenant and then further divide by application. This needs to be replicated on each virtualDC.

At the infrastructure level, relying on current virtualization offerings, you may have dedicated VLANs for your virtual servers, where you have vlans for the management of virtual servers, for allowing VM mobility, or for running NFS or iSCSI traffic (also for running FCoE). 

Do you want to use the same VLAN space for infrastructure and for virtualDCs? Probably not. Then the question is whether is best to rely on a L2 infrastructure over which you deliver L2 VPNs for each virtualDC, or whether you build a L3 infrastructure over which you deliver L2 VPNs.

The latter one does not have, today, a standards based approach. The former has at least the known option of QinQ (with all its caveats). Some would argue that combining this with SPB or TRILL you have a solution. Maybe.

But I think the real way forward with scalability is build a L3 network, which can by the way accommodate to any topology and provides excellent multicast, and then build L2 overlays from the virtual switching layer. 

And then a question is whether this is all easier to do with OpenFlow. I think not, because in the end the control plane isn't really the problem. In other words: networks aren't more or less flexible because of the way the control plane is implemented (distributed vs centralized) but because of poor design and trying to stretch too much out of L2 (IMHO). 

I do not doubt you could fully manage a network from an OF controller (although I have many questions on the scalability, reliability and convergence times) but I don't really see the benefit of doing that. The only way I see a benefit is to avoid the L2 vs. L3 because at the controller you could bypass completely the "normal" forwarding logic and make a L2 or L3 decision on the fly regardless of topology. But the question how to scale that at the dataplane and also that in order to do that you must offload the first packet lookup to the controller and THAT wont fly.

So there it is … I think that with modern IGP implementations we can build networks with thousands of ports in a very reliable and easy to manage way, and by building a Mac-in-IP overlay from the virtualization layer, provide the L2 services required. That would be my bet on how to do things :-)

No comments:

Post a Comment