
Wednesday, June 29, 2011

On L2 vs. L3 vs. OpenFlow for DC Design



The L2 vs. L3 debate is old and almost religious. One good thing about OpenFlow and, more generally, Software Defined Networking is that it may finally end the debate: L2 or L3? Neither (or both?).

The problem, in the end, is that people perceive L2 as simpler than L3. I say perceive because it really depends on where you are willing to face the complexity. From a simple-to-establish-connectivity perspective, L2 is easy. From a scaling perspective, L3 is simple (or at least simpler).

People working with servers and applications have traditionally had minimal networking knowledge, which has led them to rely too much on L2 connectivity, and today many application and clustering technologies won't work without L2 connectivity between the parties involved. The same can be said about virtualization technologies. The easiest way to make sure you can live-migrate a VM between two hosts is to assume they are both on the same subnet and L2 broadcast domain.

For VM mobility, as well as for many clustering technologies, it is important to avoid IP re-addressing, and in this sense it does not help that the IP address represents both identity and location (by belonging to a subnet which is located in a particular place). This is why LISP is so cool: it splits the two intrinsic functions of the IP address, identity and location.
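To make the identity/location split a bit more concrete, here is a minimal Python sketch of the idea (purely illustrative: the names, addresses and the flat dictionary standing in for the mapping system are mine, not LISP's actual protocol machinery):

```python
# Illustrative sketch of LISP's id/loc split. The EID is the host's identity and
# never changes; the RLOC is where it currently sits. Only the mapping changes.
mapping_system = {
    "10.1.1.10": "192.0.2.1",   # EID -> RLOC of the tunnel router serving it today
}

def move_vm(eid, new_rloc):
    """A live migration only updates the locator; the identity (EID) stays the same."""
    mapping_system[eid] = new_rloc

def locate(eid):
    """An ingress tunnel router resolves the current locator and encapsulates toward it."""
    return mapping_system[eid]

move_vm("10.1.1.10", "198.51.100.7")  # VM moved to another site: no re-addressing
print(locate("10.1.1.10"))            # traffic is now encapsulated to the new RLOC
```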

When looking at building a datacenter, and in particular one which will support some form of multi-tenancy and can potentially be used to host virtual datacenters (i.e. private-cloud type of services), how do we want to do it? Do we use L2 or L3? Or is the solution to consider OpenFlow? Tough one.

For the past two or three years there has been some level of consensus that you must have large L2 domains and that, with newer protocols such as TRILL, we would be able to build very large L2 networks; hence, that was the way forward. The reality is that most MSDPs today, to the best of my knowledge, are based on L3: because it works and scales.

The reality is also that, for delivering IaaS, you will need some way of creating overlay L2 domains on a per-virtualDC basis. Sort of like delivering one or more L2 VPNs per virtualDC. Why? Because the virtualDCs will have to host traditional (virtualized) workloads and legacy applications which are not cloud-aware or cloud-ready. From a networking point of view this means each virtualDC will have to have its own VLANs, subnets, and policies.

VLANs are commonly used to provide application isolation or organizational isolation. In a DC, this means you use dedicated VLANs for, say, all your Exchange servers, all your SAP servers, etc. Or you may use different VLANs for different areas of the company, which then may share various applications. Or you combine both: you give a range of VLANs to each organization/tenant and then further divide it by application. This needs to be replicated in each virtualDC.
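Just to illustrate that structure (all tenant names, VLAN IDs and subnets below are made up), this is the kind of per-tenant allocation that has to be reproduced inside every virtualDC:

```python
# Hypothetical per-tenant allocation: each organization/tenant gets a range of VLANs,
# further divided per application, and the whole structure repeats per virtualDC.
virtual_dcs = {
    "tenant-a": {
        "vlan_range": range(100, 200),
        "apps": {
            "exchange": {"vlan": 101, "subnet": "10.10.1.0/24"},
            "sap":      {"vlan": 102, "subnet": "10.10.2.0/24"},
        },
    },
    "tenant-b": {
        "vlan_range": range(200, 300),
        "apps": {
            "exchange": {"vlan": 201, "subnet": "10.20.1.0/24"},
        },
    },
}

# Sanity check: every application VLAN falls within its tenant's range.
for tenant, cfg in virtual_dcs.items():
    for app, net in cfg["apps"].items():
        assert net["vlan"] in cfg["vlan_range"], (tenant, app)
```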

At the infrastructure level, relying on current virtualization offerings, you may have dedicated VLANs for your virtual servers: VLANs for the management of the virtual servers, for allowing VM mobility, or for running NFS or iSCSI traffic (and also for running FCoE).

Do you want to use the same VLAN space for the infrastructure and for the virtualDCs? Probably not. Then the question is whether it is best to rely on an L2 infrastructure over which you deliver L2 VPNs for each virtualDC, or to build an L3 infrastructure over which you deliver those L2 VPNs.

The latter does not have, today, a standards-based approach. The former at least has the known option of QinQ (with all its caveats). Some would argue that by combining this with SPB or TRILL you have a solution. Maybe.
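For reference, QinQ is just a second VLAN tag pushed by the provider in front of the customer's own tag. A rough Python sketch of what the double-tagged frame looks like (MAC addresses and VLAN IDs are invented for the example; the TPIDs 0x88A8 and 0x8100 are the standard 802.1ad/802.1Q values):

```python
import struct

def vlan_tag(tpid, vid):
    """16-bit TPID followed by a 16-bit TCI (PCP/DEI zeroed, 12-bit VLAN ID)."""
    return struct.pack("!HH", tpid, vid & 0x0FFF)

dst = bytes.fromhex("ffffffffffff")      # example destination MAC
src = bytes.fromhex("0200deadbeef")      # example source MAC
s_tag = vlan_tag(0x88A8, 1005)           # outer service tag: one per tenant/virtualDC
c_tag = vlan_tag(0x8100, 30)             # inner customer tag: the tenant's own VLAN 30
rest  = b"\x08\x00" + b"payload..."      # original ethertype + payload

qinq_frame = dst + src + s_tag + c_tag + rest
print(qinq_frame.hex())
```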

But I think the real way forward for scalability is to build an L3 network, which by the way can accommodate any topology and provides excellent multicast, and then build L2 overlays from the virtual switching layer.
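As a rough idea of what "L2 overlays from the virtual switching layer" means, here is a conceptual sketch of MAC-in-IP tunneling between hypervisors (the UDP port, addresses and table are made up; this is not any particular encapsulation standard):

```python
import socket

OVERLAY_PORT = 48879  # hypothetical UDP port used by the overlay

# The virtual switch keeps a table of which hypervisor currently hosts each MAC.
# When a VM moves, only this table changes; the L3 fabric underneath is untouched.
mac_to_hypervisor = {
    "02:00:de:ad:be:ef": "192.0.2.11",
}

def tunnel_frame(l2_frame, dst_mac):
    """Encapsulate a raw Ethernet frame in UDP/IP toward the hypervisor hosting dst_mac."""
    remote_ip = mac_to_hypervisor[dst_mac]
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(l2_frame, (remote_ip, OVERLAY_PORT))
```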

And then a question is whether all this is easier to do with OpenFlow. I think not, because in the end the control plane isn't really the problem. In other words: networks aren't inflexible because of the way the control plane is implemented (distributed vs. centralized), but because of poor design and trying to stretch too much out of L2 (IMHO).

I do not doubt you could fully manage a network from an OF controller (although I have many questions about scalability, reliability and convergence times), but I don't really see the benefit of doing that. The only benefit I see is avoiding the L2 vs. L3 question altogether, because at the controller you could completely bypass the "normal" forwarding logic and make an L2 or L3 decision on the fly, regardless of topology. But then the question is how to scale that at the data plane, and also that in order to do it you must offload the first-packet lookup to the controller, and THAT won't fly.
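To spell out where that first-packet punt sits, here is a toy sketch of the reactive model (plain Python standing in for pseudocode; no real OpenFlow controller or API is being used). The point is that every new flow costs a round trip to the controller before hardware forwarding kicks in, which is exactly the data-plane scaling concern:

```python
flow_table = {}   # flow key -> action, as programmed into the switch

def controller_decide(flow_key):
    # The controller could make an L2 or an L3 decision on the fly here,
    # regardless of topology. Placeholder logic for illustration only.
    return "output:port2"

def switch_handle_packet(flow_key):
    if flow_key in flow_table:                 # fast path: flow already installed
        return flow_table[flow_key]
    action = controller_decide(flow_key)       # slow path: first packet punted to controller
    flow_table[flow_key] = action              # controller installs the flow entry
    return action

print(switch_handle_packet(("02:aa", "02:bb", 80)))   # first packet: round trip to controller
print(switch_handle_packet(("02:aa", "02:bb", 80)))   # rest of the flow: handled locally
```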

So there it is … I think that with modern IGP implementations we can build networks with thousands of ports in a very reliable and easy-to-manage way, and, by building a MAC-in-IP overlay from the virtualization layer, provide the L2 services required. That would be my bet on how to do things :-)

Friday, June 17, 2011

Why FC has a brighter future ahead thanks to FCoE

I must admit that I had been a sceptic about FCoE since Cisco introduced it in 2008, up until recent times. With limited storage knowledge and background, I thought that going forward NAS would prevail for most environments and iSCSI would be the winning option for block-based storage needs. Now I think there's going to be space for all of these, but FC has a better chance when dealing with block storage.

Why am I changing my mind? Well, a bit more knowledge, but also recognizing the facts. The overall FC market is still growing. Some analysts estimate 9-10% Y/Y growth at the end of 2010 for the total FC market in vendor-reported revenue. And it is remarkable that FCoE is becoming a larger part of that market. Depending on the report, it is now up to 10% of the total FC market (considering adapters, switches and controller ports). And the really important thing is that this is more than double its contribution to the total market versus one year ago.

Those are some facts. As for knowledge, I now recognize that FC networks bring not only the stability and performance required for Tier 1 applications and mission-critical environments, but also management over the SAN that is unmatched by other SAN protocols. The problem for many customers was that deploying FC was prohibitively expensive: the need to deploy a completely separate network (which usually had to be design-stamped by the storage vendor) and the lack of FC interfaces on low-end (and even mid-range) arrays made it impossible for small and mid-sized organizations to even consider it.

This is where I see a new angle now. FCoE is going to make FC as ubiquitous as Ethernet, and almost as affordable.

Customers jumping on iSCSI would normally opt to use the same switches for LAN and iSCSI (perhaps dedicating ports to the latter, and they certainly should be dedicating separate VLANs, subnets, etc.). Server side? Just another pair of GE ports. That usually did not add much to the cost if you were already considering quad-port NIC cards on top of whatever comes as LOM.

FC would have added a lot. A pair of 4Gbps ports meant an HBA at about $2K, and then you had to add separate, dedicated fabric switches at the access layer.

Well, with FCoE, you put FC at the same level as deploying iSCSI for many people. First of all, FCoE now comes standard in NICs (CNAs) from many vendors, including Emulex, QLogic, Intel, Cisco and Broadcom. Intel's OpenFCoE approach adds no cost over deploying plain 10GE, and brings the price per port down to the $400 range. Such CNAs are already certified by the most relevant storage vendors, including EMC and NetApp; vendors which, by the way, are also adding FCoE support to their mid-range and high-end arrays.

Now the network piece. 10GE ToR ports are about $500-600 a port. Cisco's solutions there, thanks to the FEX approach, allow you to deploy a very cost-effective 10GE access network where all ports can support FCoE. So, in essence, you are now in the same position as with iSCSI: you can use affordable ports at the server, have support for the most common OSes, use the same network infrastructure, and pick from multiple storage vendors. A nice thing on top? You can get management tools like Cisco's DCNM to give you visibility of both LAN and SAN traffic (which you do not get with iSCSI).
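A quick back-of-the-envelope comparison, using only the ballpark figures quoted above (illustrative 2011 numbers, not vendor pricing):

```python
# Figures taken from the rough numbers in the text, per dual-ported server.
fc_hba_pair    = 2000   # "a pair of 4Gbps ports ... about $2K", plus dedicated fabric switches on top
fcoe_cna_port  = 400    # OpenFCoE-style CNA, "about $400 range" per port
tor_10ge_port  = 550    # converged 10GE ToR port, "$500-600 a port", shared with LAN traffic

fcoe_attach = 2 * (fcoe_cna_port + tor_10ge_port)
print("FC adapters alone per server:   $", fc_hba_pair, "(dedicated FC switches not included)")
print("FCoE adapters + ToR per server: $", fcoe_attach)
```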

Bottom line: I think FCoE is making FC easier and cheaper to deploy, so I expect more and more customers to consider using this technology, and those already using FC to adopt FCoE as well (less power, lower capex, simplified operations, etc.).

Of course there is space for NAS deployments as well, and iSCSI will continue to grow, but I think FCoE makes FC much more competitive as a technology than it ever was for deploying a SAN.

Ok, me too, I am now blogging ...

Again. Yes, I had some blogs in the past, mainly to keep family and friends updated on certain important events in my life. But then ... well. For numerous reasons I had no time and was not in the mood to write at all.

Now that's changing. I have more and more contact with people who are far away. This is not just friends and family (having my parents and brothers living in three different countries helps), but also colleagues and other people. Facebook is cool for keeping in touch with many of them. Twitter too. But neither of them really lets you express what you think or feel about something you want to share. So yes. Me too, I am blogging (again). If nobody ever reads this? Who cares :-)

Language? ... hmmm, now that is an issue. I have friends and family who can only read Spanish. Also many who read English but can't read Spanish, and some who would perhaps welcome French. I'll choose English for most of my writing; it is what I feel most comfortable with nowadays. That said, I suppose that if I decide to write about whatever is happening in Spain, or about the endless disputes between Real Madrid and Barcelona, I will probably do it in Spanish. We will see. In any case, I don't care if anybody really reads it, right?