
Thursday, January 3, 2013

Why Canonical (and Microsoft) may have it right ...

It's been a while since the last post ... and I am shifting gears quite a bit away from networking too. But I was very interested in the announcement of Ubuntu Phone.



I have been an amateur user of Ubuntu Desktop for quite a while and I have always thought it was an amazing operating system. When I saw the activity on Twitter I went to check their website, and I found something very sleek, neat and attractive: much like the desktop, of course. There are a few "demos" available on YouTube and they display a very modern, elegant and, more importantly, different user interface. An example is here:



I think it may have a niche competing with Android on low-end budget SmartPhones and in any SmartPhone sold in budget-sensitive markets. Manufacturers will look at alternatives to Android for multiple reasons, and one of them will be differentiation. Ubuntu Phone will provide that. It's not the only offering of course, but it's a very nice one for sure.

There's been some criticism about the fact that it is late to market (very possibly true), since Android and iOS are dominant and the battle for third place is ripe for the taking by Microsoft. More interesting are the critics who say Ubuntu has got it wrong in offering a mobile OS which is also a Desktop OS. The criticism goes that the Desktop is dead, and we should not keep hanging on to old desktop habits and so on.

The timing is indeed an issue. It looks like the first devices officially available with Ubuntu would not show up before the end of 2013 or early 2014, and Canonical has had a poor track record when it comes to mobile OSes in the past. On the other hand, the battle for third place is so fragmented that they may still have a chance, and quite frankly, it does not look as of today like Windows Phone is making serious inroads.

A Desktop? A Phone? or both ... bad idea?

Ubuntu Phone claims that on high-end devices you will be able to use it as a desktop. So the concept is that you can run both Phone Apps and Desktop or Web Apps on it, and you can also dock it, connect it to a screen, keyboard and mouse, and use it as a real desktop, with a real (good) desktop interface. So Ubuntu Phone is also Ubuntu Desktop. Microsoft is proposing a bit of the same thing with Windows 8, although so far they have been pushing the concept into the tablet (Surface), not really into SmartPhones.

Is this a bad idea? Is this hanging on to a forgotten past? ... I think not. I believe the Desktop isn't dead. I have an iPad, and an iPhone, and I haven't stopped using my MacBook. Real work gets done on a desktop. When you need to spend time building or working on complex spreadsheets, or writing a content-rich document, or a presentation, or programming ... you want a Desktop. ... or ... do you? Well, I think you need a Desktop, and by this I mean a comfortable place, a nice (big) screen, mouse and/or trackpad and a proper (comfortable) keyboard. Where the applications actually run is less important ...

And this is where I think Ubuntu Phone got it right. There are two other niche markets where Ubuntu may grow. One is the geek power users who already use Ubuntu Desktop. The other is, potentially, anybody who sees the benefit of having their Desktop with them all the time. I do not see why not. I'd love to have my MacBook inside my iPhone, so that when I am in a place to work, I can dock it and work on it ... Sure, today's high-end SmartPhones aren't as powerful as a desktop/laptop computer, but in two or three years they may well be powerful enough.

In fact, today they may be already ... since you can run a thin client on them and offload the heavy work to a virtualized OS which runs wherever there's power for it. For enterprises, this is good.

Time will tell ... but for now, I like what I see. I can't wait to get my hands on it, even if it's running in a VM only. Probably the main reason I like it is because it is different. Unlike Google's Android, they did not just go and copy Apple's user interface as a means to penetrate the market and then slowly tweak it to claim it's different and show off some "innovation". No. They created something new, something different. Well done.



Friday, March 9, 2012

A sad failure?

Very recently I read an article in Network World about an investment firm warning about Juniper's problems and/or delays in shipping certain recent products and technologies. Amongst them, QFabric was cited.

It is all "alleged" information, as far as I know, but it comes as no surprise. I don't have close enough knowledge of the service provider core routing market to speak about the T4000 or the PTX system, but I have followed QFabric very, very closely since it was announced (and before it was announced as well).

It is quite shocking that one year, yes, one full year, after a big announcement like that one, from a company like Juniper that is usually so quick to come up with customers endorsing it, not a single customer reference is out there. Of course this may change any day now, possibly, but it will not hide what is already a failure. It clearly means that when it was announced it was not ready (it shipped six months later), and it seems to indicate that when it shipped it had a number of issues ...

Despite all the mind share captured, all the good press, all the bloggers fascinated with Juniper's QFabric marketing claims (naively, I must add, in most cases), the fact is that JNPR's switching market share remains marginal. Marginal, after being in the marketplace for over 4 years now considering the EX product line, which is still the only one selling in some volume, afaik. Contrast this with Cisco's UCS, for instance, and the impact it has had in little more than 2 years.

I may be wrong, and I really hope I am wrong, but I think JNPR will need to recognize the failure and change the strategy. To me, based on what we know now, it is a failure, and a sad one. Sad, because I expected better from Juniper, and sad because healthy competition is great for every company in the market and, most importantly, for customers.

And I think it is a failure for various reasons:

1. Architecturally. It is very, very complicated to scale a distributed Ethernet switch to the levels that QFabric is intended to reach. And even if it can be done (and having experience with distributed Ethernet switches down to a very low level, I have serious doubts about it), it is very difficult to do so while keeping the system economical and simple to operate. It already isn't either of those two.

2. QFabric missed the industry trends. There are two key trends:

- Large L2/L3 fabrics (or simply put, networks :-) ), with some form of edge-based overlays to deliver network virtualization. While QFabric could provide the basis for this, it fails at scale. It is too big for smaller deployments, too small for large ones. But more importantly, it has no value proposition. If I want to build an L3 fabric to then run a Distributed Edge Overlay (DEO), I can use standard OSPF- or ISIS-engineered networks which will support a much larger number of switches in the fabric than QFabric allows, and which can potentially be built using a number of vendors, or even a combination of vendor equipment, etc. (see the rough sketch after this list). Why would you do it with QFabric?!

- Software Defined Networking: you can jump on the bandwagon by adding marketing, claiming open APIs and whatever. Organizations looking at this come at it from one of two angles: integration into cloud stacks (i.e. OpenStack, where JNPR's QFabric role is marginal to none), or open capabilities to manage forwarding elements from a controller (i.e. a la BigSwitch) using open interfaces like OpenFlow. JNPR's approach today fits neither. QFabric is proprietary in its architecture and, as it stands today, does not lend itself well to letting companies build traditional fabrics, nor migrate to OF controllers.
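To make the scale point of the first trend a bit more concrete, here is a rough back-of-the-envelope sketch in Python (my own simplification with made-up assumptions, not vendor data) of how far a plain two-tier L3 leaf/spine fabric grows with switch radix, regardless of whose boxes you buy:

```python
# Back-of-the-envelope only: assumes a non-blocking two-tier Clos where each
# leaf dedicates half its ports to servers and half to spine uplinks, and
# every leaf connects once to every spine (one ECMP path per spine).

def leaf_spine_capacity(radix: int) -> dict:
    uplinks = radix // 2                  # leaf ports facing the spines
    server_ports_per_leaf = radix - uplinks
    spines = uplinks                      # one spine per leaf uplink
    leaves = radix                        # a same-radix spine hosts one leaf per port
    return {
        "spines": spines,
        "leaves": leaves,
        "total_switches": spines + leaves,
        "server_ports": leaves * server_ports_per_leaf,
        "ecmp_paths_between_leaves": spines,
    }

for radix in (48, 64):
    print(radix, "ports:", leaf_spine_capacity(radix))
```

Any routing protocol with ECMP (OSPF, ISIS) can drive a topology like this, and nothing in the arithmetic ties you to a single vendor's fabric.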

In the end ... for those who are happy building networks using traditional Ethernet technologies and evolving with those, QFabric does not offer any compelling value proposition. And for those facing real problems with today's state of Ethernet technologies ... well ... the solution cannot be to simply build a larger Ethernet switch.

But that said, I know there's great talent working on QFabric, and I really hope they will get it right, as right as it can be.

As always, opinions posted here are my own ...

Friday, February 10, 2012

Some musings for old networkers ...

Back in 1999 I was working in South America. Most service providers in the region were operating TDM and FR networks, and the hot technology they were evolving to was ATM. It promised many good things, including a true converged network capable of transporting data, voice and video. Of course many at the time were not thinking of packet data (nor packet video), but of actual PCM voice channels over CES.

Why am I bringing this back today? Well, because back in those days I remember having so many conversations with SPs about delivering virtual private networks using IP technology, or in fact a nascent technology that leveraged an IP control plane: tag switching, which evolved into standard MPLS. For Cisco, this was a key differentiator in the Stratacom product line of ATM switches.

But most people in those carriers were very circuit oriented. The easiest way for them to set up a private network was to mimic the TDM world of circuits and channels ... with a PVC - or a mesh of them.

It is funny to see that, in essence, today's latest and greatest solutions for implementing virtual private networks (network virtualization) still rely on circuits of some sort ... IP overlays now.

The kings of the ATM SP market in the region at the time were Nortel and NewBridge. Cisco had important franchises as well. SPs were looking for ways to better scale the circuit-based ATM backbones to deliver services to end users (some were even thinking about going into households, which eventually happened through the early phases of ADSL deployments in the early 2000s).

How to build and manage all those thousands of PVCs? ATM proposed a neat way ... SVCs! VCs which were created by a software layer, at the request of the application! ... Good things were about to happen. And then came the fierce competition ...

Nortel and many others were pushing for PNNI, a network-based routing control plane which would run distributed across all ATM switches in a hierarchical topology. PNNI was essentially a routing protocol for establishing ATM VCs. What was good about it? It was a standard ... it promised vendor interoperability, and in fact there were bake-offs to prove it and so on ... Cisco was betting on PNNI and actually had a pretty robust stack implemented on its LS1010s.

NewBridge, on the other hand, had a radically different approach. They had a very strong management solution (the 46020) carried over from the TDM and FR days. Essentially, for them, software running on a server would control the setup and management of PVCs and SVCs alike! ... better yet, it could interface through APIs with an application layer to, say, deliver IP and VoD services to ADSL consumers ...

The "SDN" approach to managing ATM never worked: it failed to scale and proved to be a lock-in for customers, who could never imagine running an ATM switch that was not NewBridge.

Of course, the end of the story I just related is well known ... ATM and PNNI slowly died, overtaken by IP and MPLS. NewBridge was acquired by Alcatel (which actually reused the 46020 for managing the 7050/7040 boxes and did well with it), and Nortel ... well ... sorry for them.

I can't help thinking of this and seeing some analogies with recent trends and announcements in the networking industry. By no means do I pretend the ending would be similar though ... much has changed, in the compute industry in particular, to help scale a controller managing overlay networks. But it is fun to think of the irony of things, and how circuits come back to haunt us IP heads ...


Wednesday, February 8, 2012

Nicira: fear them not


So Nicira is finally out of stealth mode. This is good news. Much of what we have now seen on their website confirms rumors and expectations. In the press, however, there continues to be a bit too much hype in my opinion. Talk about the next VMware is a bit out of place I think, if for no other reason than that the server and networking industries are very, very different (in dollar value to begin with ...).

The general assumption is that networks are very static and difficult to manage and adapt to business needs. Michael Bushong from Juniper writes that they  "are far too big and complicated to run by hand and are therefore operated by a maze of management, provisioning and OSS/BSS systems".

I guess when you are coming from a small installed base as a networking vendor in this space, you want to exaggerate the issues faced by DC and enterprise networks today. It is true that networks are not managed by hand and are managed by OSS/BSS systems, but then isn't this a good thing anyway? And more importantly, isn't this true for servers, server images, and storage as well? I wouldn't say managing hundreds or thousands of VMs with different images, patching levels, etc. is a simple task that anybody would want to run by hand.

But it is true that networks are static and that automating network configuration isn't an easy task. Adding ports to VLANs can be automated in a somewhat easy way. But things like stretching VLANs, or moving entire subnets around, are more difficult tasks. Now, a network engineer would claim that the problem isn't the network itself, but the way applications are built.

After all, if you build a network based on L3 with proper subnet planning, you will never have an issue allocating network resources for any VM you provision on the network, and all a VM needs for communication is an IP address. But the issue is that applications aren't built to run in "just any subnet"; for one, they need to communicate within the subnet with other components of the application for many tasks. And then there's policy and security, which, if tied to the IP address, become a nightmare to manage and enforce. And decoupling the policy and security rules from the IP address isn't easy to do today either.
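As a toy illustration of what that decoupling would mean (entirely hypothetical names and addresses, not any vendor's model): if policy is keyed on a logical group instead of on the IP address, re-addressing or moving a VM only changes a mapping, not the rules themselves.

```python
# Hypothetical sketch: policy expressed against logical groups survives
# re-addressing and VM mobility; policy expressed against IPs does not.

# Rules tied to concrete IP addresses: every re-address means editing rules.
ip_rules = {("10.1.1.10", "10.1.2.20", 3306): "permit"}

# Rules tied to logical groups: only the endpoint-to-group map changes
# when a VM moves or gets a new address (kept current by provisioning).
group_of = {"10.1.1.10": "web", "10.1.2.20": "db"}
group_rules = {("web", "db", 3306): "permit"}

def allowed(src_ip: str, dst_ip: str, dport: int) -> bool:
    key = (group_of.get(src_ip), group_of.get(dst_ip), dport)
    return group_rules.get(key) == "permit"

print(allowed("10.1.1.10", "10.1.2.20", 3306))   # True
print(allowed("10.1.1.10", "10.1.2.20", 22))     # False
```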

There are many things I agree with Mr. Bushong on though, and one is that "programmability is about adding value to the network control, rather than a threat of commoditization".

Several years ago I read a paper on Microsoft's VL2 proposal. It is very similar in concept to what Nicira is doing: at the server network stack you build a tunneling mechanism that facilitates endpoint communication. At the time I thought such an approach wouldn't be feasible, as it demands changing the server TCP/IP stack, a daunting task. But virtualization has changed that, because now we CAN change the stack at the vSwitch level, while the server OS, close to the application, remains unchanged. Nicira has also added one more thing: a northbound API to provisioning systems that can harmonize the network connectivity for endpoints with other resources (server, storage, etc.).

In itself, Nicira's solution isn't providing anything new: it builds overlays to facilitate endpoint communication. This can be done with VXLAN as well; what Nicira is providing is, supposedly, a control plane capable of creating and managing those overlays in an automated and scalable way. The latter point is to be confirmed, of course.
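For what it's worth, the data-plane half of such an overlay is conceptually simple. A minimal sketch (not Nicira's code, just the VXLAN-style encapsulation per RFC 7348, with the outer Ethernet/IP/UDP headers omitted for brevity):

```python
import struct

VXLAN_FLAGS_VNI_VALID = 0x08   # the "I" flag: the VNI field is valid

def vxlan_encap(inner_frame: bytes, vni: int) -> bytes:
    """Prepend the 8-byte VXLAN header carrying a 24-bit virtual network ID."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    header = struct.pack("!B3s3sB",
                         VXLAN_FLAGS_VNI_VALID,    # flags
                         b"\x00\x00\x00",          # reserved
                         vni.to_bytes(3, "big"),   # VNI
                         0)                        # reserved
    return header + inner_frame

frame = vxlan_encap(b"\x00" * 60, vni=5001)   # dummy 60-byte inner Ethernet frame
print(len(frame))   # 68: 8-byte header + inner frame, carried over UDP/IP
```

The hard part, as said above, is the control plane that decides which hypervisor to tunnel to for a given destination and keeps that mapping current as VMs move.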

Many seem to think that this will be the end of networking as we see it, and that the physical network becomes a commodity. I think this isn't true. First, because building a large, scalable, fast and performant L3 network isn't rocket science, but it isn't something many have succeeded at either. There is a reason why Internet operators rely on just two companies for that: Cisco and Juniper.

Second, because as you want to improve the utilization efficiency of your physical topology, and provide differentiation to the applications that require it, your PHYSICAL network must have a way to view and interact with your overlays.

And there is more. Once you have built an architecture that enables you to create such overlays for endpoint connectivity, what happens when connectivity is needed with elements outside of your overlay? You need a gateway out, and a gateway in. I can see ways in which, leveraging OF and a controller, you can scale the gateway out from the vSwitch itself, but scaling the gateway in is more difficult, and chances are it will be done via appliances of some sort, which then need to be redundant, etc.

So I think the more Niciras we have the better. The more development we see that facilitates moving towards cloud architectures, the more demand there will be for performant and intelligent networks. I do not see commoditization happening in the network for the same reason hypervisors haven't commoditized the CPU. Intel and AMD are now building CPUs that offer more services to the hypervisor layer; in the same way, we will see networks which offer better services to the overlay networks.

Net net, I am one who thinks that much of the complexity in networking isn't created because of wrongdoings in the past, or just legacy technologies. It is because dealing with a network isn't like dealing with endpoints. It is a complex and evolving challenge in itself.

Bright times ahead for the industry …

Monday, January 23, 2012

On Soft Switching and Virtualized Networking in General (Part I)

Wow, it's been almost three months since I last wrote here. Too bad. I actually have at least four posts in the works, but I never seem to have time to finish any of them. Apart from lack of time, sometimes I am too ambitious and perhaps each post could be divided into several smaller posts.

That's what I decided to do with one of the subjects already, and this will be part I. I want to write about the challenge of networking virtual machines, and go through the various options, considering pros and cons and where different solutions fit.

As always, the writing represents my opinions only, and is based on my limited knowledge of the subject. In this first part I review the challenges of networking for virtual machines and talk about what others have written about it. Then in part II I will talk more about soft switching economics and where I think it fits, and in part III I will do a similar thing for the hardware approach (VM-FEX). Finally, I am planning a part IV with details of the current VM-FEX implementation on Cisco Nexus switches.



The Challenge of Networking in Virtualized Environments

Much has been written about this topic and as a networking-head fully involved with virtualization I find it so interesting that I spend quite some time reading and thinking about it. Finding the free time to write about it is another story.

There is no doubt that it is a challenge. Networking a large number of servers isn't an easy task to begin with, but with virtualization you add the challenge of a much denser population of endpoints (the VMs) on the network, plus the mobility of virtual machines to complicate matters. But scale and mobility are just two of the dimensions of this problem, with security, performance and manageability probably being the other top ones.

In general, I believe there are two approaches to solving the challenge of networking in virtualized environments: software based solutions, integrated into the hypervisor, and hardware based solutions seeking to off-load switching from the hypervisor. I think both have a place.

Martin Casado, the network heretic, has written much about why he thinks soft switching "kicks mucho ass" and will be the winning solution. He dislikes solutions which leverage NIC virtualization (SR-IOV, 802.1BR) or those which force traffic to be pushed off of the server only to have it hair-pinned back if needed (802.1BR, 802.1Qbg). Many other bloggers have bashed tagging-based solutions (i.e. 802.1BR), saying they aren't needed and are just another attempt by hardware vendors to sell more and newer hardware. It is undeniable that vendors develop technology hoping to profit from it, and it's quite a legitimate thing to do in my opinion, but I am dead sure that no vendor develops a technology just to force a customer upgrade. Plain and simple: if the technology isn't solving a problem, it won't sell. People aren't stupid.

The problem we are solving (networking in highly virtualized environments) accepts multiple solutions, and it is quite different for different people. I think we can distinguish two big types of organizations: IaaS service providers and virtualized deployments in the enterprise space (i.e. private clouds). For the first type of customer, soft switching may have an edge, while for the second, hardware switching can be more interesting.

Casado dislikes hardware switching, and I think he has a good case for cloud service providers, as I just mentioned. The reasons why he dislikes those hardware-based options and thinks soft switching will prevail are well explained in his blog posts (which I recommend reading entirely; there are four parts), but in a nutshell he believes that software will always be richer in features, and that with today's performance and low price per core, the economics of soft switching are ideal.

I disagree with him to some extent. Certainly to the extent of scoping the analysis to the entire industry. I think his analysis considers only a part of the industry, and that both options (soft and hard switching) will have their place and space to bring solutions to virtualized server networking. Let's see if I can explain why I think this way in the subsequent posts. Then I will explain the benefits of hardware-based solutions and how they work.

I must stress again that my writing on this blog is done in my free time and reflects only my own personal interests and opinions.

Friday, October 21, 2011

On Catalyst 6500 investment protection

I have read an article in Network World about the real investment protection of the Catalyst 6500. I think the article is lousy and biased, focusing on an analysis of speeds and feeds and assuming that the only reason for network upgrades is raw throughput. Moreover, there were some comments on that article which pointed in the same direction. Everything is subject to opinion; here's my own:

Anyone that has run a network knows it's not just about how many ports a switch has and how many packets per second it moves. This post is just so simplistic about that ... and some of the comments below it as well ...

Yes, an Arista 7050 can do 64 line-rate 10GE ports in one RU. It had better ... it was designed in 2011, I'd expect it to do that. But can it do that while running inter-VLAN routing for 1000+ different VLANs concurrently, hundreds of HSRP groups, ACLs with tens of thousands of lines each applied to the SVIs, BFD on uplinks providing sub-second convergence, routing for 10K multicast groups or more, etc.? ... Can it be used to extend LANs over an MPLS backbone? ... Can it ...? The list would be TOO long. Way too long. And it would apply to switches from other vendors too.

Mid-size to large companies, and many small ones, NEED those capabilities. Running a network means running hundreds of VLANs with policy applied to them, etc. If a company doesn't need any such thing, or just needs a dozen VLANs with static routing, sure, they can use a lower-end switch. And they do. And that's fine. And that's why Cisco also has other products in the portfolio.

The reality is, if a customer invested five or six years ago in a Catalyst 6500, they may also have considered at the time a Nortel 8600 (now virtually dead), or an F10 E600 (no comment), or a Foundry BigIron 8000 or an RX-16. Had they gone with one of those, look at how many hardware (or even software) upgrade options are left for them ...

But no. Most of them went with a Catalyst 6500, and as much as it hurts other vendors (and some people on the grey market, for some reason that escapes my understanding), those customers know they made the right choice. And today, those customers have options for upgrading and increasing the value of their investment and solving new network problems. Those upgrade options may be great for some, and maybe insufficient for others (which is why Cisco also has other products in the portfolio).

And Art, another angle you are totally missing is the operational aspect. Running a Cat6500 with a Sup2-T presents no operational change at all for customers: no re-learning, no re-scripting, etc. THAT is also investment protection. THAT would be impossible had they invested in another vendor when they chose to trust Cisco.



Finally, I'd like to add that the Sup2-T enhances the performance of a Catalyst 6500 in ways beyond pps. Any Catalyst 6500 running combinations of 67xx, 61xx and other cards gets, just by replacing the Sup with a new Sup2-T, up to 4x more VLANs, 4x more entries in the NetFlow TCAM, Flexible NetFlow, VPLS available on every port, SGT/CTS and many more features.


So yes, there is investment protection. And even more when you compare with other products that have been in the market for a while, no matter which vendor you pick.

Saturday, October 15, 2011

On SDN, and OpenFlow - Are we solving any real problem?

I must admit that I started looking at SDN and OpenFlow with a lot of skepticism. It is not the first time I have faced a networking technology which proposed a centralized intelligence to set up the network paths for traffic to go through. Such thinking always reminds me of old circuit-based networks and the NewBridge 46020, which brought TDM-style path setup to FR and ATM and, when making inroads into ADSL, offered it as the solution to all evils. ATM LANE was, in a different way, another "similar" attempt.

Being a long term CCIE makes me an IP-head, no doubt about it, and a controller-based network is something that I really need to open my mind to, in order to even consider it. I am trying though ...

The way I look at it, the question about SDN and OpenFlow alike is: what problem are we solving?

So far, I see three potential problems we are trying to solve:

-  scale: help build more scalable networks while spending less money
-  complexity: simplify running a complex network
-  agility: simplify or speed up the implementation of network services & network virtualization

In this blog post I try to look at SDN/OF through the lens of those three potential problems. I make little to no distinction between SDN and OF, and that is wrong because they aren't quite the same thing. Things like 802.1X, or even LISP to some extent, could somehow be considered Software Defined Networking, in that the forwarding and/or the policy is defined in a "central" software engine or database lookup.

But for the purpose of this post, I really look at OF as THE way for implementing SDNs. But before I begin ...

A Common Comparison I consider flawed ...

Many people point to existing Wireless LANs as a way to say the controller-based approach is proven. Sure, look at wireless networks today, almost all are using a controller-based approach ... well ... yes, but no. The biggest difference with OF from a networking perspective is that the WLAN controller FORWARDS all the traffic, which is tunneled in an overlay from each of the access points. So the controller IS part of the datapath, an intrinsic part of it. In fact, it can be the bottleneck.

Moreover, the capillarity of a WLAN network is orders of magnitude lower than that of a datacenter fabric, so any analogy is, IMHO, flawed.

Scalability

From a scalability perspective, my first take at SDN and OpenFlow was focused on two points which I looked at as big limitations:

1. a totally decoupled control plane (perhaps centralized - albeit distributed in a cluster) requires an out of band management network which could limit scale and reliability (plus add to the cost)
2. programming "flows" into device TCAMs using OF will not scale, or at least will not provide any savings in the CAPEX driven by the networking hardware itself

I see point one above less as a limitation now, so long as we really succeed at achieving a large simplification of the network in all other areas (beyond management), thanks to the SDN approach.
It isn't unusual to have an OOB management network in tier-1 infrastructures anyway. In the SDN/OF approach however, the OOB network is really a critical asset (even more than critical ...). It must have redundancy with fast convergence built into it, and be built to scale as well. We also need to factor in the cost of running and operating this network. Also, as the amount of policy and/or the number of flows grows, the cost of the controller cluster itself may be non-negligible.

Point two above is still one where I need to better understand how things will be done. At first glance, I thought this was going to be a big limitation because I assumed each network forwarding element would be managed like a remote linecard, programmed perhaps using OpenFlow. In that case, the hardware table sizes of each network element would limit the entire infrastructure, because for L3 forwarding you want to keep consistent FIBs, and for L2 forwarding even more so, to minimize flooding. Hence, if your forwarding elements are limited to, say, 16K IPv4 routes, that's the size of your OpenFlow network ... there are ways to optimize that by programming only part of the FIB, which is possible as the controller knows the topology. But then if there are flow exceptions ...

But then of course, things change if you consider that ... why would you need to do "normal" L2 or L3 lookups for switching packets? You can forget about L2 and L3 altogether (potentially).  And then, I assume the "controller" could keep track of the capabilities of each node including table sizes, and program state only as needed and where needed. This adds complexity to the controller, but should help scaling.
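A hypothetical sketch of that last idea (names and numbers are mine, not any product's behavior): a controller that knows each element's table capacity installs only the entries that matter most there, and leaves a catch-all for everything else, instead of assuming every box can hold the full table.

```python
from collections import Counter

def program_switch(capacity: int, prefix_hits: Counter) -> dict:
    """Install the most-used prefixes that fit; the rest hit a default entry."""
    table = {}
    for prefix, _hits in prefix_hits.most_common(capacity - 1):
        table[prefix] = "forward-directly"
    table["0.0.0.0/0"] = "send-upstream-or-punt"   # catch-all for what didn't fit
    return table

hits = Counter({"10.0.0.0/24": 900, "10.0.1.0/24": 500, "10.0.2.0/24": 10})
print(program_switch(capacity=3, prefix_hits=hits))
```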

But can this scale if hardware is programmed per flow?

I still don't really see this happening. There are two issues with this: scaling the flow setup (a software process), and scaling the hardware flow tables. I understand the flow setup is not necessarily THE problem, but still, let's review it. Let's say the first packet is sent to the controller for the flow to be set up, all of this over the OOB network. This will add delay to the initial communication and put load on the controller, but there is no reason why this can't scale out with multiple controller servers in a cluster, or by splitting the forwarding elements between different instances of the controller. All of this adds to the cost of the solution though (and adds management complexity too).
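A simplified sketch of that scale-out option (hypothetical, not any product's API): pin each forwarding element to one controller instance by hashing its identifier, so the first-packet load is spread across the cluster.

```python
import hashlib

CONTROLLERS = ["ctrl-1", "ctrl-2", "ctrl-3"]   # assumed cluster members

def controller_for(switch_id: str) -> str:
    """Deterministically map a forwarding element to a controller instance."""
    digest = int(hashlib.sha1(switch_id.encode()).hexdigest(), 16)
    return CONTROLLERS[digest % len(CONTROLLERS)]

for sw in ("tor-01", "tor-02", "agg-01"):
    print(sw, "->", controller_for(sw))
```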

But what I don't see is this scaling at the forwarding chip level. I wonder how to program hardware with the SDN approach. The OpenFlow way seems to be to do it on a per flow basis, leveraging table pipelining.

Of course it all depends on what we call a flow. If we take source/destination IP addresses plus TCP/IP port, any aggregation switch will easily see hundreds of thousands of flows at any given time. Even at the ToR level the number of flows will ramp up rapidly. This will easily kill the best silicon available from vendors such as Broadcom, Fulcrum or Marvell. We can indeed limit flows to source/destination MAC addresses, or IP for that matter, but then that limits a lot of what you CAN do with the packet flows. So if host A wants to communicate with host B, that is two flows, a->b and b->a; that is, if you define a flow by source/destination MAC address. In this case, let's assume you have 48 servers connected to a ToR switch. Let's say there are 10 VMs per server (40 is very common in today's enterprises, by the way). Let's say each VM needs to talk to two load balancers and/or default gateways plus 10 other VMs. This means each VM would generate 12-14 flows. So each ToR switch would see 48 x 10 x 12 = 5,760 flows (times two, because they have to be bi-directional). Now that isn't too much; chips like Trident can fit 128K L2 entries, which in this case would mean flows if we define them per MAC address. But think of the aggregation point which has 100+ ToR switches connected. Now those switches need to handle 576,000 flows (times two). Way more if you assume more than one vNIC per VM.
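The same back-of-the-envelope numbers, written out with the assumptions above:

```python
# Per-ToR and per-aggregation flow counts if a "flow" is defined by
# source/destination MAC address (assumptions taken from the text above).
servers_per_tor = 48
vms_per_server = 10
peers_per_vm = 12            # ~10 VMs plus load balancers / default gateway
tors_per_aggregation = 100

flows_per_tor = servers_per_tor * vms_per_server * peers_per_vm
print("per ToR, one direction:", flows_per_tor)                           # 5,760
print("per ToR, bidirectional:", flows_per_tor * 2)                       # 11,520
print("per aggregation switch:", flows_per_tor * tors_per_aggregation)    # 576,000
```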

At any rate, if you want to handle overlapping addresses, you also need to add some other field to the flow mask ... So I still don't see how this can scale at all, certainly not using "cheap" hardware.

In the end, if you want to run fast, you'll need to pay for a Ferrari, whether you drive it yourself or have a machine do it for you.

But I see a benefit if we can run the network forwarding elements in a totally different way than we do today. The options for virtualizing the network can be much richer, this is true, but that would come at a cost (discussed below). I also see a point in scaling the network beyond what current network protocols allow, which can be interesting. I certainly understand the interest from companies running very large datacenters, which tend to have very standardized network topologies and can benefit a lot from the SDN approach. But at the same time, I do not see why standard routing protocols can't be made to scale to larger networks too ...

I doubt OpenFlow will be the right approach in the enterprise for quite a while, because in that world you rarely build 100% from scratch. There is always a need to accommodate legacy, and this will mandate "traditional" networking for quite a while, no doubt (if SDN ever really takes off, that is). Sure, I know companies like Big Switch are looking at ways to use OF as an overlay on existing networks. We will have to wait and see how this works ...


Simplify Running a Complex Network

This point is a tough one. What is simple to some is complex to others. Someone with a networking background will not consider running an ISIS network which implements PIM-SM difficult, while someone with a software development background will see it as very complicated. Likewise, that same software developer may think that running a cluster of servers which control other "servers" (that each do packet forwarding) is very simple.

SDN looks, on PowerPoint, very promising for network management simplification. But when you begin to dig into the details, on how to do troubleshooting, how to look deep into the hardware forwarding tables, how to ensure path continuity or simply test it, etc., you begin to see that what was simple in concept becomes more complex.

Simplify Network Services

A lot of the writing I have seen around this focuses on the idea that once we have a "standard" way of programming the forwarding hardware (OpenFlow, that is), then all forwarding actions become sort of like instruction sets on a general purpose CPU. Hence, all network problems become solvable by writing programs that operate on the network using such instructions.

I have typically seen two examples given as quick wins of this approach: load balancing and network virtualization. Both are hot topics in any data center. Others point to fancier ones, like unequal load balancing, load-based routing, or even shutting down unused nodes. The latter speaks for itself and is foolish thinking ...

All the others CAN be done with traditional networking technologies, and if they are not implemented, it is typically for very good reasons.

On the point of load balancing and network virtualization, what I have seen so far are discussions at a very high level, which show how this can indeed be done. OpenFlow-heads praise how this is going to be not only simple, but even free! ...

Implement a load balancer? Nothing simpler or cheaper. The fantastic OF controller will simply load balance flows depending on the source address, for instance. Done. Zero dollars. Of course, this ignores the point of hardware (and software) flow table scalability already mentioned above. Of course it ignores the fact that a load balancer does A LOT MORE than push packets out of various ports depending on source address ... it keeps track of real IP addresses, polls the application to measure load, offloads server networking tasks, etc. There's a reason why people invest in appliances from Cisco or F5 to do load balancing. Switches (from multiple vendors) have been able to do server load balancing for a long time, but what you can do there just isn't enough for most applications. OF changes nothing there.
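To be fair to the idea, here is roughly what that "free" load balancer amounts to (a sketch with made-up addresses, not a real controller): hash on the source address, pick a backend, push a flow. The comments mark what a real load balancer does that this does not.

```python
import hashlib

BACKENDS = ["192.0.2.11", "192.0.2.12", "192.0.2.13"]   # assumed real servers

def pick_backend(src_ip: str) -> str:
    # Hash on source only: no health checks, no connection tracking,
    # no L7 awareness, no TCP offload, no weighting or slow start.
    digest = int(hashlib.md5(src_ip.encode()).hexdigest(), 16)
    return BACKENDS[digest % len(BACKENDS)]

def flow_entry(src_ip: str) -> dict:
    # The kind of match/action a controller would push down to a switch.
    return {"match": {"ipv4_src": src_ip},
            "actions": [{"set_ipv4_dst": pick_backend(src_ip)}, "output:server_port"]}

print(flow_entry("198.51.100.7"))
```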

Network Virtualization is another one where the complex supposedly becomes so simple thanks to SDN and OF. I admit to writing here in ignorance of the actual work of companies like Big Switch or Nicira, of course. But most of what I read boils down to implementing an overlay network with a control plane running in software. Nothing different from many other approaches today, or from what would be done using VXLAN or NVGRE. At any rate, I would argue LISP is a better choice, but alas, it does not solve L2 adjacency, which is required for clustering and other reasons (and which OF doesn't solve either).

I have seen many others proposing that, thanks to OF, one can easily program the controller to use fields like MPLS tags, or VLAN tags, etc. to do segmentation a la carte. This is true, and fine. But I wonder, how is this good?!

And how is it different from doing traditional networking? Sure, Cisco, Brocade, Juniper, F10 and others could have decided to change the semantics of existing network fields to implement segmentation and many other features. And sometimes we have seen this done. But in so doing, they become proprietary. They don't interoperate.

If a controller software vendor X provides a virtualization solution that works that way (by redefining the semantics of existing fields), it offers a solution that locks the customer in with that software vendor. A solution for which a tailor-made gateway will be required to connect to a standards-based network, or to a network from any other company.

Imagine company Z, which runs a DC with a controller from software vendor X. Imagine company Z merges with company Y, which runs a controller from a different vendor ... imagine the trouble. Today, company Z running Juniper merges with company Y running Cisco, and they connect their networks using OSPF or BGP, or just plain STP if they are not very skilled ...

I am sure I am missing the obvious when so many bright people praise OpenFlow. I just can't see how it solves problems that we can't solve today, or how it does it better. Or how it will really make for a better industry. Many would like to see a world where networking won't be dominated by two vendors. OF, at best, could change who those two vendors are, nothing more.

Conclusion ... (for now)

I think OpenFlow is a very interesting technology, and the SDN paradigm one that can contribute to many good things in the industry. But up until today, almost everything I read about both topics is very high level, and idealistic to the point of being naive. In my opinion, a lot of the assumptions made about the problems OpenFlow will solve are made without knowledge of other existing solutions. I have yet to see an expert in IP and MPLS praise OpenFlow for solving problems that we couldn't solve with those two, or for solving them in a clearly advantageous way.

So far, to me (from my admitted ignorance of things), OpenFlow looks like just a different way to do the same. And for that ... what for?