
Friday, March 9, 2012

A sad failure?

Very recently I read an article on Network World about an investment firm warning of Juniper's problems and/or delays in shipping certain recent products and technologies. Among them, QFabric was cited.

It is all "alleged" information, as far as I know, but it comes as no surprise. I don't have close enough knowledge of the service provider core routing market to speak about the T4000 or the PTX system, but I have followed QFabric very, very closely since it was announced (and before it was announced as well).

It is quite shocking that one year, yes, one full year, after a big announcement like that one, from a company like Juniper that is usually so quick to come up with customers endorsing it, not a single customer reference is out there. Of course this may change any day now, but it will not hide the failure. It clearly means that when it was announced it was not ready (it shipped six months later), and it seems to indicate that when it shipped it had a number of issues ...

Despite all the mind share captured, all the good press, all the bloggers fascinated with Juniper's QFabric marketing claims (naively, I must add, in most cases), the fact is that JNPR's switching market share remains marginal. Marginal, after being in the marketplace for over 4 years now counting the EX product line, which is still the only one selling in any volume as far as I know. Contrast this with Cisco's UCS, for instance, and the impact it has had in little more than 2 years.

I may be wrong, and I really hope I am wrong, but I think JNPR will need to recognize the failure and change its strategy. To me, based on what we know now, it is a failure, and a sad one. Sad, because I expected better from Juniper, and sad because healthy competition is great for every company in the market and, most importantly, for customers.

And I think it is a failure for various reasons:

1. Architecturally. It is very, very complicated to scale a distributed Ethernet switch to the levels QFabric is intended to reach. And even if it can be done (and having worked with distributed Ethernet switches at a very low level, I have serious doubts about it), it is very difficult to do so while keeping the system economical and simple to operate. It is already neither of those two.

2. QFabric missed the industry trends. There are two key trends:

- Large L2/L3 fabrics (or simply put, networks :-) ), with some form of edge-based overlays to deliver network virtualization. While QFabric could provide the basis for this, it fails at scale. It is too big for smaller deployments, too small for large ones. But more importantly, it has no value proposition here. If I want to build an L3 fabric and then run a Distributed Edge Overlay (DEO) on top of it, I can use standard OSPF- or IS-IS-engineered networks, which will support a much larger number of switches in the fabric than QFabric allows and which can be built from a number of vendors, even a combination of vendor equipment (see the back-of-envelope sketch after this list). Why would you do it with QFabric?!

- Software Defined Networking: you can jump on the bandwagon by adding marketing, claiming open APIs and whatnot. Organizations looking at this come at it from one of two angles: integration into cloud stacks (i.e. OpenStack, where JNPR's QFabric role is marginal to none), or open capabilities to manage forwarding elements from a controller (à la BigSwitch) using open interfaces like OpenFlow. JNPR's approach today fits neither. QFabric is proprietary in its architecture and, as it stands today, does not lend itself well to letting companies build traditional fabrics, nor to migrating to OpenFlow controllers.
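To make the scale argument concrete, here is a rough back-of-envelope sketch in Python of how a plain two-tier L3 Clos (leaf/spine) fabric sizes up when standard OSPF or IS-IS does the work. The port counts are hypothetical examples chosen for illustration, not QFabric's or any vendor's numbers.

```python
# Back-of-envelope sizing of a two-tier L3 Clos (leaf/spine) fabric.
# Port counts below are hypothetical examples, not any vendor's specs.

def clos_capacity(leaf_ports: int, spine_ports: int, uplinks_per_leaf: int):
    """Return (leaves, spines, server-facing ports) for a two-tier Clos."""
    server_ports_per_leaf = leaf_ports - uplinks_per_leaf  # ports left for servers
    num_spines = uplinks_per_leaf      # one uplink from each leaf to every spine
    num_leaves = spine_ports           # each spine port terminates one leaf
    return num_leaves, num_spines, num_leaves * server_ports_per_leaf

# Example: 48-port leaves with 4 uplinks each, 32-port spines.
leaves, spines, servers = clos_capacity(leaf_ports=48, spine_ports=32, uplinks_per_leaf=4)
print(f"{leaves} leaves, {spines} spines, {servers} server-facing ports")
# -> 32 leaves, 4 spines, 1408 server-facing ports
```

Oversubscription and failure domains aside, the point is that nothing in this exercise requires a single-vendor fabric: any boxes that speak OSPF or IS-IS will do, and the fabric grows by adding bigger spines or more tiers.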

In the end ... for those who are happy building networks with traditional Ethernet technologies and evolving with them, QFabric does not offer any compelling value proposition. And for those facing real problems with today's state of Ethernet technologies ... well ... the solution cannot be simply to build a larger Ethernet switch.

But that said, I know there's great talent working on QFabric, and I really hope they will get it right, as right as it can be.

As always, opinions posted here are my own ...

Friday, February 10, 2012

Some musings for old networkers ...

Back in 1999 I was working in South America. Most service providers in the region were operating TDM and Frame Relay networks, and the hot technology they were evolving to was ATM. It promised many good things, including a true converged network capable of transporting data, voice and video. Of course, many at the time were not thinking of packet data (nor packet video), but of actual PCM voice channels over CES (circuit emulation services).

Why am I bringing this back today? Well, because back in those days I remember having many conversations with SPs about delivering virtual private networks using IP technology, or in fact a nascent technology that leveraged an IP control plane: tag switching, which evolved into standard MPLS. For Cisco, this was a key differentiator in the Stratacom product line of ATM switches.

But most people in those carriers were very circuit-oriented. The easiest way for them to set up a private network was to mimic the TDM world of circuits and channels ... with a PVC, or a mesh of them.

It is funny to see that, in essence, today's latest and greatest solutions for implementing virtual private networks (network virtualization) still rely on circuits of some sort ... IP overlays now.

The kings of the ATM SP market in the region at the time were Nortel and NewBridge. Cisco had important franchises as well. SPs were looking for ways to better scale their circuit-based ATM backbones to deliver services to end users (some were even thinking about going into households, which eventually happened through the earlier phases of ADSL deployments in the early 2000s).

How to build and manage all those thousands of PVCs? ATM proposed a neat way ... SVCs! VCs created by a software layer, at the request of the application! ... Good things were about to happen. And then came the fierce competition ...

Nortel and many others were pushing PNNI, a network-based routing control plane that ran distributed across all ATM switches in a hierarchical topology. PNNI was essentially a routing protocol for establishing ATM VCs. What was good about it? It was standard ... it promised vendor interoperability, and in fact there were bake-offs to prove it and so on ... Cisco was betting on PNNI and actually had a pretty robust stack implemented on its LS1010s.

NewBridge, on the other hand, had a radically different approach. They had a very strong management solution (the 46020) carried over from the TDM and FR days. Essentially, for them, software running on a server would control the setup and management of PVCs and SVCs alike! ... Better yet, it could interface through APIs with an application layer to, say, deliver IP and VoD services to ADSL consumers ...

The "SDN" approach to managing ATM never worked, it failed to scale and proved to be a lock-in for customers who could never think to imagine running an ATM switch that was not NewBridge.

Of course, the end of the story I just related is well known ... ATM and PNNI slowly died, overtaken by IP and MPLS. NewBridge was acquired by Alcatel (which actually reused its 46020 for managing the 7050/7040 boxes and did well with it), and Nortel ... well ... sorry for them.

I can't help thinking of this and seeing some analogies with recent trends and announcements in the networking industry. By no means do I claim the ending will be similar, though ... much has changed in the compute industry in particular to help a controller managing overlay networks scale. But it is fun to think of the irony of things, and how circuits come back to haunt us IP heads ...


Wednesday, February 8, 2012

Nicira: fear them not


So Nicira is finally out of stealth mode. This is good news. Much of what we can now see on their website confirms rumors and expectations. In the press, however, there continues to be a bit too much hype in my opinion. Talk of the next VMware is a bit out of place, I think, if for no other reason than that the server and networking industries are very, very different (in dollar value, to begin with ...).

The general assumption is that networks are very static and difficult to manage and adapt to business needs. Michael Bushong from Juniper writes that they  "are far too big and complicated to run by hand and are therefore operated by a maze of management, provisioning and OSS/BSS systems".

I guess when you are coming from a small installed base as a networking vendor in this space, you want to exaggerate the issues faced by DC and enterprise networks today. It is true that networks are not managed by hand and are instead managed by OSS/BSS systems, but then isn't that a good thing anyway? And more importantly, isn't this true for servers, server images, and storage as well? I wouldn't say managing hundreds or thousands of VMs with different images, patch levels, etc. is a simple task anybody would want to run by hand.

But it is true that networks are static and that automating network configuration isn't an easy task. Adding ports to VLANs can be automated somewhat easily. But things like stretching VLANs, or moving entire subnets around, are more difficult. Now, a network engineer would claim that the problem isn't the network itself, but the way applications are built.

After all, if you build a network based on L3 with proper subnet planning, you will never have an issue allocating network resources for any VM you provision on the network, and all a VM needs for communication is an IP address. But the issue is that applications aren't built to run in "just any subnet": for one, they need to communicate within the subnet with other components of the application for many tasks. And then there's policy and security, which, if tied to the IP address, become a nightmare to manage and enforce. And decoupling policy and security rules from the IP address isn't easy to do today either.
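To make that last point concrete, here is a toy sketch in Python of the difference between policy keyed on an IP address and policy keyed on a logical group. Everything here (names, addresses, rule labels) is invented for illustration; it is not any product's policy model.

```python
# Toy illustration of why tying policy to IP addresses is fragile.
# All names, addresses and labels are invented for this example.

ip_policy = {"10.1.1.25": "allow-db-tier"}     # valid only while the VM keeps this IP

group_policy = {"db-tier": "allow-db-tier"}    # same intent, keyed on a logical group
vm_to_group = {"vm-db-01": "db-tier"}          # resolved at enforcement time

def lookup(vm_name: str, vm_ip: str):
    by_ip = ip_policy.get(vm_ip)                               # breaks on re-addressing
    by_group = group_policy.get(vm_to_group.get(vm_name, ""))  # survives the move
    return by_ip, by_group

print(lookup("vm-db-01", "10.1.1.25"))   # ('allow-db-tier', 'allow-db-tier')
# The VM is re-provisioned into another subnet and gets a new address:
print(lookup("vm-db-01", "10.2.7.40"))   # (None, 'allow-db-tier')
```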

There are many things I agree with Mr. Bushong on though, and one is that "programmability is about adding value to the network control, rather than a threat of commoditization".

Several years ago I read a paper on Microsoft's VL2 proposal. It is very similar in concept to what Nicira is doing: at the server network stack you build a tunneling mechanism that facilitates endpoint communication. At the time I thought such an approach wouldn't be feasible, because it demands changing the server TCP/IP stack, a daunting task. But virtualization has changed that, because now we CAN change the stack at the vSwitch level while the server OS, close to the application, remains unchanged. Nicira has also added one more thing: a northbound API to provisioning systems that can harmonize network connectivity for endpoints with other resources (server, storage, etc.).

In itself, Nicira's solution isn't providing anything new: it builds overlays to facilitate endpoint communication. This can be done with VXLAN as well; what Nicira is providing is, supposedly, a control plane capable of creating and managing those overlays in an automated and scalable way. The latter point is to be confirmed, of course.
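For concreteness, here is a minimal sketch in Python of what the data-plane half of such an overlay amounts to: the vSwitch wraps the VM's Ethernet frame in a small outer header carrying a virtual network identifier. The format shown is VXLAN's 8-byte header (as later standardized in RFC 7348); Nicira's own encapsulation and, more importantly, its control plane are a separate matter.

```python
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned UDP port for VXLAN

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header: flags byte (0x08 = 'VNI present'),
    3 reserved bytes, 24-bit VNI, 1 reserved byte."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", 0x08 << 24, vni << 8)

def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prepend the VXLAN header; the outer UDP/IP/Ethernet headers would be
    added by the sending vSwitch or kernel stack (not shown here)."""
    return vxlan_header(vni) + inner_frame

packet = encapsulate(b"\x00" * 64, vni=5001)  # dummy 64-byte inner frame
print(len(packet), packet[:8].hex())          # 72 0800000000138900
```

The hard part, as noted above, is not this encapsulation but the control plane that decides which VNIs exist, which vSwitches participate in each one, and where to send the outer packets.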

Many seem to think that this will be the end of networking as we know it, and that the physical network becomes a commodity. I think this isn't true. First, because building a large, scalable, fast-performing L3 network may not be rocket science, but it isn't something many have succeeded at. There is a reason why Internet operators rely on just two companies for that: Cisco and Juniper.

Second, because as you try to improve the utilization efficiency of your physical topology and provide differentiation to applications that require it, your PHYSICAL network must have a way to view and interact with your overlays.

And there is more. Once you have built an architecture that enables you to create such overlays for endpoint connectivity, what happens when connectivity is needed with elements outside of your overlay? You need a gateway out, and a gateway in. I can see ways in which, leveraging OpenFlow and a controller, you can scale the gateway out from the vSwitch itself, but scaling the gateway in is more difficult, and chances are it will be done via appliances of some sort, which then need to be redundant, etc.
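As a purely conceptual sketch of the "gateway out" idea, imagine the controller pre-installing match/action entries at each vSwitch so that traffic leaving the overlay is decapsulated and routed at the edge, with only the remaining cases falling back to a central gateway appliance. The toy Python model below is my own illustration; it does not use any real OpenFlow library or message format.

```python
# Toy model: controller-installed match/action entries at a vSwitch edge.
flow_table = {}  # (vni, destination prefix) -> action

def install_flow(vni: int, dst_prefix: str, action: str):
    """Pretend this is the controller pushing a flow entry to the vSwitch."""
    flow_table[(vni, dst_prefix)] = action

def forward(vni: int, dst_ip: str) -> str:
    """Crude prefix match, just to show the decision point (not real LPM)."""
    for (flow_vni, prefix), action in flow_table.items():
        if flow_vni == vni and dst_ip.startswith(prefix):
            return action
    return "send-to-gateway-appliance"   # the harder, centralized gateway path

install_flow(vni=5001, dst_prefix="192.168.", action="decap-and-route-locally")
print(forward(5001, "192.168.10.7"))   # decap-and-route-locally (handled at the vSwitch)
print(forward(5001, "172.16.0.9"))     # send-to-gateway-appliance (needs the gateway)
```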

So I think the more Niciras we have, the better. The more development we see that facilitates moving towards cloud architectures, the more demand there will be for performant and intelligent networks. I do not see commoditization happening in the network, for the same reason hypervisors haven't commoditized the CPU. Intel and AMD are now building CPUs that offer more services to the hypervisor layer; in the same way, we will see networks that offer better services to the overlay networks.

Net net, I am one of those who think that much of the complexity in networking wasn't created by wrongdoings of the past, or by legacy technologies alone. It is because dealing with a network isn't like dealing with endpoints. It is a complex and evolving challenge in itself.

Bright times ahead for the industry …

Monday, January 23, 2012

On Soft Switching and Virtualized Networking in General (Part I)

Wow, it's been almost three months since I last wrote here. Too bad. I actually have at least four posts in the works, but I never seem to find the time to finish any of them. Apart from lack of time, sometimes I am too ambitious, and perhaps each post could be divided into several smaller ones.

That's what I decided to do with one of the subjects, and this will be part I. I want to write about the challenge of networking virtual machines and go through the various options, considering pros and cons and where different solutions fit.

As always, the writing represents my opinions only, and is based on my limited knowledge of the subject. In this first part I review the challenges of networking for virtual machines and talk about what others have written about it. Then in part II I will talk more about soft switching economics and where I think it fits, and in part III I will do a similar thing for the hardware approach (VM-FEX). Finally, I am planning a part IV with details of the current VM-FEX implementation on Cisco Nexus switches.



The Challenge of Networking in Virtualized Environments

Much has been written about this topic, and as a networking head fully involved with virtualization I find it so interesting that I spend quite some time reading and thinking about it. Finding the free time to write about it is another story.

There is no doubt that it is a challenge. Networking a large number of servers isn't an easy task to begin with, but with virtualization you add the challenge of a much denser population of endpoints (the VMs) on the network, plus the mobility of virtual machines to complicate matters. But scale and mobility are just two of the dimensions of this problem, with security, performance and manageability probably being the other top ones.

In general, I believe there are two approaches to solving the challenge of networking in virtualized environments: software based solutions, integrated into the hypervisor, and hardware based solutions seeking to off-load switching from the hypervisor. I think both have a place.

Martin Casado, the network heretic, has written a lot about why he thinks soft switching "kicks mucho ass" and will be the winning solution. He dislikes solutions that leverage NIC virtualization (SR-IOV, 802.1BR) and those that force traffic to be pushed off the server only to be hair-pinned back if needed (802.1BR, 802.1Qbg). Many other bloggers have bashed tag-based solutions (i.e. 802.1BR), saying they aren't needed and are just another attempt by hardware vendors to sell more, and newer, hardware. It is undeniable that vendors develop technology hoping to profit from it, and that is quite a legitimate thing to do in my opinion, but I am dead sure that no vendor develops a technology just to force a customer upgrade. Plain and simple: if the technology isn't solving a problem, it won't sell. People aren't stupid.

The problem we are solving (networking in highly virtualized environments) admits multiple solutions, and it looks quite different to different people. I think we can distinguish two big types of organizations: IaaS service providers, and virtualized deployments in the enterprise space (i.e. private clouds). For the first type of customer, soft switching may have an edge, while for the second, hardware switching can be more interesting.

Casado dislikes hardware switching, and I think he has a good case for cloud service providers, as I just mentioned. The best way to understand why he dislikes the hardware-based options and thinks soft switching will prevail is explained in his blog posts (which I recommend reading in their entirety; there are four parts), but in a nutshell he believes that software will always be richer in features, and that with today's performance and low price per core, the economics of soft switching are ideal.

I disagree with him to some extent, certainly to the extent of applying the analysis to the entire industry. I think his analysis considers only a part of the industry, and that both options (soft and hard switching) will have their place and space to bring solutions to virtualized server networking. Let's see if I can explain why I think this way in the subsequent posts. Then I will explain the benefits of hardware-based solutions and how they work.

I must stress again that my writing on this blog is done in my free time and reflects only my own personal interests and opinions.