Translate

Saturday, August 27, 2011

An example of poor network design, or how to look for trouble

Everybody these days is trying to do more with less. This is fine, but the challenge is that sometimes you end up doing less with more. I have recently found a paper from a networking vendor with a recommendation which falls into that category and I wanted to write about it.

Companies are looking to upgrade their infrastructure from GE to 10GE, and many are considering whether to upgrade existing modular switches or deploy new ones. There are many variables to consider here but, clearly, newer switches are denser in terms of 10GE (as one could expect of the natural evolution of technology). Two platforms in particular, the Cisco Catalyst 4500 and 6500 have demonstrated impressive evolution over time. I've know many customers using the latter in Data Center environments and when I look back five years ago, I am sure those customers can recognize that they did the right investment in the Catalyst 6500. I believe NONE of the high end modular switches which the Catalyst 6500 was competing with five years ago is still a valid option in the markeplace. Customers who chose to go with Nortel Passports, Foundry BigIron or MLXs, Force 10 ... would find themselves today with platforms which have had no future for already a couple of years, very limited upgrades, and poor support. On the other hand, the Catalyst 6500 still offers bleeding edge features and options for software and hardware upgrades to enhance performance.

But it is clear that there are way denser switches for DC 10GE deployments, starting with the Cisco Nexus 7000 of course. Customers need to evaluate what is best for them, and each case is different.

An idea which comes to some, and is recommended by at least one network vendor as I wrote earlier, is to front end existing switches (which are less dense in 10GE port count) with low-cost 10GE switches to provide a low-cost high density 10GE fan-out. The follow picture shows this "design" approach:



In my opinion this is a bad idea, very bad network design, and it is looking for trouble. Moreover, I think this is an approach which may end up being "doing-less-with-more", even if at first glance, may look "cheap" to build.  In this post I will try to explain the reasons why I think this way.

Technical Reasons

Multicast Performance - Switches constrain multicast flooding by implementing IGMP snooping. Shortly: as multicast receivers send IGMP join messages to signal they want to join a group, the switch's control plane will snoop the traffic and program hardware installing an entry (hopefully for the S,G information) which points to the port on which the IGMP join was received. Without IGMP v3 support, upon receiving a leave message, the switch in the middle must forward the message upstream but only after it has sent a message downstream to ensure no more receivers are left on the downstream switch. All in all, this adds complication, potential points of control plane failure, and inevitable latency on the join and leave processes. Means: a receiver will take longer to get multicast traffic after requesting, and the network will take longer to prune multicast traffic when the receiver leaves a group.

Multicast Scalability - This is a bigger issue than the former one. Networks built with Catalyst 6500 modular switches can scale up to 32K multicast entries in hardware. Typical ToR switches support between 1K and 2K entries  {CHECK FOR ARISTA}. These figures are ok when a switch with 48 ports connects to 48 servers. But if you will use that switch to connect to 48 access switches each with 48 servers, then you are looking to provide connectivity for in excess of 2,000 devices. The math is clear in how much you are limiting yourself with this design.

Buffering - Fixed form factor switches are designed typically to connect end-points. A ToR is designed to connect servers. The buffer available in those switches is then suited for such application, and expects little to no contention on server-facing ports, and the only contention point to be on the uplinks for practical traffic purposes. When a 48-port ToR switch is used in access, it deals with the traffic and burstiness of 48 servers or less (duh!) ... but if you use it to aggregate 48 access switches each with 48 ports, then it aggregates the traffic of 2,000+ servers. It is VERY obvious that you'll end up in trouble, silent drops, without much more analysis.

The latter is probably the most serious limitation of this design, because it will affect all possible types of deployments. There are others, such as how to deal with QoS, LACP or spanning-tree compatibility, impact on convergence on failure scenarios ...

Operational Reasons

Software Upgrades - Since you are adding one more layer to manage, you now have more devices to upgrade, potentially with different software and upgrade procedures, and you need to evaluate the impact of upgrades on convergence. Apart from the Cisco Nexus 5000 series, no ToR in the market provides ISSU support.

Provisioning - Setting up vlans, policy and many network tasks get more complicated because you need to provision them on more devices. If spanning-tree needs to run (and this is recommended on L2 networks using multi-chassis etherchannel as a safe-guard mechanism) you also need to consider that ToR switches may have support for limited number of spanning-tree instances.

Support - Troubleshooting gets more complicated. Instrumentation will likely be different between ToR switches and high end modular devices. IF the ToR is from a different vendor of the rest of the network, this gets even more complicated because the technical support services from two or more vendors may need to be involved to deal with complex troubleshooting issues.

The list could go on ... I sure hope customers consider things carefully. Something which is cheap today and may "just work" today, could be extremely expensive in the long run, or simply no loger work for (near) future requirements.


I believe that in most cases poor network design comes as a result of lack of knowledge. Many people still think that to build networks you just need switches and ports. So all the count is how many ports you need, how many switches, off you go ...

Lack of knowledge is not a bad thing (I mean, nobody is born knowing stuff), and can be solved with reading, training, etc. Now, when the poor network design comes from a networking vendor document you have to ask yourself how much you can trust them ... for THEY should not lack the knowledge to do things right in networking.

No comments:

Post a Comment