
Monday, July 11, 2011

One layer, two layers … how many do you really need?

I do not know how many times over the last few months I have had to deal with the "we want a network with fewer layers; one layer is the ideal" comment. In fact, it is a recurring question. The number of customers who over the past 10 years have asked me why they need a core, or whether they can collapse aggregation and access, is significant.

In the end, it comes down to the perception that collapsing layers saves money on smaller deployments and simplifies the network on larger ones. I think this perception is often wrong, and the truth is that these days several networking vendors are trying to exploit it in a misleading way.

Why and when do you need more than one layer?

I will look at this from a data center perspective, so I will consider that the end-points connecting to the network are primarily servers. But the analysis is similar for campus, with a different set of endpoints (desktops, laptops, wireless access points, printers, badge readers, etc.) that have very different bandwidth and security requirements.

If the number of servers you have fits in a single switch, then you can do with a one-layer approach [I know, some people may read this and be thinking "some solutions out there convert the network into a single switch" … I'll get to those in other blog posts too].

So from a physical point of view this is obvious: if you can connect all your servers to one single switch, you need one layer. Now you want to have redundancy, so you will want each server to connect to at least two switches. So you have two switches, which then connect to the rest of your network.

If you have 40 servers with just two NICs per server, you can probably do with a pair of 48-port switches. What if you have 400 instead of 40? Then you use a switch with 400+ ports and you are done. But then you realize that you have to place that big switch somewhat far from your servers, and you spend lots of money on fiber cabling and also complicate operations.

What if the switch needs to be replaced or upgraded? You impact ALL your servers ... And also, what if you need 4,000 servers now instead of 400?
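To put some numbers on this, here is a back-of-the-envelope sketch in Python. It assumes dual-homed servers, one NIC to each switch of a redundant pair; the figures are the ones from the example above, nothing vendor-specific:

```python
# Back-of-the-envelope port math for the examples above (dual-homed servers,
# one NIC to each switch of a redundant pair). Numbers are illustrative.

def switch_pairs_needed(servers, ports_per_switch):
    # Each server takes one port on each switch of the pair, so one pair
    # terminates ports_per_switch servers. Round up: a partial pair is a pair.
    return -(-servers // ports_per_switch)

print(switch_pairs_needed(40, 48))    # 1  -> a single pair of 48-port switches
print(switch_pairs_needed(400, 48))   # 9  -> or one pair of 400+ port chassis
print(switch_pairs_needed(4000, 48))  # 84 -> one big pair stops making sense
```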

So you say it is best to use smaller switches and place them closer to your servers. Top of Rack is where most people put them. Why is this good?
- you use less fiber between racks
- you can use copper within the rack (cheaper)
- it simplifies (physical) operations [i.e. when you need to upgrade/replace the switch you impact fewer devices]

So most people would connect servers to a pair of smaller switches placed at the Top of the Rack (ToR). Now you have to interconnect those ToR switches, so we need a 2-layer network. The switch that interconnects the ToR switches is usually called a "distribution" or "aggregation" layer switch. Because you want redundancy, you would put in two of those.


Pretty basic so far. In practical terms, there is almost no way to get away from a 2-layer network.

Ok, but more than 2 layers is really overkill! This is just vendors trying to rip people off!

The same principle applies to begin with. A pair of distribution switches has a limited number of ports. Let's say for simplicity that you use two uplinks per ToR switch, one to each distribution switch. If you have 40 ToR switches you MAY do with, say, a 48-port switch for distribution [notice the MAY in uppercase, for there are considerations other than the number of switch ports here].

Say you have 400 ToR devices now ... you need a pair of 400+-port switches in the distribution layer. Say you have 4,000 ToR devices ... it is not so easy to think of a single device with 4,000 ports. But the key point now is: say you have 40,000, or more ... How do you make sure that, whatever the number, you know what to do and can grow the network?

This is where a third layer and a hierarchical modular design MAY come into play [notice the uppercase MAY, for there are other options here, discussed below]. By adding another layer, you multiply the scalability. Now you can design PODs of distribution pairs with a known capacity (based on the number of ports and other parameters like the size of the forwarding tables, scalability of the control plane, etc.) which you interconnect with one another through a core layer.
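A minimal sketch of how that multiplication works, with made-up port counts and a hypothetical reservation of 4 ports per distribution switch for core uplinks:

```python
# Hypothetical 3-layer capacity math: a core layer multiplies POD capacity.
# All numbers are illustrative assumptions, not vendor specs.

def tors_per_pod(dist_ports, core_uplinks):
    # Each ToR takes one port on each distribution switch (one uplink to
    # each), and each distribution switch reserves some ports for the core.
    return dist_ports - core_uplinks

def total_servers(pods, dist_ports, core_uplinks, servers_per_tor):
    return pods * tors_per_pod(dist_ports, core_uplinks) * servers_per_tor

# One POD built on 48-port distribution switches, 4 core uplinks each:
print(tors_per_pod(48, 4))           # 44 ToR switches per POD
# Ten such PODs, ~20 dual-homed servers per rack:
print(total_servers(10, 48, 4, 20))  # 8800 servers
```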



So, do you need a core layer? It depends on the size of your network. But the point is that regardless of what any vendor may claim today, as the network grows beyond the physical limits of the devices you use to aggregate access switches, you need to add another layer to make the network grow in a scalable way.


OK, but I can also grow with a 2-layer design ...

Yes, and this is becoming more and more popular. In such an architecture the ToR switch is typically called a 'leaf' switch, which connects to a 'spine' switch (the spine is similar to the distribution switch). You can have multiple 'spine' switches, so you can scale the network to a higher bandwidth for each ToR: if a ToR requires more than two uplinks, you add more spine switches to provide more bandwidth.

But the size of the network, measured by the number of leaf/ToR switches, is still limited by the maximum port density of the spine/distribution switch. When you exceed that number you are bound to add another layer.
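A rough sizing sketch of those two leaf/spine properties, with illustrative numbers and assuming one uplink from every leaf to every spine:

```python
# Rough leaf/spine sizing sketch (illustrative numbers only).

def leaf_uplink_bandwidth(spines, uplink_gbps):
    # One uplink from each leaf to each spine: adding spines adds bandwidth.
    return spines * uplink_gbps

def max_leaves(spine_ports):
    # Each leaf consumes one port on every spine, so the fabric size is
    # capped by the port density of a single spine switch.
    return spine_ports

print(leaf_uplink_bandwidth(4, 10))  # 40 Gbps of uplink per leaf with 4 spines
print(max_leaves(48))                # at most 48 leaves with 48-port spines
# Beyond that leaf count you need another layer, exactly as with the
# classic access/distribution design.
```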

More reasons to consider multiple layers ...

So far we have seen the obvious physical reasons calling for a 2- or 3-layer design. Needless to say, as platforms become denser in the number of supported ports with each product generation, what before required 3 layers may later be accomplished in 2. But at scale, you always fall back on multiple physical layers.

But there are other reasons why you want to use multiple layers, and those are operational more than anything. If you have all your servers connected to a pair of switches, or for that matter all your ToRs connected to a pair of switches which ALSO connect to the rest of the network, maintenance, replacements, and upgrades on those switches may have a big impact on the overall datacenter.

So in practical terms, it may be worthwhile to reduce the size of a distribution pair even if that requires adding a third layer, so that you reduce the failure domain. What I mean is: let's say you have 4,000 servers and you can hook them all off of a single pair of distribution switches. It may be worth building two PODs of 2,000 each, even if that requires double the number of distribution switches. Why? A serious problem in one pair of switches will not bring down all your computing capacity.
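The failure-domain arithmetic is trivial but worth seeing, using the 4,000-server example above:

```python
# Illustrative failure-domain math for the 4,000-server example.

def blast_radius(total_servers, pods):
    """Servers lost if one whole POD (a distribution pair) goes down."""
    return total_servers // pods

print(blast_radius(4000, 1))  # 4000 -> one pair: everything in one failure domain
print(blast_radius(4000, 2))  # 2000 -> two PODs: a POD-wide failure halves the damage
print(blast_radius(4000, 4))  # 1000 -> more PODs, smaller blast radius, more switches
```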

Of course people would argue "hey, I already have TWO distribution switches for THAT reason". Fair enough. But if you goof an update, or you are hit by a software bug that is data-plane related (i.e. related to traffic through the switches), there is a chance that you will affect both. This is rare, but it happens.

The 2-layer approach already shields you at that level: when you need to upgrade your access switches, say to deploy a new feature or correct a software bug, you can test on a sandbox switch first and then deploy the upgrade serially, making sure there is no impact to the network.
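For illustration, that workflow looks roughly like this; the three helper functions are hypothetical stand-ins for whatever tooling actually drives your switches:

```python
# Sketch of the serial upgrade workflow described above. The helpers are
# placeholders (assumptions), not any real switch-management API.

def test_on_sandbox(image):
    return True  # placeholder: validate the image on a lab switch first

def upgrade(switch, image):
    print(f"upgrading {switch} to {image}")  # placeholder for the real push

def health_check(switch):
    return True  # placeholder: verify forwarding/uplinks after the upgrade

def rolling_upgrade(access_switches, image):
    if not test_on_sandbox(image):
        raise RuntimeError("image failed sandbox validation")
    for switch in access_switches:      # one switch at a time: each server
        upgrade(switch, image)          # keeps its second uplink while its
        if not health_check(switch):    # first is being upgraded
            raise RuntimeError(f"{switch} unhealthy, stopping the rollout")

rolling_upgrade(["tor-01", "tor-02", "tor-03"], "os-9.1.2")
```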

There are other reasons to have a hierarchical network (with 2 or 3 layers depending on size), such as network protocol scaling by allowing aggregation (again, it comes down to minimizing the size of the failure domain, this time on the L3 control plane), applying policy in a hierarchical way (which allows scaling hardware resources), and others, but I am not touching on any of these on purpose. The reason is that there are some who would now argue that if you do NOT rely on network control protocols you get away from their limitations. This is certainly one of the ideas behind most of the work on SDN and OpenFlow, as well as some vendor proprietary implementations.
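Just to make the aggregation point tangible without diving into it: four made-up per-rack subnets inside a POD collapse into a single summary that the rest of the network has to carry. A tiny sketch using Python's standard ipaddress module:

```python
# Route aggregation in miniature: the prefixes are made up for the example.
import ipaddress

rack_subnets = [ipaddress.ip_network(f"10.1.{i}.0/24") for i in range(4)]
summary = list(ipaddress.collapse_addresses(rack_subnets))
print(summary)  # [IPv4Network('10.1.0.0/22')] -> 4 routes become 1
```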

What I wanted to do with this post is show that regardless of the network control plane approach (standards-based with distributed control - OSPF, ISIS, TRILL -, standards-based with centralized control - i.e. OpenFlow -, or proprietary implementations - i.e. BrocadeOne or JNPR's QFabric), at scale you ALWAYS need more than one physical network layer, and in large-scale datacenters it is almost impossible to get away from three.


