The interview can be found here: http://www.networkworld.com/community/blog/QA-with-jayshree-ullal-ceo-arista-networks
In there, they talk of a hypothetical network to support 5,000 servers. The number comes from a hypothetical 3MW datacenter, assuming all servers are of the same kind, each consuming 600W. In a perfect world, that's 5,000 servers. They go on to say this fits in 125 cabinets, putting 40 servers per cabinet and therefore 24kW of power per cabinet ... Pretty aggressive in conventional datacenters, but certainly possible.
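The headline numbers are easy to reproduce with simple arithmetic (figures as stated in the interview; the 40-servers-per-cabinet packing is their assumption):

```python
# Back-of-the-envelope check of the interview's numbers.
datacenter_power_w = 3_000_000   # 3 MW facility
server_power_w = 600             # per-server draw assumed in the interview

servers = datacenter_power_w // server_power_w
print(servers)                   # 5000 servers in a "perfect world"

servers_per_cabinet = 40         # the interview's packing assumption
cabinets = servers // servers_per_cabinet
print(cabinets)                  # 125 cabinets

cabinet_power_kw = servers_per_cabinet * server_power_w / 1000
print(cabinet_power_kw)          # 24.0 kW per cabinet -- aggressive but possible
```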
So considering standard 42RU cabinets, the math goes that you need two 1RU ToR switches per cabinet, and the desired oversubscription at the first hop is 3:1. No doubt the hypothetical scenario was chosen to obtain the best fit for Arista's boxes, but even so, I think the math fails.
Before I begin, a couple of considerations about this approach:
- at least in the EMEA region, I have seen few designs request such low oversubscription ratios. 4:1 is very common in scenarios where local switching at the ToR isn't even required, with 8:1 being common where local switching is available. But OK, large SPs deploying dense cloud infrastructure would push for lower oversubscription.
- there are other ways to handle cabling than putting a pair of ToR switches in every cabinet. With CX-1 cabling to the server, cables go up to 10 meters, which allows for several cabinets of distance, so it isn't uncommon to see a 2RU device used every other rack, which reduces the number of managed devices as well as power consumption.
Why the math is misleading
Arista considers using a 7050-64 (64x10GE, 125W), which is good enough for 40 ports to the servers and 16 10GE uplinks, leaving some spare ports unused. The ToR devices are aggregated on Arista's 7500. With 16 uplinks per ToR, 2 ToRs per cabinet and 125 cabinets, you need 4,000 10GE aggregation ports in the upper layer. The statement goes that you need 12 Arista 7500 series switches at the aggregation layer and 250 Arista 7050s at the ToR. A figure of 75.6kW worth of power is quoted: the main variable here.
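The uplink and aggregation-port arithmetic checks out (my own quick check, using the port counts stated above):

```python
# Port arithmetic for the stated Arista design.
server_ports_per_tor = 40   # server-facing 10GE ports on each 7050-64
uplinks_per_tor = 16
tors_per_cabinet = 2
cabinets = 125

oversubscription = server_ports_per_tor / uplinks_per_tor
print(oversubscription)     # 2.5 -- within the desired 3:1 budget

tor_switches = tors_per_cabinet * cabinets
agg_ports = uplinks_per_tor * tor_switches
print(tor_switches, agg_ports)   # 250 ToRs, 4000 10GE aggregation ports
```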
(I would again highlight that few organizations deploy a 4,000 10GE aggregation layer to provide connectivity to 5,000 servers today ...)
How are the ToR devices connected to the 12 upper-layer devices? This isn't mentioned. The Arista 7500 claims to support 384 10GE ports. On a two-spine design you can therefore support 48 ToR switches; on a four-spine design, 96; on an eight-spine design, 192. But a 12-spine design doesn't divide evenly into 16 uplinks per ToR, so you can't keep an even topology ...
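A small sketch of why the fan-out doesn't work out (my own arithmetic; 384 ports per 7500 is the datasheet figure cited above):

```python
# How many ToR switches can a spine layer of N Arista 7500s (384 x 10GE each)
# absorb, if every ToR spreads its 16 uplinks evenly across all spines?
SPINE_PORTS = 384
UPLINKS_PER_TOR = 16

max_tors = {}
for spines in (2, 4, 8, 12):
    if UPLINKS_PER_TOR % spines:
        max_tors[spines] = None   # 16 uplinks can't be split evenly
    else:
        max_tors[spines] = SPINE_PORTS // (UPLINKS_PER_TOR // spines)

print(max_tors)   # {2: 48, 4: 96, 8: 192, 12: None}
```

With 12 spines, each ToR would have to send 16/12 uplinks per spine, which isn't an integer, so the topology can't stay even.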
Another thing isn't mentioned: how do they keep redundancy and multi-pathing? M-LAG currently works across two spines ... No TRILL or equivalent yet. Where do they do L3?
Finally: 5,000 servers means a minimum of 5,000 MAC addresses and IP addresses. In fact more, because each server will also have at least a management interface. If you run VMs on each server, then each VM adds at least one MAC and one IP. As per the current datasheet, the Arista 7500 supports only 16K MAC addresses and 8K ARP entries. At the first-hop router you must be able to keep one ARP entry per IP in the connected subnets. If each server has one IP for management and one for traffic (again, it will be more), that's already 10K ARP entries ...
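The ARP pressure is easy to quantify (my arithmetic; the 8K ARP capacity is the 7500 datasheet figure cited above):

```python
# Minimum ARP state at the first-hop router, before counting any VMs.
servers = 5000
ips_per_server = 2            # one traffic IP + one management IP (a floor)
ARP_CAPACITY_7500 = 8_000     # datasheet figure cited above

arp_entries_needed = servers * ips_per_server
print(arp_entries_needed)                         # 10000
print(arp_entries_needed > ARP_CAPACITY_7500)     # True -- 25% over the table
```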
So, how does this "green" network that Arista depicts work at all?! It can't keep all the information in the forwarding table, and it can't manage redundancy across so many paths ...
If we do the math in reverse, with 10 VMs per server and two vNICs per VM, you need at least 150K ARP entries and MAC address entries.
With current Arista numbers, the best you can do is around 800 servers.
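Working the MAC table backwards gives that 800-server figure (my arithmetic, using the article's 10 VMs x 2 vNICs assumption and ignoring the hosts' own interfaces):

```python
# Reverse math: how many servers fit in the 7500's 16K MAC table,
# with 10 VMs per server and 2 vNICs per VM (one MAC per vNIC)?
MAC_CAPACITY_7500 = 16_000
vms_per_server = 10
vnics_per_vm = 2

macs_per_server = vms_per_server * vnics_per_vm   # 20, a conservative floor
max_servers = MAC_CAPACITY_7500 // macs_per_server
print(max_servers)   # 800 servers -- a far cry from 5000
```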
Why the Arista-reported Cisco math is also misleading ...
For the Cisco example, they pick a design with the Nexus 5548. This overlooks the fact that Cisco also has a 1RU box with 64 10GE ports: the Nexus 3064. Had they chosen that box, we would use the same number of devices at the ToR and produce a similar power figure at the edge (which is the bulk of the power number).
But we could also use the Nexus 5596, which gives us greater density: use 60 server-facing ports and 20 uplinks, and place one every other rack. This means we only need 84 devices at the access layer, which further reduces power consumption and management complexity.
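The 84-device figure follows directly (my arithmetic, using the port split stated above):

```python
# Access-layer device count with Nexus 5596s serving 60 server ports each.
import math

servers = 5000
server_ports_per_device = 60
uplinks_per_device = 20

access_devices = math.ceil(servers / server_ports_per_device)
print(access_devices)      # 84 devices, vs. 250 ToRs in the Arista layout

oversubscription = server_ports_per_device / uplinks_per_device
print(oversubscription)    # 3.0 -- exactly the desired 3:1 ratio
```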
For redundancy between the spine and access layers we could propose FabricPath, though this isn't yet available on the Nexus 5500 (it will be in the months to come). FabricPath also scales better against MAC address table explosion because it uses conditional MAC learning. And the Nexus 7000 scales far better at L3, capable of storing up to 1M L3 entries.
So if we do the math by building a network that actually works ... Arista's would stop at around 800 servers, and Cisco's could indeed scale to 5,000 servers.