
Wednesday, January 7, 2015

Let's talk about a 64 Tbps Firewall

Much has been written about SDN, ACI and NSX. The debates are heated, oftentimes coming across as if enemies were fighting one another. In the end, we are just writing about technology, and we should all remember that. Of course we all defend what we think is best, and of course we all try to bring forward the considerations that best suit whatever position we are defending. This is human. 

I certainly do not want to contribute to the perception that it's all about bashing one solution or another. Certainly, I do not want to bash any solution, NSX in particular. But I believe we all need to contribute somehow to a more moderate debate, and I just dislike exaggeration. With all my respect for @bradhedlund, tweets like the one below fall into exaggeration, in my humble opinion:


To put this in context, the tweet comes out of a discussion over a blog post by @cobendien and @chadh0517 [which you can read here]. The post is about how open Cisco ACI is and what its Total Cost of Ownership (TCO) looks like. A TCO comparison is provided between a Cisco ACI design and another built with NSX and Arista commodity switches. Brad's argument is that the TCO analysis is flawed, because NSX provides an E/W firewall whose equivalent should be added to the ACI design. Brad is talking about using the NSX DFW to cover an environment of up to 96,000 virtual machines (the size of the design presented in the original post). 

The logic goes that to compare ACI and NSX costs, you need to add an enormous amount of hardware to the ACI design to match the firewall and load balancing capabilities that NSX provides. NSX Load Balancing, however, appears to be little more than HAProxy with a provisioning front end. HAProxy is, as we all know, free, and a device package for HAProxy is easy to build in order to integrate it with ACI. Similarly, the NSX Edge firewall capabilities are very similar to what you can get on a Linux VM using iptables. The key argument therefore lies in the Distributed Firewall (DFW) feature, which provides East-West stateful packet filtering at the vNIC level.

I am certain that many customers see value in NSX, and specifically in the Distributed Firewall feature. However, based on my understanding of NSX, it is unfair (and not true) to pretend that 3,200 hosts running NSX are equal to a 64 Tbps firewall. This isn't true in essence, nor in math. In my opinion that is an exaggeration, it should be avoided, and I have seen similar ones many times.

I will try to provide details about why I think this way, based of course on my humble knowledge of NSX as it is today. I do hope I'll be corrected if and where I am wrong. Comments are open, and welcome!

The need for East/West Filtering



I do not contest that some customers have a clear interest, and in some cases a real need, to have per-VM East-West traffic filtering. I do contest that per-VM East-West traffic filtering is a requirement in every case, and/or that the best way to accomplish it is by running it at the hypervisor kernel level. 

First, it is fairly obvious that not everything runs in a VM. Many (many) apps still run on bare metal, so a comprehensive security solution should provide East-West filtering for both environments. Second, implementing advanced traffic filtering at the hypervisor level complicates having a seamless policy solution across multiple hypervisors, because the data plane would need to be replicated in every kernel, something which isn't easy to accomplish (even if most vendors today offer an open API for accessing the kernel for network functions). Third, the "cost" of filtering at every virtual ingress point may be higher than it appears, and it isn't free in any case. Fourth, and as a direct consequence of the previous point, advanced filtering isn't available today at very high speed on x86 architectures. 

This leads me to the tweet line below, part of the same tweet exchange from above:




What a Firewall is, and what a Firewall isn't 



I am not a security expert, but I believe that the definition of a Firewall isn't a matter of "opinion". It is a matter of the functionality provided. That said, not every place in the network needs advanced packet analysis and filtering, so the functionality required from a firewall isn't always the same, and that's the real story. 

One of the fundamental arguments for proposing a DFW like the one provided by NSX (i.e. a stateful packet filtering engine at the vNIC level) is that it prevents threats from propagating East-West. Put briefly: if a vulnerability is exploited on a VM in a particular application tier, the exploited VM cannot (easily) be used to launch attacks against other VMs in the same tier and/or in other tiers.

But I think that this argument contains a bit of a marketing exaggeration. 

Imagine a typical three-tier application. At the user-facing side, the web tier presents a front end with a number of VMs running Apache, nginx, IIS … your pick. All those VMs listen on tcp/80, and any of them will eventually have vulnerabilities. If one of those VMs is exploited through a vulnerability of the web server accessible on that port, the chances that ALL other VMs in the same tier can be equally exploited are very high. This is the reason why your front-end perimeter firewall should be able to inspect HTTP traffic in order to protect the web servers: for this you need a real firewall that can do HTTP inspection, HTML analysis, etc. The fact that, once one web VM is exploited, your DFW prevents it from talking to the other web VMs does very little here, because the attack vector was the legitimate port.

Even then, the exploited web VMs can only communicate with other application tiers on authorised protocols and ports (something which is also enforced by the ACI fabric, since each tier will typically be in a different EPG - or EPGs - and can communicate only on a contract-permitted basis). But we all know that this protection isn't incredibly strong either because, again, it only filters on protocols and ports. The real exploit may (will) happen through the legitimate port and protocol. This is the reason why modern firewalls (aka NGFW) inspect traffic without necessarily concerning themselves with port and protocol. To continue our example, SQL traffic may be allowed between the App and DB tiers, but you need the capability to inspect inside the SQL conversations to detect malicious attempts to exploit a bug in a particular SQL implementation … something which an NGFW does (but the ACI fabric filtering or the NSX DFW don't). 
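To make that concrete with a toy example (the tiers, ports and rules below are entirely hypothetical, not any vendor's actual policy model): a pure port/protocol filter never looks at the payload, so an exploit riding a permitted port is indistinguishable from legitimate traffic.

```python
# Toy illustration only: a port/protocol filter cannot tell a legitimate
# request from an exploit that uses the same permitted port and protocol.
ALLOWED_FLOWS = {
    ("web_tier", "app_tier", "tcp", 8080),   # hypothetical app port
    ("app_tier", "db_tier", "tcp", 1433),    # SQL between app and DB tiers
}

def port_filter_permits(src_tier, dst_tier, proto, dport, payload):
    # The payload is never inspected -- only the header fields matter.
    return (src_tier, dst_tier, proto, dport) in ALLOWED_FLOWS

legit = port_filter_permits("app_tier", "db_tier", "tcp", 1433,
                            "SELECT name FROM users WHERE id = 42")
exploit = port_filter_permits("app_tier", "db_tier", "tcp", 1433,
                              "'; DROP TABLE users;--")
print(legit, exploit)   # True True -> both pass; only payload inspection would differ
```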

This does not mean that ACI or NSX DFW filtering isn't important and/or useful. I am not writing this to discredit the NSX DFW feature, which may be interesting for compliance or other reasons. Multi-tenancy may be a reason for using such filtering, beyond security. ACI offers very similar functionality and, being a natively stateless filtering fabric, it may appeal to security ops as well. 

This is truly not about bashing a technology, but I do want to put the reader in a frame of mind where these tools are seen as parts of a solution, not as the holy grail (which they are not). I hope the reader will ask her/himself whether a feature is really valuable, or whether it is simply presented in such a way because it is … well … the only thing that a particular vendor has to offer. 

In reality, if security is really a top concern within the DC, you will likely still need a real firewall (from a real firewall vendor), and what matters in that case is how service redirection can happen in an effective and consistent manner (for physical and virtual workloads alike). 


3,200 Hosts and a 64 Tbps Firewall


There is a bigger point of exaggeration in the tweet above. It's not only the hype about the DFW capabilities and how much security it can really enforce. The other point that I think falls into both hype and exaggeration in the tweet exchange is performance and scale. I'd like to spend some time on that to put things into perspective. 

I want to clarify that this isn't about "software can't scale, and hardware is better" or vice versa. It is about the fact that oversimplifying things isn't helpful for anybody, and that distributed performance is sometimes presented almost in mythical ways.

Let us analyse an environment with up to 96,000 VMs, on 3,200 hosts, each dual-homed with 2x10 Gbps. Such an infrastructure cannot be considered in terms of a SINGLE FEATURE of the underlying software or hardware; there are many more dimensions to consider. And even when considering a single feature, the devil is in the details. 

Let us consider the following:

(1) Such an infrastructure needs not only E/W filtering, but N/S as well. Perimeter firewalling is what will first and foremost protect that front end of yours, and you will typically want an NGFW for this. Depending on the design, the NGFW resources may be virtualised and therefore shared between both functions (perimeter and East/West). When using NSX, the resources allocated to the DFW are only available for basic E/W filtering: no sharing in that case. This point must also be considered in the TCO.

(2) Following from the above, the DFW feature isn't free to run. The tweet was presented as if NSX added that feature on top of a design at no additional cost. But the real cost of the DFW feature, or of any other feature that runs on a hypervisor, cannot be evaluated only in terms of software licensing; it must also include the general-purpose CPU cores and memory consumed by the feature. Cores and memory that aren't available for other applications. Nothing is free.

(3) To infer that because you have 3,200 hosts each with 20 Gbps you have a 64 Tbps firewall is marketing math at its best. First, because to this date we have not seen any independent validation (or vendor validation for that matter) of the real performance of the DFW (or of any other NSX feature). This is a clear contrast with VMware vSphere features, or with VMware VSAN, where a plethora of vendor and third-party benchmark testing is publicly available. Second, because with current NSX scalability limits (as per the data sheet), those 3,200 hosts cannot all be part of a single NSX domain and therefore need to be split into smaller Distributed Firewalls with no policy synchronisation. More details on this below.

(4) Understanding that real security (beyond protocol and port filtering) will require real NGFW filtering at least between critical apps and/or app tiers, the NSX solution must also be complemented with an offering from a company like Palo Alto (and to this date, only that company as far as I know). This has an impact on performance, on cost, and on overall resources (because each PA VM-Series also requires vCPUs and vRAM, incremental to those consumed by the DFW feature itself).

(5) Traffic must be routed in and out of that virtual environment. Those 96K VMs will need to communicate with users if nothing else, and with their storage in most cases too. In fact, if the server has two 10GE interfaces, it is likely that a fair amount of that bandwidth will be dedicated to storage, which won't be protected by the DFW feature in any case. This also means that you do not need 20 Gbps of firewall throughput per server. The amount of North/South traffic will largely depend on what the infrastructure is used for: what is running on those 96K VMs. Imagine that you need to onboard lots of data for a virtual Hadoop cluster you decide to run there; you may have significant inbound traffic peaks. Assume North/South is 10% of total bandwidth: you then need in excess of 6 Tbps of routed traffic in and out of the overlay (quick math right after this list) … I will come back to this point later. 
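For reference, the quick math behind point (5), under the stated assumption that North/South is roughly 10% of the nominal host-facing bandwidth:

```python
# Back-of-the-envelope for point (5): nominal fabric edge bandwidth vs. N/S need.
hosts = 3200
nic_gbps_per_host = 2 * 10                         # dual-homed 10GE

nominal_tbps = hosts * nic_gbps_per_host / 1000
north_south_tbps = 0.10 * nominal_tbps

print(f"Nominal host-facing bandwidth: {nominal_tbps:.1f} Tbps")      # 64.0
print(f"North/South at 10%:            {north_south_tbps:.1f} Tbps")  # 6.4
```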

To expand a bit on point (3) above regarding per-host performance: it is well known that ESXi in general does not do 10GE at line rate for small packet sizes (http://www.vmware.com/files/pdf/techpaper/VMware-vSphere-VXLAN-Perf.pdf). This keeps getting better, of course, and I'd expect vSphere 6.0 to improve in this area as well. The NSX performance figures presented at VMworld SFO 2014 [Session NET1883] showed that for small packet sizes the DFW was maxing out at around 14 Gbps in a test with 32 TCP flows in one direction, while for large packet sizes it could reach up to 18 Gbps per host. Since this test used unidirectional traffic only and a (very) small number of flows (and we do not know how many cores were dedicated to achieving this performance, because that information wasn't shared), we cannot infer the real performance for IMIX or EMIX, but it is clear that it will not be 20 Gbps per server. 
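Even taking those VMworld per-host figures at face value (and remembering they come from a best-case, unidirectional test with few flows), the aggregate doesn't reach the headline 64 Tbps; a quick sketch:

```python
# Aggregate DFW throughput at the per-host figures presented at NET1883.
hosts = 3200
for label, per_host_gbps in (("small packets", 14), ("large packets", 18)):
    print(f"{label}: ~{hosts * per_host_gbps / 1000:.1f} Tbps aggregate")
# small packets: ~44.8 Tbps aggregate
# large packets: ~57.6 Tbps aggregate
# ...and that aggregate is split across ~10 independent NSX domains anyway (see below).
```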

The performance of a distributed environment can be looked at in many ways. Distributed performance may scale better in certain cases, and worse in others, and the performance of each point of distribution needs to be considered. In this case, there are per-host limitations which may or may not be acceptable. Leaving aside whether the current NSX DFW performance is 14 or 18 Gbps per host or whatever (it will certainly get better with time), as you add NGFW capabilities today this drops by an order of magnitude (even if that will also improve over time). The PA VM-Series maxes out at 1 Gbps per VM as per its data sheet.

Oh, but you only send to the Palo Alto VM-Series the traffic that requires deep inspection … by filtering on port and protocol with the DFW? … This is a possibility, but then you are saying that certain ports and protocols are safe enough to skip deep packet inspection? … That argument doesn't hold, and the Palo Alto folks in particular know it, since they make a selling point of … well … being a firewall that cannot be fooled by port and protocol. Anyway, the level of security required will of course depend on the customer; not all environments are the same. 

Back to performance: arguably, any particular host won't be bursting at maximum throughput all the time, so it may not matter if you can only do 1-2 Gbps, because maybe that is all your application requires. But it is clear that when an application is busy on a particular host, traffic is likely high on that host as well, and you need the firewall performance on that particular host at that very moment. 

In other words, a 3,200-host NSX environment isn't really a 64 Tbps DFW at all. If anything, it'd be 10 or 12 smaller ones, since you cannot have 96K VMs under a single NSX Manager (see below for where the 10-12 comes from). 

Of course, the same argument about statistical traffic requirements (i.e. not all hosts will require maximum performance at once) can be made for a semi-distributed NGFW design option. By semi-distributed I mean a scale-out model built with appliances and dynamic service chaining. This isn't all or nothing: you can distribute firewall functions to every host, but the alternative isn't a single large, monolithic firewall. You can have a scale-out cluster of dedicated hardware appliances too. 

For the sake of argument, consider using Palo Alto physical appliances in a scale-out model vs. per-host Palo Alto VM-Series integrated with NSX. A vSphere cluster of 32 hosts would max out at 1 Gbps of firewall throughput per host (as per the Palo Alto VM-Series data sheet here). You could say that the vSphere NSX cluster is a 32 Gbps NGFW for E/W traffic, assuming you use it to filter all traffic (which is probably not required in many environments). To accomplish this you need the VM-Series licenses, the NSX licenses, and dedicated physical hosts for running the NSX controllers and manager. You are also dedicating a total of 256 cores on that vSphere cluster to Palo Alto alone (the equivalent of 5-12 servers, depending on the CPU configuration of a dual-socket server), not counting the cores required for the DFW, which are harder to estimate. With such a solution, no single VM on any host can exceed the maximum throughput of the VM-Series appliance (1 Gbps today). That is: traffic between any two VMs crossing the VM-Series can never reach 10 Gbps. 
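For clarity, here is where the 256-core figure and the 5-12 server equivalent come from, assuming one VM-Series per host at 8 vCPUs each (which is what the 256-core total implies) and a couple of illustrative dual-socket core counts:

```python
# Rough accounting behind the 256-core figure (illustrative assumptions).
hosts_in_cluster = 32
vcpus_per_vmseries = 8                 # implied by the 256-core total

total_cores = hosts_in_cluster * vcpus_per_vmseries          # 256
for cores_per_server in (24, 48):      # illustrative dual-socket configurations
    print(f"{cores_per_server} cores/server: "
          f"~{total_cores / cores_per_server:.0f} servers' worth of CPU")
# 24 cores/server: ~11 servers' worth of CPU
# 48 cores/server: ~5 servers' worth of CPU

print(f"Max E/W NGFW throughput for the cluster: {hosts_in_cluster * 1} Gbps")
```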

Consider now a design with Palo Alto physical appliances instead. If you used two PA-5060s with service chaining, you would have 40 Gbps of NGFW capacity for East/West. You also get better performance for any single VM-to-VM communication, which can now fully use all of the host's available bandwidth. Moreover, that NGFW serves not only the ESXi hosts … but anything else you have connected to the fabric! … 

Now run the numbers for the TCO of those two environments ... the numbers do not lie. I do not know the pricing of Palo Alto appliances or of the VM-Series, so I will let the reader contact their favourite reseller. Of course I could make the same case with ASA and compare (so I know the results :-) ), but I do not want this post to be a Cisco selling speech.



However, by using NSX + Palo Alto you get to use vCenter attributes to configure policy, and you can dynamically change the security settings for VMs as required! Well, actually, that would be possible without NSX as well from a policy point of view, because nothing technically prevents Panorama from directly interfacing with vCenter, and the latter from communicating the mapping of VM attributes to IP/MAC addresses, etc. The solution has been productised in a way that requires Panorama to talk to the NSX Manager, but I see no (technical) reason why it couldn't be done in other ways, since it is vCenter that holds the key information.
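As a minimal sketch of what I mean (using the open-source pyVmomi SDK; the hostname and credentials are placeholders, and certificate verification is disabled purely to keep the example short), nothing stops a policy manager from reading the VM-to-IP/MAC mapping straight from vCenter:

```python
# Minimal sketch (pyVmomi, placeholder credentials): read the VM attribute ->
# IP/MAC mapping directly from vCenter, which is the data that dynamic policy needs.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()          # lab-only shortcut, do not use in production
si = SmartConnect(host="vcenter.example.local",
                  user="readonly@vsphere.local", pwd="********", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    for vm in view.view:
        for nic in vm.guest.net:                # per-vNIC guest info (needs VMware Tools)
            print(vm.name, nic.macAddress, list(nic.ipAddress or []))
    # (view cleanup omitted for brevity)
finally:
    Disconnect(si)
```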


Now let's come back to point (5) from above. To do that, we need to consider the design of the 96K-VM infrastructure in greater detail. The design possibilities for such an environment are so varied that I won't take a shot at covering them all here. But I do want to point out some things for consideration. Let's review a possible fabric design first, with a variation of what was proposed above:

- 3,200 hosts could fit in as few as 80 racks, using 1-RU dual-socket servers (provided the facility can handle the cooling); the sizing math is sketched after this list
- using redundant ToRs per rack, this translates into 160 ToRs; we would consider four spines
- if using ACI, the fabric would have a cluster of three APIC controllers
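The sizing math behind those bullet points, assuming roughly 40 1-RU servers per rack:

```python
# Straightforward sizing behind the fabric bullet points (assumes ~40 x 1RU servers per rack).
hosts = 3200
servers_per_rack = 40

racks = hosts // servers_per_rack            # 80
tors = racks * 2                             # 160 (redundant ToR pair per rack)
spines = 4
apic_controllers = 3                         # ACI controller cluster

print(racks, tors, spines, apic_controllers) # 80 160 4 3
```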

Now let's consider the virtual infrastructure:

(1) vCenter 5.5 maxes out at 10,000 active VMs [http://www.vmware.com/pdf/vsphere5/r55/vsphere-55-configuration-maximums.pdf]. I do not know whether customers would feel comfortable operating the environment at such high utilisation numbers, considering that vCenter is also a single point of failure; if operating at maximum scale, the vCenter DB must also be considered carefully. For the sake of the exercise, assume you add 20% of headroom and operate at 8,000 VMs per vCenter: you would need 12 vCenters. To make the math easier, let's assume 9,600 VMs per vCenter, so you need a total of 10 vCenter instances (the arithmetic is sketched after this block).

(2) The point above is important because NSX as of v6.1 has a 1:1 mapping between NSX Manager, controller cluster and vCenter. This means that you also need 10 different controller clusters and 10 different NSX Managers. Notice that there is no federation of controllers or managers today, which means that from a DFW standpoint (and from every other standpoint, in fact) you have to replicate your policies 10 times (or more, if you choose to use fewer VMs per vCenter). This is why I alluded above that if we are to believe Brad's marketing claim of 3,200 hosts being a 64 Tbps firewall, it should as a minimum be split into 10 chunks (and likely more chunks in practice). 

(3) We are assuming here that each controller cluster and NSX Manager can handle the load of up to 9,600 VMs over 320 physical hosts. We have not talked about how many Logical Switches, Distributed Logical Routers, etc. may be required in the environment, or the impact such considerations may have on the overall scalability of NSX. Again, this is an area where we all live in the grey mist of ignorance, since no public guidelines are available. But it is clear that if we choose 10 vCenters, the controller clusters will be operating near their maximum advertised capability (which is 10,000 VMs) in any case. With this in mind, if you design for maximum availability and minimal performance compromises, you need:

- 10 servers for vCenter
- 30 servers for NSX Controllers
- 10 servers for NSX Managers

The above must of course pay vSphere licenses as well, and would take up almost two racks' worth of space, network connectivity and power. Arguably, you could run two or even three controllers per server, each belonging to a different cluster. This has an impact on system availability, because a single host failure (or maintenance) would then impact two or three clusters instead of only one (so you double or triple - or more - the size of your failure domain). If you go for up to three controllers per host, you'd use 10 servers for controllers instead of 30, at the expense of a larger failure domain. In any case, I doubt anybody would design to operate the environment at 96% of the advertised capacity of vCenter or NSX, so the actual figures will likely be larger (although vSphere 6.0 is coming out soon and may raise some of these limits). The rough tally below puts numbers on these options.
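Putting rough numbers on the options above (illustrative only, reusing the same assumptions as in the text):

```python
# Rough tally of the management/control plane for 96K VMs (illustrative only).
total_vms = 96_000

for vms_per_vcenter in (8_000, 9_600):
    instances = -(-total_vms // vms_per_vcenter)        # ceiling division
    print(f"{vms_per_vcenter} VMs per vCenter -> {instances} vCenter instances")
# 8000 VMs per vCenter -> 12 vCenter instances
# 9600 VMs per vCenter -> 10 vCenter instances

domains = 10                     # one NSX Manager + 3-node controller cluster per vCenter
for controllers_per_host in (1, 3):   # 1 = smallest failure domain, 3 = fewest servers
    servers = domains + domains + (domains * 3) // controllers_per_host
    print(f"{controllers_per_host} controller(s) per host: {servers} dedicated servers")
# 1 controller(s) per host: 50 dedicated servers
# 3 controller(s) per host: 30 dedicated servers
```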

It is worth remarking that, effectively, you have 10 isolated islands of connectivity in that infrastructure. How does a VM on one island talk to a VM on another island? … Through an NSX Gateway. So East/West traffic across islands requires gateway resources: another point (and another cost) ignored in the 64 Tbps DFW marketecture.


Now comes the perimeter: the North/South. Again, let's say you design for 10% of the deployed fabric capacity. No, let's make it 5%: you want 3 Tbps of North/South capacity available. Let's forget the perimeter firewall, because you could argue that whether you use ACI or NSX or both, you need it anyway and it could be built from the same clusters of ASAs or Palo Altos or whatever. But you still have to route the subnets in which your 96K VMs live. For NSX, that means using NSX Edge, peering with the DLR. How many DLRs do you need? It depends: on how many tenants you have, on how many subnets you need to isolate and route, etc. Let's forget those too. At VMworld session NET1883 a test case was presented where NSX Edge could route at about 7 Gbps. Again, a test with minimal data shared: no RFC-level testing, no IMIX consideration, no word on tolerated loss rates, or latency, … 

Let's say each NSX Edge routes 10 Gbps though, and that you can load balance perfectly across as many instances as required. To route 3 Tbps in and out of that overlay you need 300 NSX Edge VMs. If they run any stateful services (e.g. NAT) and you want redundancy? … then you need to add 300 more … in standby. 

Now, put two NSX Edges per physical server (each with a dedicated 10GE NIC) and you are adding 300 servers to the mix. That is almost 8 racks' worth of gear that pays full vSphere (and NSX) licenses and can do one thing alone: run NSX Edge. Let's forget the complexity of operating 300 mini-routers, each independent of the others. 
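Spelling out the NSX Edge math, with the same round numbers as above:

```python
# NSX Edge sizing as assumed in the text (round numbers, perfect load balancing).
north_south_gbps = 3_000            # ~5% of the 64 Tbps nominal figure
gbps_per_edge = 10                  # generous vs. the ~7 Gbps shown at VMworld

active = north_south_gbps // gbps_per_edge        # 300
standby = active                                  # stateful services + redundancy
edges_per_server = 2                              # each with a dedicated 10GE NIC

servers = (active + standby) // edges_per_server  # 300
print(active, standby, servers, f"~{servers / 40:.1f} racks at 40 servers per rack")
# 300 300 300 ~7.5 racks at 40 servers per rack
```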

In the drawing above I show how NSX Edge is also required when East/West traffic flows between NSX domains, but notice that the calculation above of how many NSX Edge VMs you need does not include any E/W requirement between the different NSX domains; that would call for even more NSX Edge VMs. We have also not counted the DLR control VMs, which would also require dedicated servers.

Now imagine an upgrade! ...


And what about ACI?


With ACI, the fabric can be shared by more than one vCenter. Also, gateway functions are not required to connect VMs with everything else, whether it is users coming through the core or WAN, or bare metal applications. As for East/West, it really depends on the environment. Where the intention is simply to have filtering to segregate tenants or apps, the stateless filtering model intrinsic to the ACI Application Profile definition may be sufficient. In other cases filtering is required at the vSwitch level, and there the Application Virtual Switch (AVS) can be leveraged in vSphere environments. Or one could imagine that NSX is also leveraged in that sense, with ACI providing the service chaining. We have seen that using physical firewall appliances in a scale-out model with dynamic service chaining can be more cost effective and deliver better performance than the virtual model. The graph below illustrates a possible alternative, built using vCenter with ACI but without NSX.




Because you do not need the NSX Edge gateway functions and you need fewer servers for management and control, you will see savings that can probably pay for the NGFW functions. This depends on each case and on the scale and requirements, of course. But you can see that for this particular scenario you probably need 200-300 fewer servers, with the accompanying licenses, space and power costs (a rough tally follows below).
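Very roughly, and reusing the assumptions from the NSX calculations above (so take this as illustrative, not as an exact quote), the components of that saving look like this:

```python
# Rough components behind the "200-300 fewer servers" estimate (illustrative).
edge_servers = 300      # dedicated NSX Edge hosts for N/S routing, from the calc above
mgmt_servers = 50       # vCenters + NSX Managers + controllers (one controller per host)
still_needed = 10       # vCenters are required either way; APICs live in the fabric itself

gross = edge_servers + mgmt_servers - still_needed
print(f"Gross reduction: ~{gross} servers")     # ~340
# The net lands roughly in the 200-300 range once design-specific items
# (DLR hosts, extra N/S capacity, headroom) are added back on either side.
```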


In the end, it's not all black and white


To wrap up, this isn't about saying that NSX is a bad product, nor about saying that the NSX DFW feature is good or bad. It is about expanding the conversation and the considerations beyond hype and marketing. The fact is, many customers looking at ACI or NSX have scalability requirements well below the design discussed here. Ultimately, customers may see value in both solutions, or in neither of them! Or in a combination: for instance, using the NSX DFW as a form of advanced PVLANs, while using ACI fabric service chaining to redirect traffic to NGFWs running in scale-out clusters. 

The important thing, IMHO, is that customers can make an educated decision. And while all vendors legitimately try to steer the conversation to their advantage, we should all try to avoid falling into blatant exaggerations. Particularly when it comes to security.