There are a lot of SSL offload throughput statistics for appliances floating around the internet, but they rarely detail how they were tested (probably because a lot of the numbers are inflated for marketing purposes). We at Loadbalancer.org would like to improve the standard across the industry by being transparent about exactly how we have tested our appliances for SSL performance:

What is SSL offloading/SSL Termination?

SSL offloading is the process of moving SSL traffic decryption and encryption away from your web servers onto a centralised device, be it a load balancer or specific SSL offloading hardware.

Why is SSL offloading/SSL Termination on the load balancer necessary?

Well, it's not really... In fact, Loadbalancer.org has always recommended that you use the application cluster to scale your SSL horizontally. However, SSL termination on the load balancer is definitely required if you need load balancer based cookie persistence instead of just source IP persistence, plus a few other things that you really should not be doing, like directing packets based on the contents of URLs. In fact, SSL offloading ALWAYS doubles your load / halves your speed; after all, you are going to re-encrypt to the backend, aren't you? PCI-DSS compliance and all that?

The Test

You can use as many clients and back-end servers as you need; to get the best results, the load balancer/appliance should be the bottleneck. To generate the load on the test devices we wrote our own SSL test script in Python. If you want a copy of it, fire off an email to support@loadbalancer.org and they will be able to provide you with the latest version.

One IP address listens on port 443 using the decryption system of your choice; in our case it's Stunnel. There is no SSL session reuse: in the real world it's a useful feature and should be implemented where possible, but in this test it won't give you a clear view of the raw horsepower of the appliance. So each connection should involve a full SSL handshake and be fully closed on completion.
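As a rough illustration of what "a full SSL handshake and fully closed on completion" means on the client side, here is a minimal Python sketch. It is not our actual test script; the function names make_context and fetch_once are made up for this example, and certificate checks are switched off on the assumption that the appliance under test presents a self-signed certificate.

```python
import socket
import ssl


def make_context() -> ssl.SSLContext:
    """Client context for a test rig (assumption: self-signed test certificate)."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False        # must be disabled before verify_mode
    ctx.verify_mode = ssl.CERT_NONE   # test rig only, never do this in production
    return ctx


def fetch_once(ctx: ssl.SSLContext, host: str, port: int = 443,
               path: str = "/", timeout: float = 5.0) -> bytes:
    """One complete transaction: TCP connect, full handshake, GET, close."""
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    with socket.create_connection((host, port), timeout=timeout) as raw:
        # No session= argument is passed, so no earlier session is offered
        # and the server has to perform a full handshake.
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            if tls.session_reused:
                raise RuntimeError("session was reused; result would be skewed")
            tls.sendall(request.encode("ascii"))
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:          # server closed the connection
                    break
                chunks.append(data)
    return b"".join(chunks)           # both sockets are closed at this point
```

Calling fetch_once(make_context(), "192.0.2.10") against a test VIP (the address here is a documentation placeholder) would return the raw HTTP response, headers included.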

Then there is one internal IP listening on port 80, forwarding to HAProxy, which is configured with no persistence and selects a real server for each connection based on weighted least connections. The backend servers should be running a web service, either Apache or nginx (in our case it's Apache), and returning a single HTML formatted web page.
For example:

<html>
<head>
<title>Server rip-test-1</title>
</head>
<body>
<p>You are viewing rip-test-1</p>
</body>
</html>

The whole page is 107B in size. It is duplicated across all of the backend servers, the only difference being that rip-test-1 is changed to reflect the server name, so we have:

  • rip-test-1
  • rip-test-2
  • rip-test-3
  • etc.
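To picture how a weighted least connections choice between these servers works, here is a small Python sketch. It is only an illustration of the idea (fewest active connections per unit of weight), not HAProxy's actual implementation; the RealServer class, the pick_server function and the connection counts are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class RealServer:
    name: str
    weight: int = 1   # relative capacity; higher means more traffic
    active: int = 0   # connections currently open to this server


def pick_server(servers):
    """Return the server with the fewest active connections per unit weight."""
    candidates = [s for s in servers if s.weight > 0]
    if not candidates:
        raise ValueError("no server with a positive weight")
    # min() keeps the first server on a tie
    return min(candidates, key=lambda s: s.active / s.weight)


servers = [
    RealServer("rip-test-1", weight=1, active=4),
    RealServer("rip-test-2", weight=2, active=6),
    RealServer("rip-test-3", weight=1, active=5),
]
chosen = pick_server(servers)
chosen.active += 1    # the new connection now counts against that server
print(chosen.name)
```

With equal weights, as you would normally use for identical test servers, this reduces to plain least connections.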

Each test consisted of 3 runs of 60 seconds each, using HAProxy version 1.5-dev18, Stunnel version 4.55 and OpenSSL 1.0.1e.
A complete connection counts as an HTTPS request for the page, made from a client NOT running on the load balancer to the 443 IP on the load balancer. That request is decrypted and passed on to HAProxy, which decides which server to pass it on to. The back-end receives the request and returns the page to HAProxy, which passes it on to the program that's doing the SSL to be re-encrypted and passed back to the client.
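Counting complete connections over a fixed run and turning that into a TPS figure can be sketched in Python as below. This is a single-worker illustration rather than our test script: run_for is a made-up name, and transaction stands for any hypothetical callable that performs one complete HTTPS request and raises OSError (which includes socket and SSL errors) on failure. A real test would run many of these workers in parallel from several clients and add up the counts.

```python
import time


def run_for(duration, transaction, clock=time.monotonic):
    """Call transaction() repeatedly for `duration` seconds.

    Returns (completed, failed, tps); only completed transactions
    count towards the TPS figure.
    """
    completed = failed = 0
    start = clock()
    while clock() - start < duration:
        try:
            transaction()
            completed += 1
        except OSError:       # socket timeouts and ssl.SSLError are OSErrors
            failed += 1
    elapsed = clock() - start
    return completed, failed, completed / elapsed
```

For a run like the ones described here, duration would be 60 and the whole thing would be repeated three times.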

SSL Terminations Per Second (TPS) results for a selection of CPUs

Date of Test | CPU | RAM | Cipher | Certificate Length | Run 1 (TPS) | Run 2 (TPS) | Run 3 (TPS)
30/10/13 | Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz | 4GB | ECDHE-RSA-AES256-SHA | 1024 bits | 3625 | 3629 | 3631
31/10/13 | Intel® Atom processor C2750 | 8GB | ECDHE-RSA-AES256-SHA | 1024 bits | 1147 | 1204 | 1220
04/11/13 | Intel(R) Xeon(R) CPU E5-2470 0 @ 2.30GHz | 8GB | ECDHE-RSA-AES256-SHA | 1024 bits | 2780 | 2778 | 2777
04/11/13 | Intel(R) Celeron(R) CPU 440 @ 2.00GHz | 1GB | ECDHE-RSA-AES256-SHA | 1024 bits | 321 | 322 | 322
06/11/13 | Intel(R) Xeon(R) CPU X3430 @ 2.40GHz | 2GB | ECDHE-RSA-AES256-SHA | 1024 bits | 2160 | 2234 | 2236
06/11/13 | Intel(R) Atom(TM) CPU D510 @ 1.66GHz | 4GB | ECDHE-RSA-AES256-SHA | 1024 bits | 343 | 338 | 344
06/11/13 | Dual Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz | 32GB | ECDHE-RSA-AES256-SHA | 1024 bits | 6082 | 6143 | 6190
06/11/13 | Intel(R) Atom(TM) CPU S1260 @ 2.00GHz | 4GB | ECDHE-RSA-AES256-SHA | 1024 bits | 388 | 387 | 389
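If you want a single figure per CPU, averaging the three runs is the obvious thing to do. The short Python sketch below does just that with the numbers copied from the results above; the shortened CPU labels are ours, purely to keep the output narrow.

```python
from statistics import mean

# (CPU, run 1, run 2, run 3) copied from the results above
results = [
    ("Xeon E3-1220 V2 @ 3.10GHz", 3625, 3629, 3631),
    ("Atom C2750", 1147, 1204, 1220),
    ("Xeon E5-2470 @ 2.30GHz", 2780, 2778, 2777),
    ("Celeron 440 @ 2.00GHz", 321, 322, 322),
    ("Xeon X3430 @ 2.40GHz", 2160, 2234, 2236),
    ("Atom D510 @ 1.66GHz", 343, 338, 344),
    ("Dual Xeon E5-2420 @ 1.90GHz", 6082, 6143, 6190),
    ("Atom S1260 @ 2.00GHz", 388, 387, 389),
]

# Average the three runs for each CPU
averages = {cpu: mean(runs) for cpu, *runs in results}

# Print the CPUs from fastest to slowest
for cpu, tps in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{cpu:30s} {tps:8.1f}")
```

The runs are close enough together on every CPU that the average tells you much the same as any single run.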

For a more in-depth look at SSL testing, take a look at the excellent blog entry at Exceliance.

So which CPU performed best for SSL TPS?

Well, the Dual Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz of course, as it was the fastest configuration in the test; SSL TPS is pretty much a pure CPU operation after all...

However, we were very impressed with the new 8 core Intel® Atom processor C2750 that we managed to get our hands on. The boot-up time on the motherboard is awful as it's a pre-production trial unit, but the performance figures are very nice for such a small, low-power board.

Incidentally all of our Dell units and our ENTERPRISE MAX unit at Loadbalancer.org use the Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz, which clocks a fairly respectable 3600 SSL TPS!

But getting back to the point of this article, that's the same processor as the top of the range Kemp R-320 load balancer: http://kemptechnologies.com/emea/server-load-balancing-appliances/loadmaster-r320/overview.

So why do they claim 8,000 SSL TPS on their web site? See http://kemptechnologies.com/emea/server-load-balancing-appliances/product-matrix.html.

Deliberately provocative question, by the way... Kemp sell a great product; we are just making the point that SSL stats, as with all stats, should be taken with a pinch of salt!

Just found another really nice blog on open source SSL performance by Vincent Bernat.