I have been testing different ways to optimize this site for performance, purely as a learning experience. I have come across several guides explaining how server-side caching works; some of them are really good, and some are a bit out of date in terms of what I consider “best practice” in the industry these days. Most guides to server-side caching do not address SSL/TLS. Just as setting up a web server with SSL/TLS is more complex than setting one up without, setting up a cache with SSL/TLS is more complex than setting up a cache without. The goal of this article is to discuss some of the most popular methods and their advantages and disadvantages.
This article will be more about the overarching concepts and flow of information than actual configuration, but I’m hoping to do articles on how to actually configure the different options in future posts and incorporate them into this article.
What is a Server-Side Cache?
As opposed to a client-side cache (e.g. the browser caching images and other resources for a set amount of time before re-requesting them, or compressing data before transit, which is potentially insecure over SSL/TLS anyway), server-side caching is about improving how the information on your site is served, solely within the bounds of your web server(s). When you request a web page from a web site, a lot is going on behind the scenes. Generally, the reason you have a server-side cache is to reduce the time to fulfill a response, as well as to reduce the load on back-end services by serving computationally inexpensive static content rather than computationally expensive dynamic content. The more content you can serve as static files off of disk (good) or directly out of RAM (best), the faster your response times and the better your back-end services are utilized. Another benefit of a cache is that it can continue to serve content even when the back-end web server goes down.
For the purposes of this post we are assuming a very standard Linux/Apache (or Nginx)/MySQL/PHP setup.
Our baseline request is a simple request for a web page, written in PHP, from a web server. Because we are using SSL/TLS, an https request is sent to the server. The web server accepts the request and looks at the file the user requested. The server hands the request off to PHP, and PHP compiles the script down to opcodes, the intermediate bytecode that the PHP engine actually executes. The executed code does all of the wonderful things that PHP does for us, including potentially connecting to a back-end database for information. If this all sounds like a time-consuming process, that is because it is. If we had a way to cache the static parts of this pipeline, we could skip a bunch of these costly back-end operations.
Alternative PHP Cache (APC)
Because we are using PHP in our example, we can use any of the various PHP caches. The only difference between APC and our baseline request is that APC caches the opcodes, skipping the steps of reading in the .php files and compiling them down to opcodes.
In this example we don’t have to do any special configuration for SSL/TLS because everything is handled by PHP after the SSL connection is terminated at the web server.
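For completeness, enabling APC is mostly a matter of installing the extension and setting a few php.ini values. The path and sizes below are a sketch, not a recommendation:

```ini
; e.g. /etc/php5/conf.d/apc.ini (the exact path varies by distro)
extension=apc.so
apc.enabled=1
apc.shm_size=64M   ; shared memory reserved for cached opcodes
apc.stat=1         ; re-check file modification times on each request
```

Setting `apc.stat=0` squeezes out a little more speed by skipping the modification-time check, at the cost of needing to restart PHP after every deploy.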
Memcached
I don’t have a diagram for how memcached works because, from a high level, it seems to work similarly to APC, and it does… kind of. The difference is that memcached is just a dumb (disposable) datastore that lives in RAM, is totally technology agnostic, and relies on another application being smart enough to store data within it as the application sees fit. In the case of PHP this can be accomplished through the memcached PHP module, though if you are using other technologies (Python, Java, etc.) there are ways to use memcached with them as well. Another nice feature of memcached is that it can be distributed across servers, so rather than each server having its own giant cache, each server can host a portion of it, saving on memory.
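As a sketch of what “the application being smart” means with PHP’s memcached module, here is the classic cache-aside pattern. The server address, the key name, and the `$db` helper are assumptions for illustration:

```php
<?php
// Illustrative cache-aside pattern using the PHP Memcached extension.
$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);

$user = $cache->get('user:42');
if ($user === false) {                  // cache miss: do the expensive work
    $user = $db->fetchUser(42);         // hypothetical database helper
    $cache->set('user:42', $user, 300); // store the result for 5 minutes
}
```

The point is that memcached itself never decides what gets cached; the application does.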
Again, in this example we don’t have to do any special configuration for SSL/TLS because everything is handled by PHP after the SSL connection is terminated at the web server.
Nginx Caching
Sure seems like Nginx does almost everything. As it turns out, it is pretty decent at caching too. Nginx caches by sitting in front of whatever your website is, as a reverse proxy. As a reverse proxy, Nginx can do some nice things for us, such as load balancing and SSL termination. This allows us to optionally have multiple web servers behind the reverse proxy sharing the load. If you only have one web server, you simply proxy requests to another port that hosts the dynamic (normal) version of your site. Note that the back-end server serving the dynamic content can be the same Nginx server, another Nginx server, many Nginx servers, or something else like Apache, or even IIS if you really must. The way Nginx actually caches is that it watches requests go by, and once a resource has been requested a certain number of times within the configured duration of the cache, it writes that resource to a pre-determined location on the server’s hard drive and serves it from there. But serving off of disk is slow, you say. I’m glad you mentioned it. If you want the cache to be in memory, simply edit /etc/fstab so that the folder Nginx is caching to is a tmpfs mount, and you are now serving out of RAM.
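As a rough sketch, the caching half of that looks something like this in Nginx. The path, zone name, sizes, and timings below are illustrative assumptions, not recommendations:

```nginx
# Define where and how much to cache (illustrative values).
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=mycache:10m
                 max_size=256m inactive=60m;

server {
    # ...
    location / {
        proxy_pass http://127.0.0.1:8080;  # the dynamic (normal) site
        proxy_cache mycache;
        proxy_cache_min_uses 2;    # cache after a resource is requested twice
        proxy_cache_valid 200 10m; # keep successful responses for 10 minutes
    }
}
```

To serve the cache out of RAM, an /etc/fstab entry such as `tmpfs /var/cache/nginx tmpfs defaults,size=256m 0 0` mounts that folder as a tmpfs.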
This is the first example of server-side caching where we have to actually think about how SSL/TLS is implemented. In my case, I simply had to split my Nginx config file into two parts: the first was the reverse proxy where my TLS connections terminated; the rest was my normal website with all of the rewrite rules and whatnot.
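A sketch of that split, with assumed ports and placeholder certificate paths:

```nginx
# Part 1: reverse proxy, where the TLS connections terminate
server {
    listen 443 ssl;
    server_name example.com;
    ssl_certificate     /etc/ssl/example.com.crt;  # placeholder path
    ssl_certificate_key /etc/ssl/example.com.key;  # placeholder path

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header X-Forwarded-Proto https;
    }
}

# Part 2: the normal website, plain http bound to localhost only
server {
    listen 127.0.0.1:8080;
    root /var/www/example.com;
    # ...rewrite rules, PHP handling, etc. go here...
}
```

Binding the back-end block to 127.0.0.1 keeps the unencrypted half of the conversation from ever leaving the machine.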
Varnish
Varnish is a very popular and very powerful cache. Varnish even has its own configuration language (VCL) that gives you a lot of power to manipulate headers and specify exactly what you want cached. Just like Nginx, it sits in front of whatever your website is, as a reverse proxy. The major difference is that Varnish does not do SSL/TLS termination. The author of Varnish has a post explaining why it doesn’t support SSL/TLS. The way I read it, the maintainer of Varnish hasn’t seen an SSL/TLS implementation that he likes, so he would rather it be someone else’s problem. I’m not a fan of this position, as people need to be able to communicate securely, and ignoring that is ridiculous. If he had instead said “Varnish doesn’t do SSL because I adhere to the Unix philosophy of doing one thing and doing it well,” I would be fine with his decision.
Alright, so we can’t use Varnish with SSL/TLS, right? Actually, we can use the same reverse-proxy trick for SSL termination that we used with Nginx. We can use either Nginx or a load balancer like Pound to act as our reverse proxy and SSL terminator. From that point, we make a plain http request to Varnish, and it caches in a similar way to Nginx. Why would you set this up instead of Nginx as a cache? I implemented Varnish on my site and asked the same thing. Because of the added complexity and extra hops involved, you really need to require one of Varnish’s advanced features, and have a fairly large environment, to justify it. Overall it felt like a very fragile setup.
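On the Varnish side, that chain is simple to express. A minimal VCL file pointing at the back-end web server might look like this (host and port are assumptions):

```vcl
# /etc/varnish/default.vcl — minimal sketch
vcl 4.0;

backend default {
    .host = "127.0.0.1";  # the web server behind Varnish
    .port = "8080";
}
```

Everything interesting about Varnish — the header manipulation and fine-grained cache rules — is layered on top of this in additional VCL subroutines.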
Many of the guides that I saw online hooked Varnish up to port 80 to handle plain-text requests, and then proxied the SSL/TLS requests to the same port 80 listener. The first problem with this is that it represents an outdated approach to SSL/TLS. We used to live in a world where we thought we could just redirect you to a secure connection when you logged in. Then Firesheep showed us that if your authenticated state is stored in a cookie transmitted in the clear, it doesn’t matter so much that you authenticated securely. I think we are at a point now where, if you are going to go to the lengths of implementing SSL/TLS on your site, it is worth enabling universally, if for nothing else than ensuring you didn’t miss anything. In the case of my site, Nginx is configured to listen on port 80 and a few rewrite rules redirect everything to https. In order to get Varnish to work with SSL/TLS the way we want it to, we basically have to do some fancy header redirection. In truth, while I technically got this solution working, I never got it working 100% to my satisfaction. It doesn’t really feel like a maintainable solution. I ended up going back to Nginx caching.
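The “fancy header redirection” amounts to having the TLS terminator mark requests (commonly via an X-Forwarded-Proto header) and making Varnish cache http and https variants separately. A sketch of the Varnish half of that, assuming the terminator sets the header:

```vcl
sub vcl_hash {
    # Key the cache on the protocol the client actually used,
    # so http and https responses don't collide in the cache.
    if (req.http.X-Forwarded-Proto) {
        hash_data(req.http.X-Forwarded-Proto);
    }
}
```

Without something like this, a cached http redirect can be served to an https client (or vice versa), which is one source of the fragility described above.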
Content Delivery Networks (CDNs)
The final method of caching that I will mention is using a Content Delivery Network (CDN). This method is similar to Nginx caching and Varnish caching in that the CDN acts as a caching reverse proxy. Because a CDN is also geographically distributed and acts as a load balancer, it provides additional performance as well as Distributed Denial of Service (DDoS) protection. However, unlike Varnish and Nginx, a CDN runs on totally different servers which you cannot control or audit. The CDN also reverse proxies over the internet, rather than internally within the server or within your datacenter(s). This means that if you are mindful of security, you have to configure SSL/TLS twice: first to the CDN, so that the initial https request is encrypted, and then from the CDN to your actual web server, so that the connection to the back-end is encrypted. Some CDNs, including an often-recommended one, Cloudflare, offer SSL/TLS configurations where SSL/TLS terminates at the CDN and a plain-text request is then made across the internet to your back-end. They also have configurations where SSL/TLS is implemented correctly end to end; in some cases, though, the extra SSL/TLS connection can eliminate the whole performance benefit of using a CDN and leave you with only its other benefits, such as DDoS mitigation.
In this scenario we need to move the certificate from our server (example.com) to the CDN and then either obtain a certificate for the back-end connection (cdn.example.com), use a self-signed certificate, or use a wildcard certificate for both. The most potentially troubling issue with this setup is that you are knowingly allowing a third party to man-in-the-middle all of your sensitive information: decrypt it, optimize and cache it, and then re-encrypt it, with virtually no way of auditing what is really going on. The other issue is that, as a user, you have no way of knowing whether, when you connect to a CDN, the information is screaming across the internet unencrypted on the other side of the CDN connection.
Edit: I actually received some feedback on this section from an employee at CloudFlare. There are a lot of CDNs out there, but in this article I was mostly talking about CloudFlare, as it is, in my opinion, one of the leaders in what they do. I incorrectly stated that there was an increase in cost when switching from their “Flexible SSL” to their “Full SSL” configuration when setting up SSL on a site used in conjunction with CloudFlare. It turns out that if you are a paying customer of CloudFlare, you can use any of the different SSL configurations without paying any more. I also pointed out to him that the knowledge article on CloudFlare’s site that kept coming up in search results while researching this article did not, I felt, properly convey the security implications of the different SSL options. To the CloudFlare employee’s credit, he had the article updated almost immediately in a way that emphasizes the security implications of the different configurations. Thanks for reaching out!
Hopefully you have learned a few things in this article. One of the nice things about the two different approaches, those behind the web server (APC, memcached, etc.) and those in front of it (Nginx, Varnish, and CDNs), is that you can combine them, for example APC caching with Nginx caching, as long as you have enough RAM/disk space. In my case I settled on Nginx caching, as that felt like the best solution for my needs. What do you think?