Sunday, May 27, 2012

Lessons learnt from serving Queen Victoria's Journals

Towards the end of last year I was asked how easy it would be to launch a very public website. Most of what the company does is highly specialised, low-traffic, high-value work for a very narrow and targeted audience.

We were essentially unfamiliar with sites that were wide open and potentially interesting to the whole world (or a large fraction thereof). And we knew that there would be widespread media coverage. So there were real concerns that whatever we built would buckle, turning into a PR disaster. (Everyone's heard of the census launch, I expect.)

Almost six months later, we launched Queen Victoria's Journals. And yes, it ended up both nationally and locally on the BBC, in UK newspapers including The Guardian, The Independent and the Daily Mail, and overseas in Canada and India.

After a huge amount of work, the launch went without a hitch. Traffic levels were right where we expected, and the system handled the traffic exactly as predicted. What's also clear is that if we hadn't done all the preparation work, it would most likely have been a disaster.

We're using pretty standard components - Java, Apache, Tomcat, Solr - and as I've explained previously, we maintain our own software stack. This is all hosted on Solaris Zones, built our way - so we can trivially build a bunch more, clone and restore them.

There's no real tuning involved in the standard components. They'll cope just fine, provided you don't do anything spectacularly stupid with the applications or data you're serving. I built an isolated test setup, cloned regularly from a development build, so that I could run capacity tests without affecting, or being affected by, regular development.

The site doesn't have that many pages, so I started by simply testing each one using wget or ab (Apache Bench). I needed a whole bunch of servers to send the requests from - easy, just build a bunch more zones. This showed that we could serve hundreds of pages a second from each Tomcat, apart from one page, which took several seconds per request. The problem page - the Illustrations page linked from the main toolbar - was being generated dynamically via multiple queries to the search back-end, rendered afresh on every request. The content never changes (until we update the product, at any rate), so it's really a static page. Replacing it with something static not only fixed that problem, but dramatically reduced Tomcat's memory footprint, as we had been holding search references open in the user session and generating huge numbers of temporary objects each time it was rendered.
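
To give a flavour of the sort of testing involved - this is just a sketch, with made-up host names and paths rather than the real ones - hammering a single page from several client zones with ab looks something like this:

    # Hypothetical sketch: drive one page from a few client zones in parallel.
    # Host names and the URL are placeholders.
    for client in loadzone1 loadzone2 loadzone3; do
      ssh $client "ab -n 10000 -c 50 http://test-server/illustrations.html" &
    done
    wait
    # ab reports requests per second and latency percentiles for each client;
    # a page that should be static but takes seconds to render stands out immediately.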

The server capacity and performance issues solved, we went back to looking at network utilization. That's harder to solve from an infrastructure point of view - while I can trivially deploy a whole bunch more zones in a minute or so, it takes months to get additional fibre put in the ground. And our initial estimates, which were based on the bandwidth characteristics of some of our existing sites, indicated we could well get close to saturating our network.

The truth is, though, that most sites are pretty inefficient, and ours started out as no exception. We got massive wins from compressing html with mod_gzip, minifying our javascript, and dramatically reducing the file size of most of the images. (Sane jpeg quality settings are good; not including a 3k colour profile with a 10 byte png icon also helps.) Not only did this decrease our bandwidth requirements by a factor of 5 or more, it also improved the responsiveness of the site, because users have far less to download.
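
A quick way to see what compression is worth on any given page - this isn't the exact procedure we used, just an illustration with a placeholder URL - is to fetch it with and without gzip and compare the byte counts:

    # Compare the size of a page as served plain and as served compressed.
    URL=http://test-server/some-page.html
    plain=$(curl -s "$URL" | wc -c)
    gzipped=$(curl -s -H 'Accept-Encoding: gzip' "$URL" | wc -c)
    echo "uncompressed: $plain bytes, compressed: $gzipped bytes"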

Most of the testing for bandwidth was really simple - construct a sample test, run it, and count the bytes transferred by looking at the apache logs. Simply replaying the session allows you to see what effect a change has, and you can easily see which requests are most important to address.
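
As a rough sketch, assuming the standard combined log format (where the response size is the tenth field and the requested path the seventh), something like this does the counting:

    # Total bytes served for a replayed test session.
    awk '{ bytes += $10 } END { printf "total: %.1f MB\n", bytes/1048576 }' access_log

    # Which requests cost the most? Sum bytes per URL and sort.
    awk '{ sum[$7] += $10 } END { for (u in sum) print sum[u], u }' access_log | sort -rn | head -20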

We also took the precaution of having some of the site hosted elsewhere, thanks to our good friends at EveryCity. They're using a Solaris derivative so everything's incredibly simple and familiar, making setup a breeze.

We learnt a lot from this exercise, but the primary lesson is that building sites that work well isn't hard. It just requires you not to do things that are phenomenally stupid (taking several seconds to dynamically generate a static page) or obviously inefficient (jpeg thumbnails that are hundreds of kilobytes each). Beyond that, javascript minifies very well, and html (especially the hideously inefficient html I was looking at) compresses down really well.

Test. Identify the worst offender. Fix. Repeat. Each time round the loop improves your chances of success.



Wednesday, May 23, 2012

Simple Zone Architecture

I use Solaris zones extensively - the assumption is that everything a user or application sees is a zone, everything is run in zones by default.

(System-level infrastructure doesn't, but that's basically NFS and nameservers. Everything else, just build another zone.)

After a lot of experience building and deploying zones, I've settled on what is basically a standard build. For new builds, that is; legacy replacement is a whole different ballgame.

First, start with a sparse-root zone. Apart from being efficient, this makes the OS read-only. Which basically means that there's no mystery meat: the zone is guaranteed to be the same as the host, and all zones are identical. Users in the zone can't change the system at all, which means that the OS administrator (me) can reliably assume that the OS is disposable.
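
For anyone unfamiliar with the mechanics, a sparse-root zone build on Solaris 10 looks roughly like this - the zone name, path and network details here are invented:

    # "create" without options keeps the default inherit-pkg-dir entries,
    # which is what makes the zone sparse (and /usr and friends read-only).
    zonecfg -z webzone1 <<EOF
    create
    set zonepath=/zones/webzone1
    add net
    set physical=e1000g0
    set address=192.168.1.10/24
    end
    commit
    EOF
    zoneadm -z webzone1 install
    zoneadm -z webzone1 boot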

Second, define one place for applications to be. It doesn't really matter what that is. Not being likely to conflict with anything else out there is good. Something in /opt is probably a good idea. We used to have this vary, so that different types of application used different names. But now we insist on /opt/company_name and every system looks the same. (That's the theory - some applications get really fussy and insist on being installed in one specific place, but that's actually fairly rare.)

This one location is a separate zfs filesystem loopback mounted from the global zone. Note that it's just mounted, not delegated - all storage management is done in the global zone.
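
The configuration for that is just a zfs filesystem in the global zone plus an lofs entry in the zone's configuration; something along these lines, with made-up dataset and zone names:

    # In the global zone: create the application filesystem...
    zfs create -o mountpoint=/storage/webzone1/opt storagepool/webzone1-opt

    # ...and loopback-mount it into the zone.
    zonecfg -z webzone1 <<EOF
    add fs
    set dir=/opt/company_name
    set special=/storage/webzone1/opt
    set type=lofs
    end
    commit
    EOF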

Then, install everything you need in that one place. And we manage our own stack so that we don't have unnecessary dependencies on what comes with the OS, making the OS installation even more disposable.

We actually have a standard layout: install the components at the top level, such as /opt/company_name/apache, which is root-owned and read-only, and then use /opt/company_name/project_name/apache as the server root. A similar trick works for most applications; languages and interpreters go at the top level, and users can't write to them. This is yet another layer of separation, allowing me to upgrade or replace an application or interpreter safely (and roll it back safely as well).
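
To make that concrete, the layout looks roughly like this (company and project names are placeholders):

    # Illustrative layout only.
    #   /opt/company_name/apache               shared httpd install, root-owned, read-only
    #   /opt/company_name/java                 shared JDK, likewise read-only
    #   /opt/company_name/project_name/apache  per-project server root (conf/, logs/, htdocs/)
    #
    # The shared binaries are pointed at the project's server root at start-up:
    /opt/company_name/apache/bin/httpd -d /opt/company_name/project_name/apache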

This means that if we want to back up a system, all we need is /opt/company_name and /var/svc/manifest/site to pick up the SMF manifests (plus the SSH keys in /etc/ssh if we want to capture the system identity). That's backup. Restore is just unpacking the archive thus created, including the ssh keys; to clone a system, you just unpack a backup of the system you want to reproduce. I have a handful of base backups, so I can create a server of a given type from a bare zone in a matter of seconds.
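
The backup itself is nothing more sophisticated than an archive of those paths; a minimal sketch, with invented archive names, would be:

    # Back up the application area, site SMF manifests and system identity.
    cd / && tar cf - opt/company_name var/svc/manifest/site etc/ssh | \
        gzip > /backups/webzone1-$(date +%Y%m%d).tar.gz

    # Restore (or clone) is just unpacking in the new zone, then re-importing
    # the SMF manifests so the services appear.
    cd / && gzcat /backups/webzone1-20120520.tar.gz | tar xf -
    svcadm restart manifest-import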

(Because the golden location is its own zfs filesystem, you can use zfs send and receive to do the copy. For normal applications it's probably not worth it; for databases it's pretty valuable. A limitation here is that you can't go to an older zfs version.)
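
A sketch of that copy, with placeholder dataset and host names:

    # Snapshot the application filesystem and replicate it to another host.
    # This only works if the target pool's zfs version is not older than the source's.
    zfs snapshot storagepool/webzone1-opt@golden
    zfs send storagepool/webzone1-opt@golden | \
        ssh otherhost zfs receive storagepool/webzone2-opt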

It's so simple that there are just a couple of scripts - one to build a zone from a template, another to install the application stack (or restore a backup) you want - with no need for any fancy automation.

Sunday, May 20, 2012

Vendor Stack vs build your own

Operating System distributions are getting ever more bloated, including more and more packages. While this reduces the need for the end user to build their own software, does it actually eliminate the need for systems administrators to manage the software on their systems?

I would argue that in many cases having software you rely on as part of the operating system is actually a hindrance rather than a help.

For example, much of my work involves building web servers, using Java, Apache, Tomcat, MySQL and the like. And when we deploy systems, we explicitly use our own private copies of each component in the stack.

This is a deliberate choice. And there are several reasons behind it.

For one, it ensures that we have exactly the build-time options and (in the case of Apache) the modules we need. Often we require slightly different choices from the defaults.
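
As an illustration (the exact options vary from project to project), building Apache httpd into a private prefix with just the modules you want is along these lines:

    # Build httpd into our own location with the modules and MPM we choose.
    ./configure --prefix=/opt/company_name/apache \
        --enable-deflate --enable-rewrite --enable-ssl --with-mpm=worker
    make && make install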

Keeping everything separate insulates us from vendor changes - we're completely unaffected by a vendor applying a harmful patch, or by an "upgrade" to a newer version of the components.

A corollary to this is that we can patch and update the OS on our servers with much more freedom, as we don't have to worry about the effect on our application stack at all. It goes the other way - we can update the components in our stack without having to touch the OS.

It also means that we can move applications easily between systems at different patch levels, whether newer or older - and indeed between different operating systems and chip architectures.

As we use Solaris zones extensively, this also allows us to have different zones with the components at different revision levels.

With all this, we simply don't need a vendor to supply the various components of the stack. If the OS needs them for something else then fine, we just don't want to get involved. In some cases (databases are the prime example) we go to some effort to make sure they don't get installed at all, because some poor user using the wrong version is likely to get hurt.

All this makes me wonder why operating system vendors bother with maintaining central copies of software that are of no use to us. Indeed, many application stacks on unix systems come with their own private copies of the components they need, for exactly the reasons I outlined above. (I've lost count of the number of times something from Sun installed its own private copy of Java.)

(While the above considers one particular aspect of servers, it's equally true of desktops. Perhaps even more so, as many operating system releases are primarily defined by how incompatible their user interface is to previous releases.)