Wednesday 7 September 2011

Caching a dynamic web app with nginx

The company I'm working for at the moment has a web-based application which is normally installed as an intranet app, but next month it will be part of a competition which will see it hit, as an online demo, many orders of magnitude harder than it was designed for. With little time to re-engineer the app for scalability at these levels, I'm looking at caching to get internet-scale performance.

Since the app will be running in read-only mode with a known dataset, we can treat it as a static website even though it's a fully-featured ajax-loving web app. Each POST request will always return the same result, so it will benefit from being cached, reducing the load on the app. However, my first choice for caching, Varnish, doesn't support caching POST requests: it always treats them as dynamic content. Luckily nginx does support this uncommon use case, so let's run through a proof of concept.

First of all, spin up a base instance on your favourite cloud platform; in this case I'm using Ubuntu Lucid on Rackspace Cloud. Because we need a non-core module for nginx, we'll have to compile nginx from source rather than installing it from a package:

wget 'http://sysoev.ru/nginx/nginx-0.8.28.tar.gz'
tar -xzvf nginx-0.8.28.tar.gz
cd nginx-0.8.28/
 
apt-get install git-core
git clone http://github.com/simpl/ngx_devel_kit.git
git clone http://github.com/calio/form-input-nginx-module.git
 
apt-get install build-essential libpcre3-dev libssl-dev
./configure --add-module=./ngx_devel_kit --add-module=./form-input-nginx-module
make -j2
make install

We should now have an install in /usr/local/nginx with the form-input-nginx module compiled in. Next we'll set up two nginx servers. One will be a reverse proxy running on port 80, which will pass requests (unless they're cached) to a second server running on port 8000, which in turn passes them through to PHP running under FastCGI on port 9000. First, edit the nginx.conf file to set up the reverse proxy:

proxy_cache_path  /tmp/cache  levels=1:2    keys_zone=STATIC:10m inactive=24h  max_size=1g;
server {
        listen   80 default;
        server_name  localhost;

        location / {
                proxy_pass             http://localhost:8000;
                proxy_set_header       X-Real-IP  $remote_addr;
                proxy_cache_methods    POST;
                proxy_cache            STATIC;
                proxy_cache_valid      200  1d;
                set_form_input         $foo;
                proxy_cache_key        "$request_uri$foo";
        }
}

The key points here are to enable POST as a cache method, and to use the form-input-nginx module to extract a POST variable which we wind into the cache key. This lets us cache pages uniquely based on URI and POST data; otherwise we'd get the same page back no matter what the POST data was. In the proof of concept this is a single variable, 'foo'.
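As a side note, nginx names cached files after the MD5 of the cache key, and the levels=1:2 setting builds the directory path from the tail of that hash. A hypothetical sketch of where an entry would land, assuming a POST to /test.php with foo=bar (giving the key "/test.phpbar"):

```shell
# Illustration only: compute the on-disk path nginx would use for a cache key.
# levels=1:2 means: last hash character, then the preceding two characters.
key="/test.phpbar"
hash=$(printf '%s' "$key" | md5sum | cut -d' ' -f1)
level1=$(printf '%s' "$hash" | cut -c32)      # last character of the MD5
level2=$(printf '%s' "$hash" | cut -c30-31)   # preceding two characters
echo "/tmp/cache/$level1/$level2/$hash"
```

Handy when you want to confirm on disk that a particular request really was cached.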

Now set up the backend server, which will pass requests to PHP:

server {
    listen       8000;
    server_name  localhost;

    location ~ \.php$ {
            fastcgi_pass   127.0.0.1:9000;
            fastcgi_index  index.php;
            fastcgi_param  SCRIPT_FILENAME  $document_root$fastcgi_script_name;
            include fastcgi_params;
    }
}
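One thing to watch: this server block doesn't declare a root, so $document_root falls back to the compiled-in default, which for a source build with the default prefix is /usr/local/nginx/html. If your files live elsewhere, add a root directive (the path here is just an example):

```nginx
# inside the server block listening on 8000
root  /usr/local/nginx/html;  # default for this source build; adjust as needed
```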

Finally, get PHP up and running on port 9000:

apt-get install php5-cgi
php-cgi -b 127.0.0.1:9000 &

We will create two web pages for the proof of concept, the first (index.html) is a simple form:

<form action="test.php" method="POST">
<input name="foo" type="text" value="bar" />
<input type="submit" />
</form>

The second page (test.php) is only slightly more complicated:

<h1>Am I cached?</h1>
<h2><?php echo time() . " : " . $_POST['foo']; ?></h2>

Let's give it a go. Spin up nginx:

/usr/local/nginx/sbin/nginx

When we submit the form we see the current timestamp and the value of the 'foo' variable. The page is then cached, so on reload the timestamp stays the same. However, if we go back, change the value of 'foo' in the form and submit again, a fresh page is fetched (remember, 'foo' is part of the cache key). In this way the proxy builds up a cache of all possible URI + POST data combinations.
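To make hits and misses easier to see while testing, you can expose nginx's $upstream_cache_status variable as a response header. One extra line in the proxy's location block (a testing convenience, not part of the setup above):

```nginx
# in the location / block of the port 80 proxy
add_header X-Cache-Status $upstream_cache_status;
```

A curl -i -d 'foo=bar' http://localhost/test.php should then report X-Cache-Status: MISS on the first request and HIT on subsequent ones.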

A few questions remain before full rollout. First, how performant is the form-input-nginx module? It has to read the request body and parse the POST data, so it's certainly going to have some impact on proxy performance. Secondly, on the real app we may have to add a proxy_ignore_headers directive if the app is being well behaved and setting Cache-Control or Expires headers; this will force nginx to ignore them and cache the responses anyway.
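If that does turn out to be needed, the directive would look something like this in the proxy's location block:

```nginx
# ignore the app's caching headers so responses are cached regardless
proxy_ignore_headers Cache-Control Expires;
```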