Wednesday, 7 September 2011

Caching a dynamic web app with nginx

The company I'm working for at the moment has a web-based application which is normally installed as an intranet app. Next month, however, it will be part of a competition which will see it get hit, as an online demo, many orders of magnitude harder than it was designed for. With little time to re-engineer the app for scalability to these levels, I'm looking at caching to get internet-scale performance.

Since the app will be running in read-only mode with a known dataset, we can treat it as a static website even though it's a fully-featured ajax-loving web app. Each POST request will always return the same results, so caching them will reduce the load on the app. However, my first choice of cache, Varnish, doesn't support caching POST requests - it always treats them as dynamic content. Luckily nginx does support this uncommon use case, so let's run through a proof of concept.

First of all, spin up a base instance on your favourite cloud platform; in this case I'm using ubuntu lucid on Rackspace cloud. Because we're going to need a non-core module for nginx, we'll need to compile nginx rather than install from package:

wget ''
tar -xzvf nginx-0.8.28.tar.gz
cd nginx-0.8.28/
apt-get install git-core
git clone
git clone
apt-get install build-essential libpcre3-dev libssl-dev
./configure --add-module=./ngx_devel_kit --add-module=./form-input-nginx-module
make -j2
make install

We should now have an install in /usr/local/nginx with the form-input-nginx module compiled in. Next we'll set up two nginx servers. One will be a reverse proxy running on port 80, which will pass requests, unless cached, to a second server on port 8000, which in turn hands them to PHP running under FastCGI on port 9000. First edit the nginx.conf file to set up the reverse proxy:

proxy_cache_path  /tmp/cache  levels=1:2    keys_zone=STATIC:10m inactive=24h  max_size=1g;
server {
        listen   80 default;
        server_name  localhost;

        location / {
                proxy_pass             http://localhost:8000;
                proxy_set_header       X-Real-IP  $remote_addr;
                proxy_cache_methods    POST;
                proxy_cache            STATIC;
                proxy_cache_valid      200  1d;
                set_form_input         $foo;
                proxy_cache_key        "$request_uri$foo";
        }
}

The key points here are to enable POST as a cache method, and to use the form-input-nginx module to extract a POST variable which we wind into the cache key. This lets us cache pages uniquely per URI and POST data; otherwise we'd get the same page back no matter the POST data. In the proof of concept this is a single variable, 'foo'.
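To make the key construction concrete, here's a plain-shell sketch (the values are hypothetical) of the string nginx ends up using as the cache key for one request:

```shell
# hypothetical request: POST /test.php with body foo=bar
request_uri="/test.php"
foo="bar"                        # what set_form_input would extract from the body
cache_key="${request_uri}${foo}"
echo "$cache_key"                # prints /test.phpbar
```

Two POSTs to the same URI with different values of foo therefore land in distinct cache entries, which is exactly the behaviour we need.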

Now setup the backend server which will pass requests to php:

server {
    listen       8000;
    server_name  localhost;

    location ~ \.php$ {
            fastcgi_pass   127.0.0.1:9000;
            fastcgi_index  index.php;
            fastcgi_param  SCRIPT_FILENAME  $document_root$fastcgi_script_name;
            include fastcgi_params;
    }
}

Finally, get PHP up and running on port 9000:

apt-get install php5-cgi
php-cgi -b &

We will create two web pages for the proof of concept, the first (index.html) is a simple form:

<form action="test.php" method="POST">
<input name="foo" type="text" value="bar" />
<input type="submit" />
</form>

The second page (test.php) is only slightly more complicated:

<h1>Am I cached?</h1>
<h2><?php echo time() . " : " . $_POST['foo']; ?></h2>

Let's give it a go and spin up nginx:

/usr/local/nginx/sbin/nginx
When we submit the form we see the current timestamp and the value of the 'foo' variable. The page is then cached, so on reload the timestamp remains the same. However, if we go back, change the value of 'foo' in the form, and submit again, a fresh page is fetched (remember, 'foo' is part of the cache key). In this way the proxy builds up a cache of every URI + POST data combination.

A few questions remain before a full rollout. How performant is the form-input-nginx module? It has to read the request body and parse the POST data, so it's certainly going to impact proxy performance. Secondly, on the real app we may have to add a proxy_ignore_headers directive if the app is being well behaved and setting Cache-Control or Expires headers. This forces nginx to ignore them and cache the data anyway.
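As a sketch of that directive (the exact header list would depend on what the app actually emits), it slots into the proxy's location block like this:

```nginx
location / {
        # disregard upstream headers that would otherwise suppress
        # or expire the cached copy
        proxy_ignore_headers  Cache-Control Expires;
}
```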

Thursday, 14 July 2011

The right stuff: Part 1, Version Control - git and gitosis

I'm currently doing some work at a local startup that's looking to grow its software engineering side. This has given me a unique opportunity to set up devops starting from a blank sheet. So I thought I'd blog about my progress along the way and share my experiences putting a toolset into place.
Part 1: Version Control - git and gitosis

As soon as you move beyond one developer version control becomes a necessity. Even alone, the advantages of being able to roll back your code, manage releases, and automate deployment are considerable. Version control is the foundation of any software engineering enterprise and so before anything else, we'll start here.

Things have progressed since the days of CVS, and now we have funky new distributed version control systems to play with. These allow developers to keep a personal repository on their machines and push changes out to remote repositories for merging and release. Apart from making it easier to work on the move and encouraging more regular commits of your code, this supposedly makes merges much less painful as well. So I'm going to run you through installing git, which is one of the most recognisable names in this category, and the server component, gitosis.

Before we start, a note about environment: I'm working on a Mac, but these instructions should translate easily to any platform. I'm also assuming that you have a 'management server' which will act as a persistent, backed-up location for your repository and for other key software in later posts. In my case this is running Ubuntu; for other flavours of Linux you will have to change the package install steps. I'll refer to these machines as local and mgt below, and I'm assuming you also have an account called dev on mgt which we'll use to set things up.


Start by installing git on both local and mgt, and set up your details on local so commits are attributed correctly:
git config --global user.name "John Doe"
git config --global user.email ""
Next on mgt install gitosis, which on ubuntu is conveniently packaged:
sudo apt-get install gitosis
If you want to access your gitosis repository remotely, ensure that your firewall/router has a rule set up to forward port 9418 to mgt. Next, create an SSH key for the dev user on mgt and initialise the repository:
ssh-keygen -t rsa
sudo -H -u gitosis gitosis-init < ~/.ssh/id_rsa.pub
At this point, you're pretty much up and running, give it a go by checking out the admin repository on mgt:
mkdir ~/repo
cd ~/repo
git clone -o myco
(Replace myco with a tag for your company. This option isn't strictly necessary, but when pushing changes to more than one place I've found it useful to be explicit about pushes to the company repository.)

Adding a user

Now to create an account for yourself to access to your new repository. Generate yourself a ssh key if you don't have one already and copy it from local to the gitosis-admin project you just cloned:
ssh-keygen -t rsa
scp ~/.ssh/
Then edit ~/repo/gitosis-admin/gitosis.conf on mgt to create a new group for your team with your first project, and add yourself to the members line. While you're there, add yourself to the gitosis-admin team. Something like:

[group gitosis-admin]
writable = gitosis-admin
members =

[group teamawesome]
writable = myproject
members =
Now for your first commit!
cd ~/repo/gitosis-admin
git add keydir/
git commit -am 'Created myproject and added John'
git push myco master
The git add does what it says and marks a file to be added to the repository - git won't add any files you don't tell it to, so you can have versioned and unversioned files in the same directory. The next line commits your changes to the local repository (in ~/repo/gitosis-admin/.git), with -a telling git to commit all files which have changed. The last command then pushes the local repository to the gitosis server you cloned it from (which in this first case is also mgt).
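Here's a self-contained sketch of that add/commit/push cycle, using a throwaway bare repository in /tmp as a stand-in for the gitosis server (all the paths, names and the 'myco' remote are illustrative, not the real setup):

```shell
# a bare repository standing in for the gitosis remote
git init --bare -q /tmp/git-demo-central.git

# a working repository, wired to the stand-in remote as 'myco'
git -c init.defaultBranch=master init -q /tmp/git-demo-work
cd /tmp/git-demo-work
git remote add myco /tmp/git-demo-central.git

# stage a file, commit locally, then push to the remote
echo 'A test file' > test.txt
git add test.txt
git -c user.name='John Doe' -c user.email='john@example.com' \
    commit -q -m 'Added test file'
git push -q myco master

# the remote now holds the branch
git ls-remote --heads myco
```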

Now you have access to the gitosis-admin project, let's add another user but this time locally. Get Jane to give you her public key and on local:
mkdir ~/repo
cd ~/repo
git clone -o myco
cd gitosis-admin
cp keydir/
git add keydir/
Edit gitosis.conf and add Jane to myproject members, then:
git commit -am 'Added Jane to myproject'
git push myco master
Now you've told gitosis-admin about myproject and added some users, let's actually create the project on local and initialise the local repository:
mkdir ~/repo/myproject
cd ~/repo/myproject
git init
And wire it up to gitosis (git won't let you push until there's at least one commit to send):
git remote add myco
Add a new file:
echo 'A test file' > test.txt
git add test.txt
Commit it locally (repository in ~/repo/myproject/.git):
git commit -am 'Added test file'
Push changes to gitosis for everyone else to grab:
git push myco master
And if Jane wants to work on the project from her machine she can do:
cd ~/repo
git clone -o myco
One other useful command: if you're coming back to a repository that's already checked out and want to make sure you have all the latest changes from gitosis:
git pull
So that's the basics, but there's one final thing I like to have set up... I've always liked having a mailing list which shows commits into the central repository, so all developers can keep an eye on what's going on and how projects are evolving. In this case I've set up a google group on our corporate google apps account and added John and Jane as members. Now to integrate gitosis with it. Unfortunately this requires editing a file in the actual repository itself on mgt:
cd /srv/gitosis/repositories/myproject.git
cat <<EOF |sudo tee -a config
[hooks]
mailinglist =
announcelist =
envelopesender =
EOF
echo 'A meaningful description of my project' |sudo tee description
You also need to set up a hook. For future projects this will be rolled out by default by dropping it into the templates directory, so that you only have to edit the config file and provide a description as above:
sudo cp /usr/share/doc/git-core/contrib/hooks/post-receive-email /usr/share/git-core/templates/hooks/post-receive
sudo chmod +x /usr/share/git-core/templates/hooks/post-receive
cd /srv/gitosis/repositories/myproject.git/hooks
sudo -u gitosis ln -s /usr/share/git-core/templates/hooks/post-receive
That's it - try making a change and committing in your project, and you should get a nice email telling you about the changes.

I hope you've found this useful. Version control is one of the most worthwhile things to push for at any firm, even though it seems complicated at first. Stick with it and soon it will feel like second nature. In the next post we'll get our teeth into something more complicated still: automating your infrastructure setup with Chef.

PS. Much of the above I'm typing up after the fact (note to self - document as I go along). So if you find any inaccuracies, missing steps, or the like, I apologise - please let me know in the comments so I can fix them.

Image courtesy of grendelkhan

Monday, 2 May 2011

Applying SCRUM in the real world

Cross posted from my business website blog 'Digital Susurrus'

For the last few years Agile has pretty much been the only software engineering methodology in town, which in my opinion is great news. Of all the different flavours, SCRUM seems to be most in vogue with a constant stream of contract ScrumMaster roles popping up around Cambridgeshire. The interesting point to notice is that these roles all have a mandatory requirement for ScrumMaster certification, though you can pretty much guarantee that most of them will not be anywhere close to following a full SCRUM process.

Now I'm not a great fan of IT certifications. People collect these far too easily in my opinion - most are open book with multi-guess exams, taken after a week away, at considerable expense, listening to someone pontificate about the work they did a decade ago. Certifications such as PRINCE2 bear no correlation with whether you're a good project manager; they certainly don't make you one, and in fact help create poor project managers by allowing people to hide their lack of skill behind process and busywork. Give me someone who is just anal retentive about detail and borderline OCD any day to get a project in on time and on budget.

I had the joy of going through ScrumMaster accreditation a couple of weeks ago, and it did nothing to improve my view of such courses. If anything SCRUM is even less suited to this kind of training, as there isn't the focus on process and artifacts. The result was a two-day course which was more a patchwork of tips and tricks than suitable preparation for revolutionising the way you build software.

My key gripe with the training, though, was that although SCRUM has seemingly come of age and is clearly becoming acceptable to the mainstream, it is still expounded with a dogmatic fervour which essentially makes it unimplementable in almost all business environments. Unlike PRINCE2 or ITIL, which state that they are just collections of best practice which you can take or leave as appropriate to make something that works for you, SCRUM says this is how things must be done. If you don't do it the SCRUM way from day one, then it's not SCRUM - we wash our hands of you and you're doomed.

This is not particularly helpful in the real world, and certainly makes it easy to trumpet the perfection of SCRUM when it's only SCRUM if it's implemented in a perfect environment. The reality is that if I'm selling to government I need to work out how to wrap my Agile process in a PRINCE2 approach because it's mandated; I'll probably also be working to a hard deadline, and there'll probably be performance management clauses in there. These things are not insurmountable, and shouldn't mean I have to abandon hope of improving my development process by making it as Agile as I can. Personally I think broad-brush Agile practices - an iterative approach, releasing often, keeping the customer close, and many others - would improve the development process at most organisations if taken on board.

I am particularly alienated by those who espouse the perfection of developers, if only they weren't dragged down by the rest of the business. At its heart the message is: if you don't code you don't have anything useful to add. It's a very conflict-driven polemic which preaches a misanthropic attitude to everyone who isn't a developer - the complete opposite of a healthy, team-based business. Evidence does show that freeing up developers to make choices and including their views makes better products; however, this doesn't mean all developers are always responsible with freedom, or that all developers are suitably equipped to make the right decisions on all topics. I've seen too many products designed by techies with no appreciation of the business realities to be in doubt about that.

Now I'm sure there are useful Agile courses out there that teach techniques that can be applied in the real world, sending people back to their organisations with an improved toolbox and the ability to give it a go without the fear that they have to turn the whole business on its head from day one to achieve anything. However, people aren't attending those courses because they don't provide certification. More importantly, they're seeking certification because recruiters are being lazy and requiring it as a measure of suitability when hiring, even though their organisation most likely isn't compliant with SCRUM anyway!

My quarrel is not with ScrumMasters being dogmatic about the purity of their methodology; my issue is with the evils of certification and sloppy recruitment. If you're looking to recruit someone into an Agile role, please don't ask for certification - ask for experience. If you must, ask for training which gives people a wide range of software engineering approaches applicable in different business environments. If you're looking for external help to become Agile, think carefully before you make the important decision to be a SCRUM organisation by writing it into a job description! And be very wary of hiring the fanatical ScrumMaster who may turn your business inside out to meet the requirements of their methodology, rather than listening to what you have to say about your business and finding an Agile approach that will help you improve.

Image courtesy of jensjeppe

Monday, 11 April 2011

To be or not to be... a consultant

Those who follow me closely will know that last month Opportunity Links closed its doors. Whilst it's been sad to see an end to the last five years of work, it's also given me a chance to reflect on which bits of the job I enjoyed, and the opportunity to focus on those elements in whatever comes next.

So after reflecting in tranquil forests on the meaning of life, the universe and everything, I came to the clear conclusion that the thing I enjoy most is innovation. I've been at my happiest when bootstrapping some disruptive product into the market, and at my most despondent when dealing with day to day line management issues for a team of 50.

Maybe this shouldn't be much of a revelation - I mean, who really enjoys line management? But nobody had told my career, which was obliviously heading down an increasingly managerial route. That leaves me three options for hauling myself back to where I should be:

  1. Create my own startup, or join one in the early stages
  2. Join a mature company that hasn't lost the innovation culture (there are quite a few of these in Cambridge)
  3. Become a consultant
After some thought I'm seriously considering option 3 which seems to be the flavour of the month. It will allow me to work on a variety of different projects, applying my skills in software engineering, project management, product management and innovation, and give me the freedom to maybe explore the startup option on the side.

My reticence is two-fold: Firstly I'm getting older and more risk averse - can I pull it off risking a stable income which my kids depend on? Gone are the days when I could live on Ramen noodles while coding from a bedsit. But yes, I think there's work there and my reputation is strong enough to secure it. The second issue is the sticking point - along with most of the business community, I dislike consultants. The majority of them are a waste of space and an expensive one at that. In many cases people turn freelance not because they're at the top of their game, but because they can't get hired. I'm not sure I'm totally happy with the idea of taking on the consultant label.

So I need a description for myself. One that isn't so obscure that it needs an explanation before people understand what I do, but one that doesn't have the negative connotations that come along with having 'consultant' on your business cards. Comments are open for ideas...

Monday, 4 April 2011

Creating the BeGrand social media toolbox

Has it really been three years since my last blog post?

Well, let me get back into the groove gently. For my grand return I'd like to rescue a post that I wrote over on the BeGrand blog, which is sadly no more as the project has come to an end. Although some of this is now dated, with some of the services mentioned having disappeared, I do think it's worth capturing for posterity, and the core concepts are still valid. So here we go, in its unedited form...

Creating the BeGrand social media toolbox

October 19th 2009

As with most modern web projects, BeGrand needs to be linked to the social media web, both to be part of the distributed dialogue around grandparents and to draw interested visitors. However, keeping our finger on the social media pulse, even one as well defined as grandparents, requires a few handy tools to ease the pain.

I’m going to look at three modes of interaction for our social media toolbox: dashboard, asynchronous stream and real-time. These three modes should cover all the roles in our team, reflecting differing levels of attention. For the dashboard view my tool preference is Netvibes, which allows you to set up a public tabbed page which you can populate with widgets (Pageflakes is also good for this). To create our asynchronous stream I’ll be using Friendfeed, though again any feed aggregator service will do (Jaiku is also good for this). Finally, for real-time I’ll be using Yammer, a corporate twitter clone which works nicely with XMPP services such as gTalk to allow multi-device notifications.

Asynchronous streams

Starting with Friendfeed, I’ve set up three rooms: one to pull in discussion of BeGrand around the web – BeGrand buzz; a second to publish all our social media activity – BeGrand zeitgeist; and a final room to aggregate discussion on grandparents across the net – BeGrand clippings.

For the buzz stream I want to pull in mentions of ‘begrand’ on blogs, in comments and on twitter, so I’ve added search RSS feeds from Google blog search, Backtype, and Twitter respectively (you may need to play with your search terms if you’re getting a lot of noise in your results, for instance I added -adrien to google blog search to exclude results from someone named Adrien Begrand who appears quite regularly).

For the zeitgeist stream I added in feeds from all our social media activities: our blogs, Delicious, Flickr, YouTube, Google Reader shares, and Backtype to pull in all comments made across the net by our staff (planned future services include Slideshare, Upcoming, and GetSatisfaction support topics). To increase coverage of all this good stuff we’re doing, this feed is also wired into our begrandnet twitter account using TwitterFeed for automatic posting.

Finally the clippings stream pulls together two feeds; a Google blog search and a Google news search for ‘grandparents’. I’m mostly interested here in long form content rather than microblogging services like twitter which would overload this stream very quickly with mostly uninteresting stuff for such a generic search term.

Whilst dipping in and out of Friendfeed may suit many roles in your team who might look at this stuff once or twice a day to take the temperature, there should be someone in your team whose job it is to scrub it all. Whether you call it your social media, community, or marketing manager, this role needs to read everything that comes through ‘buzz’ and ‘clippings’ and take action. On a practical level Friendfeed isn’t ideal for this kind of activity, so I use a feed reader such as Google Reader to wire in the two streams and maybe a few other bits and bobs, such as your competitors' output, making it easy to share, respond and archive.


So I now have my three streams, but I don’t want to wait until I next look at Friendfeed to know what’s happening, so we need some real-time notification. There are a number of options here, the easiest being to use Friendfeed’s inbuilt notification settings to send updates to its desktop notifier or to a configurable instant messaging account.

However, what’s missing here is some persistence and discussion – the difference between IM and a service like Twitter. Although I want real-time notification, I also want that notification to be shared with the team and to be a focus for further internal social interaction. For this kind of thing we use the excellent Yammer service which is a closed twitter clone for corporates. It allows the wiring in of any number of RSS feeds which can then be subscribed to by the team. It also supports bridging to IM so I can still get messages to my gTalk account and use an app such as BeejiveIM to get push notifications to my iPhone when I’m on the move.


The third perspective needed is the Dashboard. Not everyone is happy consuming a continual stream; some want a more structured approach, as evinced by the popularity of iGoogle and My Yahoo. So I’ve put together a social media dashboard in Netvibes with a number of tabs, organised around what I pretentiously think of as the four axes of social media:


The first tab attempts to detail how well we’re marketing ourselves. It is primarily the output of the ‘buzz’ stream and as it’s the entry page for visitors, I use it as somewhere to put a couple of core Netvibes widgets.


This tab shows where BeGrand’s attention is focussed, what we’re interested in and where we’re looking. It pulls in our Google Reader shares, our bookmarks and the ‘clippings’ stream.


This tab pulls together what we’re up to and how we’re impacting the social media space. It embeds widgets for Twitter, Youtube, and Flickr as well as dropping in our full ‘zeitgeist’ stream.


The final tab details our writings online, pulling in feeds from our blogs and the comment stream for BeGrand staff from Backtype.


Hopefully between these three views on our social media activity there’s something for everyone. We’ll see how it pans out as BeGrand launches and the volume increases. If anyone has any refinements I’m always interested in hearing how to improve our toolset.

Wednesday, 16 April 2008

Parenting in the 21st century

I spent most of yesterday in London at the DCSF's Parenting in the 21st century event - another well run event from the department and Digital Public. The speaker line-up was well worth the trip, with a great success story from Sally Russell at Netmums, who is doing a wonderful job of raising awareness of non-governmental organisations' ability to bring parents together through social media, and achieve outcomes that still remain out of reach for many government run projects.

It was interesting to hear that although Netmums seem to have very good relationships with government, they still hit the same barriers of access to government data sets and information, which many in the private sector believe they could add value to if only they were public. Hopefully Sally's involvement with the Power of Information group will move this issue along.

Niel McLean from Becta was also on fine form, painting a convincing picture of the power of technology to revolutionise our schools system, and in particular the relationship between pupils, their parents and teachers. His ideas around the blurring of boundaries, between home and school, the roles of teachers and parents, and the worlds of work and home, through the use of inclusive and ubiquitous technology raises interesting questions around how schools should embrace technology at a process level rather than purely as a tool.

In the breaks I caught up with Louise Derbyshire from Contact a Family, who's doing some very interesting work, supported by the parent know-how innovation fund, into the ways that social media can support parents of disabled children. Working through social networks and even Second Life she seems to be having significant success demonstrating the power of self-organising networks to provide high quality peer-to-peer support in this area, and I look forward to following the results of this project over the next few months.

I also bumped into Mark Weber from Attic Media, who have been doing some great work for the DCSF in the last few years. Mark had an interesting point to make: the ability of young people to fully use the Internet is often over-egged, something that we've seen ourselves in our research with young people, who are often fairly limited in their exposure online to a small number of key websites and services (youtube, myspace, msn etc). It certainly seems to me that either there is a transition in the late teens where the Internet expands from being purely a social tool to being an information resource that can be mined, or there is a straightforward generational gap between us web 1.0 players, who see the Internet through a library metaphor, and the current crop of web 2.0 digital natives, who experience it as social media.

This is a question that needs further research since, as was pointed out at the conference, the young people of today are only ten years off becoming the parents of tomorrow and we need to start planning now for the support they'll need.

Friday, 4 January 2008

A Lovefilm lifestream hack with Dapper and Pipes

So I'm a big fan of Lovefilm, the online DVD rental service. But for a while now I've been reviewing and rating my movies in Flixster, because that is a more open system with outputs such as Facebook apps and Opensocial.

This is less than ideal, as I really should double-enter my ratings to ensure Lovefilm recommendations stay accurate, so I was pleased to see that Lovefilm have launched a public user profile which allows you to publish some of your data to the world, see mine here. However, as is so often the case, the data isn't portable, so I used it as an opportunity to try out a new service I'd heard about called Dapper.

I'd come across Dapper when they spoke at FOWA this year and have had them on my list to try out ever since. Dapper is essentially screen-scraping software: you feed it some example pages, which it then pulls apart to work out what content you might want to repurpose. It's a very nicely built web application with huge potential given the output formats (XML, RSS, Netvibes, and flash widgets, to name a few). So I booted it up and fed it my Lovefilm profile page.

My first hurdle was that I had difficulty getting it to recognise certain parts of the page as content chunks. It couldn't grab the review separately from the list of stars and the director, so in the end I have a little more information in the RSS description than I would have liked; I'm guessing this is down to how well the HTML is written. My second issue was that with the RSS option enabled it only allowed me to link data to three fields: title, description and pubdate. I was keen to take the movie artwork in as an RSS enclosure, so I went back, switched to XML output, and just grabbed everything into an XML file. After a bit of neatening up my finished 'Dap' was ready and published.

Once my Dap was finished I wanted to clean things up and sort out the enclosure. So I jumped over to Yahoo! Pipes, took in the XML output, renamed a few fields and performed some regexes so that everything was how I wanted it. The final RSS feed output can be seen here.
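For flavour, here's roughly the kind of clean-up a Pipes regex step performs, sketched in shell with sed (the sample title and patterns are made up for illustration, not the ones I actually used):

```shell
# strip a hypothetical site prefix and suffix from a scraped item title
title='LOVEFiLM: The Godfather (1972) - review'
clean=$(echo "$title" | sed -E 's/^LOVEFiLM: //; s/ - review$//')
echo "$clean"    # prints: The Godfather (1972)
```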

Finally I wanted to get that into my lifestream. Currently I'm using Jaiku for this purpose, so I jumped over and added the new RSS feed, which in short order appeared nicely in my stream with my first review. As I use this to wire updates into other systems (e.g. through the Jaiku app in Facebook), my new review quickly percolated into my social network.

Overall this probably took me two hours to set up, but a lot of that was learning and fiddling around the edges. Dapper is certainly a very powerful tool, and combined with the more programmatic functionality of Yahoo! Pipes it allows me to start making my locked-up information more portable. The one restriction at the moment is the inability to automate logins, so if you don't have a shortcut private URL for your data you're out of luck. I think this is possibly an opportunity for OpenID in the future.