Wednesday 7 September 2011

Caching a dynamic web app with nginx

The company I'm working for at the moment has a web-based application which is normally installed as an intranet app. Next month, though, it will be entered into a competition as an online demo, where it will be hit many orders of magnitude harder than it was designed for. With little time to re-engineer the app to scale to these levels, I'm looking at caching to get internet-scale performance.

Since the app will be running in read-only mode with a known dataset, we can treat it as a static website even though it's a fully featured, ajax-loving web app. Any given POST request will always return the same result, so caching it will take load off the app. However, my first choice of cache, Varnish, doesn't support caching POST requests, always treating them as dynamic content. Luckily nginx does support this uncommon use case, so let's run through a proof of concept.

First of all, spin up a base instance on your favourite cloud platform; in this case I'm using Ubuntu Lucid on Rackspace Cloud. Because we need a non-core module for nginx, we'll have to compile nginx rather than install it from a package:

wget 'http://sysoev.ru/nginx/nginx-0.8.28.tar.gz'
tar -xzvf nginx-0.8.28.tar.gz
cd nginx-0.8.28/
 
apt-get install git-core
git clone http://github.com/simpl/ngx_devel_kit.git
git clone http://github.com/calio/form-input-nginx-module.git
 
apt-get install build-essential libpcre3-dev libssl-dev
./configure --add-module=./ngx_devel_kit --add-module=./form-input-nginx-module
make -j2
make install

We should now have an install in /usr/local/nginx with the form-input-nginx module compiled in. Next we'll set up two nginx servers. The first is a reverse proxy on port 80, which passes any request it can't answer from its cache to a second server on port 8000; that server in turn passes requests to PHP running under FastCGI on port 9000. First edit /usr/local/nginx/conf/nginx.conf to set up the reverse proxy (this configuration, like the backend server below, goes inside the http block):

proxy_cache_path  /tmp/cache  levels=1:2    keys_zone=STATIC:10m inactive=24h  max_size=1g;
server {
        listen   80 default;
        server_name  localhost;

        location / {
                proxy_pass             http://localhost:8000;
                proxy_set_header       X-Real-IP  $remote_addr;
                proxy_cache_methods    POST;
                proxy_cache            STATIC;
                proxy_cache_valid      200  1d;
                set_form_input         $foo;
                proxy_cache_key        "$request_uri$foo";
        }
}

The key points here are enabling POST as a cache method and using the form-input-nginx module to extract a POST variable, which we fold into the cache key. This lets us cache pages uniquely by URI and POST data; otherwise the same cached page would come back whatever the POST data. In the proof of concept this is a single variable, 'foo'.
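To make the cache key concrete, here's a small Python sketch of what nginx does with it. It's an illustration only and not part of the setup. According to the nginx documentation for proxy_cache_path, each entry is stored in a file named after the MD5 of the cache key, and levels=1:2 takes the subdirectory names from the end of that hex digest. The function names are made up for this example.

```python
import hashlib
from pathlib import PurePosixPath


def cache_key(request_uri: str, foo: str) -> str:
    # Equivalent of proxy_cache_key "$request_uri$foo": a plain concatenation
    return request_uri + foo


def cache_path_for_digest(cache_root: str, digest: str, levels=(1, 2)) -> PurePosixPath:
    """Layout for levels=1:2: the last hex character, then the two before it."""
    parts = []
    end = len(digest)
    for width in levels:
        parts.append(digest[end - width:end])
        end -= width
    return PurePosixPath(cache_root, *parts, digest)


def cache_file(cache_root: str, key: str, levels=(1, 2)) -> PurePosixPath:
    # nginx names the cache file after the MD5 of the cache key
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return cache_path_for_digest(cache_root, digest, levels)


if __name__ == "__main__":
    # Different values of foo land in different files under /tmp/cache
    for foo in ("bar", "baz"):
        key = cache_key("/test.php", foo)
        print(key, "->", cache_file("/tmp/cache", key))
```

Plain concatenation has one side effect worth knowing about. '/a' with foo='bc' and '/ab' with foo='c' produce the same key. That's unlikely to bite with URIs like /test.php, but putting a separator character between the two variables in proxy_cache_key makes such clashes much less likely if you add more variables.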

Now set up the backend server, which passes requests on to PHP:

server {
    listen       8000;
    server_name  localhost;

    location ~ \.php$ {
            fastcgi_pass   127.0.0.1:9000;
            fastcgi_index  index.php;
            fastcgi_param  SCRIPT_FILENAME  $document_root$fastcgi_script_name;
            include fastcgi_params;
    }
}

Finally, get PHP up and running on port 9000:

apt-get install php5-cgi
php-cgi -b 127.0.0.1:9000 &

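We'll create two pages for the proof of concept in /usr/local/nginx/html, which is nginx's default document root and the one the backend server uses because it sets no root of its own. The first, index.html, is a simple form: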

<form action="test.php" method="POST">
<input name="foo" type="text" value="bar" />
<input type="submit" />
</form>

The second page (test.php) is only slightly more complicated:

<h1>Am I cached?</h1>
<H2><?php echo time() . " : " . $_POST['foo']; ?></H2>

Let's give it a go. Start nginx:

/usr/local/nginx/sbin/nginx

When we submit the form, we see the current timestamp and the value of the 'foo' variable. The page is then cached, so on reload the timestamp stays the same. However, if we go back, change the value of 'foo' in the form and submit again, we see a freshly fetched page (remember, foo is part of the cache key). In this way the proxy builds up a cache entry for every combination of URI and POST data it's asked for.
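If you'd rather check this from a script than a browser, here's a rough Python sketch. It submits the form the same way the browser does, as a urlencoded POST, and compares the timestamps that test.php prints. It assumes the proxy is answering on http://localhost/test.php and that the page matches the one above. post_form, extract_stamp and check are hypothetical helper names.

```python
import re
import sys
import time
import urllib.parse
import urllib.request

# Matches test.php's "<H2>timestamp : foo</H2>" line, whatever the tag case
STAMP_RE = re.compile(r"<h2>\s*(\d+)\s*:\s*(.*?)\s*</h2>", re.IGNORECASE | re.DOTALL)


def post_form(url, foo):
    """POST foo the way the browser form does (urllib sets the urlencoded Content-Type)."""
    data = urllib.parse.urlencode({"foo": foo}).encode("ascii")
    req = urllib.request.Request(url, data=data, method="POST")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_stamp(html):
    """Return (timestamp, foo) from test.php's output."""
    match = STAMP_RE.search(html)
    if match is None:
        raise ValueError("no timestamp found in response")
    return int(match.group(1)), match.group(2)


def check(url):
    first = extract_stamp(post_form(url, "bar"))
    time.sleep(1.5)  # let the clock tick so an uncached page would show a new timestamp
    again = extract_stamp(post_form(url, "bar"))
    other = extract_stamp(post_form(url, "baz"))
    print("repeat of foo=bar served from cache:", again == first)
    # If foo weren't in the cache key, this would come back showing 'bar'
    print("foo=baz got its own page:", other[1] == "baz")


if __name__ == "__main__":
    check(sys.argv[1] if len(sys.argv) > 1 else "http://localhost/test.php")
```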

A few questions remain before full rollout. First, how performant is the form-input-nginx module? It has to read the request body and parse the POST data, so it's certainly going to have some impact on proxy performance. Second, on the real app we may have to add a proxy_ignore_headers directive if the app is well behaved and sets Cache-Control or Expires headers. This forces nginx to ignore those headers and cache the responses anyway.
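As a starting point for the performance question, a crude Python load generator can fire repeated identical POSTs at a URL from a pool of threads and report requests per second. The sketch below makes some assumptions. The URL, request count and concurrency are placeholders, and a dedicated benchmarking tool would give more trustworthy numbers.

```python
import sys
import time
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def make_sender(url, foo):
    """Build a zero-argument function that POSTs foo to url and returns the HTTP status."""
    data = urllib.parse.urlencode({"foo": foo}).encode("ascii")

    def send():
        req = urllib.request.Request(url, data=data, method="POST")
        # urlopen raises on HTTP errors, so a failing request stops the run
        with urllib.request.urlopen(req, timeout=10) as resp:
            resp.read()
            return resp.status

    return send


def measure(send, n_requests=500, concurrency=10):
    """Call send() n_requests times across a thread pool; return (requests/s, statuses)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        statuses = list(pool.map(lambda _: send(), range(n_requests)))
    elapsed = max(time.perf_counter() - start, 1e-9)
    return n_requests / elapsed, statuses


if __name__ == "__main__":
    url = sys.argv[1] if len(sys.argv) > 1 else "http://localhost/test.php"
    rate, statuses = measure(make_sender(url, "bar"))
    print(f"{rate:.1f} requests/s over {len(statuses)} requests")
```

Running it against the proxy on port 80 and directly against the backend on port 8000 shows how much the cache buys you. Isolating the form-input module's own overhead would need a comparison location that caches without set_form_input, and that isn't part of the setup above.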

Thursday 14 July 2011

The right stuff: Part 1, Version Control - git and gitosis

I'm currently doing some work at a local startup that's looking to grow its software engineering side. This has given me a unique opportunity to set up devops starting from a blank sheet, so I thought I'd blog about my progress along the way and share my experiences of putting a toolset in place.
Part 1: Version Control - git and gitosis

As soon as you move beyond one developer, version control becomes a necessity. Even alone, the advantages of being able to roll back your code, manage releases, and automate deployment are considerable. Version control is the foundation of any software engineering enterprise, so before anything else we'll start here.

Things have progressed since the days of CVS, and now we have funky new distributed version control systems to play with. These give each developer a personal repository on their own machine, from which they push changes out to remote repositories for merging and release. Apart from making it easier to work on the move and encouraging more regular commits, this supposedly makes merges much less painful as well. So I'm going to run you through installing git, one of the best-known names in this category, and gitosis, a tool for hosting git repositories on a server and managing who can access them.

Before we start, a note about environment: I'm working on a Mac, but these instructions should be easily translatable to any platform. I'm also assuming that you have a 'management server' which will act as a persistent, backed-up location for your repository and, in later posts, for other key software. In my case this runs Ubuntu; for other flavours of Linux you will have to change the package install steps. Below I refer to these machines as local and mgt, and I assume you also have an account called dev on mgt which we'll use to set things up.

Installation

Start by installing git on both local and mgt, and set up your details on local so that commits are attributed correctly:
git config --global user.name "John Doe"
git config --global user.email "john.doe@myco.com"
Next, on mgt, install gitosis, which on Ubuntu is conveniently packaged:
sudo apt-get install gitosis
If you want to access your gitosis repository from outside your network, make sure your firewall/router forwards SSH (port 22) to mgt, since gitosis serves repositories over SSH. Next, create an SSH key for the dev user on mgt and use it to initialise gitosis:
ssh-keygen -t rsa
sudo -H -u gitosis gitosis-init <~/.ssh/id_rsa.pub
At this point you're pretty much up and running. Give it a go by cloning the admin repository on mgt:
mkdir ~/repo
cd ~/repo
git clone -o myco gitosis@mgt.myco.com:gitosis-admin.git
(Replace myco with a name for your company. The -o option names the remote myco instead of the default origin. It isn't strictly necessary, but when pushing changes to more than one place I've found it useful to be explicit about pushes to the company repository.)

Adding a user

Now create an account for yourself to access your new repository. Generate an SSH key on local if you don't have one already, and copy the public key from local into the gitosis-admin project you just cloned on mgt:
ssh-keygen -t rsa
scp ~/.ssh/id_rsa.pub dev@mgt.myco.com:~/repo/gitosis-admin/keydir/john.doe@myco.com.pub
Then edit ~/repo/gitosis-admin/gitosis.conf on mgt to create a new group for your team with your first project, and add yourself to its members line. While you're there, add yourself to the gitosis-admin group too. Something like:
[gitosis]

[group gitosis-admin]
writable = gitosis-admin
members = dev@mgt.myco.com john.doe@myco.com

[group teamawesome]
writable = myproject
members = john.doe@myco.com
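gitosis matches member names against the key files in keydir, so john.doe@myco.com needs keydir/john.doe@myco.com.pub. That makes it easy to end up with a member who has no key. The Python sketch below is an unofficial helper, not part of gitosis. It reads a gitosis.conf like the one above, lists which repositories each member can push to, and reports members without a key file. It only understands the group, writable and members keys used in this post. The ~/repo/gitosis-admin path is the clone from earlier.

```python
import configparser
from pathlib import Path


def push_access(conf_text):
    """Map each member to the set of repositories their groups can push to."""
    # interpolation=None so a stray '%' in the file isn't treated specially
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(conf_text)
    access = {}
    for section in cfg.sections():
        if not section.startswith("group "):
            continue
        group = cfg[section]
        # gitosis also accepts the common misspelling 'writeable'
        repos = group.get("writable", "").split() + group.get("writeable", "").split()
        for member in group.get("members", "").split():
            access.setdefault(member, set()).update(repos)
    return access


def members_without_keys(access, keydir):
    """Members with no keydir/<member>.pub file."""
    return sorted(m for m in access if not (Path(keydir) / f"{m}.pub").is_file())


if __name__ == "__main__":
    admin = Path.home() / "repo" / "gitosis-admin"
    access = push_access((admin / "gitosis.conf").read_text())
    for member, repos in sorted(access.items()):
        print(member, "->", ", ".join(sorted(repos)))
    for member in members_without_keys(access, admin / "keydir"):
        print("missing key:", member)
```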
Now for your first commit!
cd ~/repo/gitosis-admin
git add keydir/john.doe@myco.com.pub
git commit -am 'Created myproject and added John'
git push myco master
git add does what it says: it marks a file to be added to the repository. git won't add any files you don't tell it to, so you can have versioned and unversioned files in the same directory. The next line commits your changes to the local repository (in ~/repo/gitosis-admin/.git), with -a telling git to also include every already-tracked file that has changed, such as gitosis.conf. The last command pushes your local commits to the gitosis server you cloned from (which is also mgt in this first case).

Now that you have access to the gitosis-admin project, let's add another user, this time working on local. Get Jane to give you her public key, then on local:
mkdir ~/repo
cd ~/repo
git clone -o myco gitosis@mgt.myco.com:gitosis-admin.git
cd gitosis-admin
cp janes-public-key.pub keydir/jane.doe@myco.com.pub
git add keydir/jane.doe@myco.com.pub
Edit gitosis.conf and add Jane to the members line of the teamawesome group, which gives her access to myproject, then:
git commit -am 'Added Jane to myproject'
git push myco master
Now that you've told gitosis about myproject and added some users, let's actually create the project on local and initialise its repository:
mkdir ~/repo/myproject
cd ~/repo/myproject
git init
Then point it at gitosis:
git remote add myco gitosis@mgt.myco.com:myproject.git
Add a new file:
echo 'A test file' > test.txt
git add test.txt
Commit it locally (repository in ~/repo/myproject/.git):
git commit -am 'Added test file'
Push it to gitosis for everyone else to grab. git has nothing to push until there's at least one commit, and gitosis creates myproject.git on mgt the first time you push to it:
git push myco master
And if Jane wants to work on the project from her machine she can do:
cd ~/repo
git clone -o myco gitosis@mgt.myco.com:myproject.git
One other useful command: if you're coming back to a repository that's already cloned and you want to make sure you have all the latest changes from gitosis, run:
git pull myco master
So that's the basics, but there's one final thing I like to have set up. I've always liked to run a mailing list showing commits to the central repository, so all developers can keep an eye on what's going on and how projects are evolving. In this case I've set up a Google group on our corporate Google Apps account and added John and Jane as members. Now to integrate gitosis with it. Unfortunately this requires editing a file in the actual repository itself on mgt:
cd /srv/gitosis/repositories/myproject.git
cat <<EOF |sudo tee -a config
[hooks]
mailinglist = git-commits-list@myco.com
announcelist = git-commits-list@myco.com
envelopesender = dev@myco.com
EOF
echo 'A meaningful description of my project' |sudo tee description
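You also need to set up a hook. For future projects we'll roll it out by default by dropping it into git's templates directory, so that you only have to edit the config file and provide a description as above. Templates are only copied into repositories created after they're in place, so the existing myproject gets a symlink instead: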
sudo cp /usr/share/doc/git-core/contrib/hooks/post-receive-email /usr/share/git-core/templates/hooks/post-receive
sudo chmod +x /usr/share/git-core/templates/hooks/post-receive
cd /srv/gitosis/repositories/myproject.git/hooks
sudo -u gitosis ln -s /usr/share/git-core/templates/hooks/post-receive post-receive
That's it. Try making a change in your project and pushing it; you should get a nice email telling you about the changes.
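For the curious, a post-receive hook is just a program that git runs after a push. git feeds it one line per updated ref on stdin, in the form '<old-sha> <new-sha> <ref-name>' (see the githooks documentation). The Python sketch below is a hypothetical, much-reduced stand-in for post-receive-email, meant only to illustrate that protocol. It reads the updates, pulls hooks.mailinglist and hooks.envelopesender from git config, and prints the resulting message instead of sending it.

```python
import subprocess
import sys
from email.message import EmailMessage
from pathlib import Path


def parse_updates(lines):
    """Each non-empty stdin line is '<old-sha> <new-sha> <ref-name>'."""
    updates = []
    for line in lines:
        line = line.strip()
        if line:
            old, new, ref = line.split(" ", 2)
            updates.append((old, new, ref))
    return updates


def is_null(sha):
    # git passes an all-zero object name for a ref that didn't exist or was deleted
    return set(sha) == {"0"}


def describe(old, new, ref):
    if is_null(old):
        return f"{ref} created at {new[:7]}"
    if is_null(new):
        return f"{ref} deleted (was {old[:7]})"
    return f"{ref} updated {old[:7]}..{new[:7]}"


def git_config(key):
    """Read a value such as hooks.mailinglist from the repository's config."""
    result = subprocess.run(["git", "config", key], capture_output=True, text=True)
    return result.stdout.strip()


def build_message(updates, to_addr, from_addr, project):
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["From"] = from_addr
    msg["Subject"] = f"[{project}] {len(updates)} ref(s) updated"
    msg.set_content("\n".join(describe(*u) for u in updates) + "\n")
    return msg


if __name__ == "__main__":
    # Hooks in a bare repository run with the repository directory as cwd
    desc = Path("description")
    project = desc.read_text().strip() if desc.exists() else "repository"
    msg = build_message(parse_updates(sys.stdin),
                        git_config("hooks.mailinglist"),
                        git_config("hooks.envelopesender"),
                        project)
    # Printing only: handing msg to smtplib or sendmail is left out of this sketch
    print(msg)
```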

I hope you've found this useful. Version control is one of the most worthwhile things to push for at any firm, even though it seems complicated at first. Stick with it and soon it will seem like second nature. In the next post we'll get our teeth into something more complicated still: automating your infrastructure setup with Chef.

PS. Much of the above I'm typing up after the fact (note to self: document as I go along), so if you find any inaccuracies, missing steps or the like, I apologise. Please let me know in the comments so I can fix them.

Image courtesy of grendelkhan

Monday 2 May 2011

Applying SCRUM in the real world

Cross posted from my business website blog 'Digital Susurrus'

For the last few years Agile has pretty much been the only software engineering methodology in town, which in my opinion is great news. Of all the different flavours, SCRUM seems to be the most in vogue, with a constant stream of contract ScrumMaster roles popping up around Cambridgeshire. The interesting point to notice is that these roles all have a mandatory requirement for ScrumMaster certification, though you can pretty much guarantee that most of them will not be anywhere close to following a full SCRUM process.

Now I'm not a great fan of IT certifications. People collect these far too easily in my opinion: most are open-book, multi-guess exams taken after a week away, at considerable expense, listening to someone pontificate about the work they did a decade ago. Certifications such as PRINCE2 bear no correlation with whether you're a good project manager. They certainly don't make you a good project manager, and in fact they help to make poor ones by allowing people to hide their lack of skill behind process and busywork. Give me someone who is just anal retentive about detail and borderline OCD any day to get a project in on time and to budget.

I had the joy of going through ScrumMaster accreditation a couple of weeks ago, and it did nothing to improve my view of such courses. If anything, SCRUM is even less suited to this kind of training, as there isn't the same focus on process and artifacts. The result was a two-day course that was more a patchwork of tips and tricks than suitable preparation for revolutionising the way you build software.

My key gripe with the training, though, was that although SCRUM has seemingly come of age and is clearly becoming acceptable to the mainstream, it is still expounded with a dogmatic fervour which makes it essentially unimplementable in almost all business environments. Unlike PRINCE2 or ITIL, which state that they are just collections of best practice that you can take or leave to make something that works for you, SCRUM says this is how things must be done. If you don't do it the SCRUM way from day one, then it's not SCRUM: we wash our hands of you and you're doomed.

This is not particularly helpful in the real world, and it certainly makes it easy to trumpet the perfection of SCRUM when it only counts as SCRUM if it's implemented in a perfect environment. The reality is that if I'm selling to government, I need to work out how to wrap my Agile process in a PRINCE2 approach because it's mandated. I will also probably be working to a hard deadline, and there'll probably be performance management clauses in the contract. These things are not insurmountable, and shouldn't mean that I have to abandon hope of improving my development process by making it as Agile as I can. Personally I think broad-brush Agile practices, such as an iterative approach, releasing often and keeping the customer close, among many others, improve the development process at most organisations if taken on board.

I am particularly alienated by those who espouse the perfection of developers, if only they weren't dragged down by the rest of the business. At its heart the message is: if you don't code, you don't have anything useful to add. It's a very conflict-driven polemic which preaches a misanthropic attitude to everyone who isn't a developer, the complete opposite of a healthy team-based business. Evidence does show that freeing up developers to make choices and including their views makes better products. However, this doesn't mean all developers are always responsible with freedom, or that all developers are suitably equipped to make the right decisions on all topics. I've seen too many products designed by techies with no appreciation of business realities to be in any doubt about that.

Now I'm sure there are useful Agile courses out there that teach techniques that can be applied in the real world, sending people back to their organisations with an improved toolbox and an ability to give it a go without the fear that they have to turn the whole business on its head from day one to achieve anything. However, people aren't attending those courses because they don't provide certification, and more importantly they're seeking certification because recruiters are being lazy and are requiring it as a measure of suitability when hiring, even though their organisation most likely isn't compliant with SCRUM anyway!

I am not arguing that ScrumMasters can't be dogmatic about the purity of their methodology; my issue is with the evils of certification and sloppy recruitment. If you're looking to recruit someone in an Agile role, please don't ask for certification, ask for experience. If you must, ask for training which gives people a wide range of software engineering approaches applicable in different business environments. If you're looking for external help to become Agile, think carefully before you make the important decision to be a SCRUM organisation by writing it into a job description! And be very careful of hiring the fanatical ScrumMaster who may turn your business inside out to meet the requirements of their methodology, rather than listening to what you have to say about your business and finding an Agile approach that will help you improve.

Image courtesy of jensjeppe

Monday 11 April 2011

To be or not to be... a consultant

Those who follow me closely will know that last month Opportunity Links closed its doors. Whilst it's been sad to see an end to the last five years of work, it's also given me a chance to reflect on which bits of the job I enjoyed, and the opportunity to focus on those elements in whatever comes next.

So after reflecting in tranquil forests on the meaning of life, the universe and everything, I came to the clear conclusion that the thing I enjoy most is innovation. I've been at my happiest when bootstrapping some disruptive product into the market, and at my most despondent when dealing with day to day line management issues for a team of 50.

Maybe this shouldn't be much of a revelation; I mean, who really enjoys line management? But nobody had told my career, which was obliviously heading down an increasingly managerial route. So this leaves me with three options for hauling myself back to where I should be:

  1. Create my own startup, or join one in the early stages
  2. Join a mature company that hasn't lost the innovation culture (there are quite a few of these in Cambridge)
  3. Become a consultant
After some thought I'm seriously considering option 3 which seems to be the flavour of the month. It will allow me to work on a variety of different projects, applying my skills in software engineering, project management, product management and innovation, and give me the freedom to maybe explore the startup option on the side.

My reticence is two-fold: Firstly I'm getting older and more risk averse - can I pull it off risking a stable income which my kids depend on? Gone are the days when I could live on Ramen noodles while coding from a bedsit. But yes, I think there's work there and my reputation is strong enough to secure it. The second issue is the sticking point - along with most of the business community, I dislike consultants. The majority of them are a waste of space and an expensive one at that. In many cases people turn freelance not because they're at the top of their game, but because they can't get hired. I'm not sure I'm totally happy with the idea of taking on the consultant label.

So I need a description for myself. One that isn't so obscure that it needs an explanation before people understand what I do, but one that doesn't have the negative connotations that come along with having 'consultant' on your business cards. Comments are open for ideas...

Monday 4 April 2011

Creating the BeGrand social media toolbox

Has it really been three years since my last blog post?

Well, let me get back into the groove gently. For my grand return I'd like to rescue a post that I wrote for the BeGrand blog, which is sadly no more as the project has come to an end. Although some of it is now dated, with some of the services mentioned having disappeared, I do think it's worth capturing for posterity, and the core concepts are still valid. So here it is in its unedited form...

Creating the BeGrand social media toolbox

October 19th 2009

As with most modern web projects, BeGrand.net needs to be linked to the social media web, both to be part of the distributed dialogue around grandparents and to draw interested visitors. However, keeping our finger on the social media pulse, even one as well defined as grandparents, requires a few handy tools to ease the pain.

I’m going to look at three modes of interaction for our social media toolbox: dashboard, asynchronous stream and real-time. These three modes should cover all the roles in our team, reflecting differing levels of attention. For the dashboard view my tool of choice is Netvibes, which allows you to set up a public tabbed page that you can populate with widgets (Pageflakes is also good for this). To create our asynchronous stream I’ll be using Friendfeed, though again any feed aggregator service will do (Jaiku is also good for this). Finally, for real-time I’ll be using Yammer, a corporate twitter clone which works nicely with XMPP services such as gTalk to allow multi-device notifications.

Asynchronous streams

Starting with Friendfeed, I’ve set up three rooms: one to pull in discussion of BeGrand around the web (BeGrand buzz), a second to publish all our social media activity (BeGrand zeitgeist), and a final room to aggregate discussion of grandparents across the net (BeGrand clippings).

For the buzz stream I want to pull in mentions of ‘begrand’ on blogs, in comments and on twitter, so I’ve added search RSS feeds from Google blog search, Backtype, and Twitter respectively (you may need to play with your search terms if you’re getting a lot of noise in your results, for instance I added -adrien to google blog search to exclude results from someone named Adrien Begrand who appears quite regularly).
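Friendfeed does the merging for us, but the idea behind a room like 'buzz' is simple enough to sketch. The Python below is an illustration, not how Friendfeed works. It merges several RSS 2.0 feeds, drops repeated links and filters out items that mention an excluded term, which is the client-side equivalent of adding -adrien to the search. It then sorts the result newest first. The feed URLs are whatever search feeds you pass on the command line, and the script name merge_feeds.py is hypothetical.

```python
import sys
import urllib.request
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def rss_items(xml_text):
    """Yield (published, title, link) for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        try:
            published = parsedate_to_datetime(item.findtext("pubDate", default=""))
        except (TypeError, ValueError):
            published = None  # missing or unparseable date
        yield published, title, link


def merge_streams(feeds, exclude=()):
    """Merge feeds newest first, skipping repeated links and items mentioning an excluded term."""
    terms = [term.lower() for term in exclude]
    seen_links = set()
    merged = []
    for xml_text in feeds:
        for published, title, link in rss_items(xml_text):
            text = f"{title} {link}".lower()
            if any(term in text for term in terms):
                continue
            if link and link in seen_links:
                continue
            seen_links.add(link)
            merged.append((published, title, link))
    # Undated items sort last; timestamp() avoids comparing naive and aware datetimes
    merged.sort(key=lambda e: e[0].timestamp() if e[0] else float("-inf"), reverse=True)
    return merged


if __name__ == "__main__":
    # Usage: python merge_feeds.py FEED_URL [FEED_URL ...]
    documents = []
    for url in sys.argv[1:]:
        with urllib.request.urlopen(url, timeout=10) as resp:
            documents.append(resp.read())
    for published, title, link in merge_streams(documents, exclude=["adrien"]):
        print(published, title, link)
```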

For the zeitgeist stream I added feeds from all our social media activities: our blogs, Delicious, Flickr, YouTube, Google Reader shares, and Backtype to pull in all comments made across the net by our staff (planned future services include Slideshare, Upcoming and GetSatisfaction support topics). To increase the reach of all this good stuff we’re doing, this feed is also wired into our begrandnet twitter account using TwitterFeed for automatic posting.

Finally the clippings stream pulls together two feeds; a Google blog search and a Google news search for ‘grandparents’. I’m mostly interested here in long form content rather than microblogging services like twitter which would overload this stream very quickly with mostly uninteresting stuff for such a generic search term.

Whilst dipping in and out of Friendfeed may suit many roles in your team, people who might look at this stuff once or twice a day to take the temperature, there should be someone in your team whose job it is to scrub it all. Whether you call it your social media, community, or marketing manager, this role needs to read everything that comes through ‘buzz’ and ‘clippings’ and take action. On a practical level Friendfeed isn’t ideal for this kind of activity, so I use a feed reader such as Google Reader to wire in the two streams, and maybe a few other bits and bobs such as your competitors’ output, making it easy to share, respond and archive.

Realtime

So I now have my three streams, but I don’t want to wait until I next look at Friendfeed to know what’s happening, so we need some real-time notification. There are a number of options here, the easiest being to use Friendfeed’s built-in notification settings to send updates to its desktop notifier or to a configurable instant messaging account.

However, what’s missing here is some persistence and discussion – the difference between IM and a service like Twitter. Although I want real-time notification, I also want that notification to be shared with the team and to be a focus for further internal social interaction. For this kind of thing we use the excellent Yammer service which is a closed twitter clone for corporates. It allows the wiring in of any number of RSS feeds which can then be subscribed to by the team. It also supports bridging to IM so I can still get messages to my gTalk account and use an app such as BeejiveIM to get push notifications to my iPhone when I’m on the move.

Dashboard

The third perspective needed is the Dashboard. Not everyone is happy consuming a continual stream; some want a more structured approach, as evinced by the popularity of iGoogle and My Yahoo. So I’ve put together a social media dashboard in Netvibes with a number of tabs organised around what I pretentiously think of as the four axes of social media:

Awareness

The first tab attempts to detail how well we’re marketing ourselves. It is primarily the output of the ‘buzz’ stream and as it’s the entry page for visitors, I use it as somewhere to put a couple of core Netvibes widgets.

Attention

This tab shows where BeGrand’s attention is focussed, what we’re interested in and where we’re looking. It pulls in our Google Reader shares, our bookmarks and the ‘clippings’ stream.

Activity

This tab pulls together what we’re up to and how we’re impacting the social media space. It embeds widgets for Twitter, Youtube, and Flickr as well as dropping in our full ‘zeitgeist’ stream.

Authorship

The final tab details our writings online, pulling in feeds from our blogs and the comment stream for BeGrand staff from Backtype.

Summary

Hopefully between these three views on our social media activity there’s something for everyone. We’ll see how it pans out as BeGrand launches and the volume increases. If anyone has any refinements I’m always interested in hearing how to improve our toolset.