Diffable; What if GitHub supported it natively?
Steve Souders told me about Diffable when I saw him after his awesome Velocity conference.
Diffable is an open source project that lets you send down only the deltas between versions of your application, rather than full new downloads (which may contain a large amount of duplicate data).
In their presentation, Josh Harrison and James deBoer go into the details after starting with the core issues:
Problem:
Frequently modified web resources must be downloaded in their entirety with every modification.
Even a small change invalidates the cache.
Rich internet applications often have large amounts of static content.
Idea:
Initial application resources kept in cache.
Changes to cached versions transmitted as deltas.
Deltas merged client-side to generate latest JS version.
Benefits:
Faster page load times for users with cached resources.
Small changes to large resources incur only small costs.
Steve summarizes things well in his post:
Diffable uses differential compression to reduce the size of JavaScript downloads. It makes a lot of sense. Suppose your web site has a large external script. When a new release comes out, it’s often the case that a bulk of that large script is unchanged. And yet, users have to download the entire new script even if the old script is still cached.
Josh and James work on Google Maps which has a main script that is ~300K. A typical revision for this 300K script produces patches that are less than 20K. It’s wasteful to download that other 280K if the user has the old revision in their cache. That’s the inspiration for Diffable.
Diffable is implemented on the server and the client. The server component records revision deltas so it can return a patch to bring older versions up to date. The client component (written in JavaScript) detects if an older version is cached and if necessary requests the patch to the current version. The client component knows how to merge the patch with the cached version and evals the result.
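The client-side decision Steve describes could be sketched roughly as follows. This is only an illustration of the flow (cache miss, cache hit, stale cache plus patch), not Diffable's actual code: `loadScript`, `fetchFull`, `fetchPatch` and `applyPatch` are hypothetical names, and the server calls are synchronous stand-ins for what would really be network requests.

```javascript
// Hypothetical sketch: none of these names come from Diffable itself.
// cache:      a Map from resource name to { version, source }
// server:     an object with fetchFull(resource, version) and
//             fetchPatch(resource, fromVersion, toVersion)
// applyPatch: a function (oldSource, patch) -> newSource
function loadScript(resource, currentVersion, cache, server, applyPatch) {
  const cached = cache.get(resource);

  // Nothing cached yet: fall back to a full download.
  if (!cached) {
    const source = server.fetchFull(resource, currentVersion);
    cache.set(resource, { version: currentVersion, source: source });
    return source;
  }

  // Cached copy is already current: no request needed.
  if (cached.version === currentVersion) {
    return cached.source;
  }

  // Cached copy is stale: ask only for the patch and merge it locally.
  const patch = server.fetchPatch(resource, cached.version, currentVersion);
  const source = applyPatch(cached.source, patch);
  cache.set(resource, { version: currentVersion, source: source });
  return source;
}
```

The caller would then evaluate the returned source, which is the step Steve refers to as eval-ing the result.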
With this technique in action, you end up sending down JS arrays as deltas that look like:
[0,10,"red",40,3," leaps",25,15,16,3,"."]
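To get a feel for how an array like that might be merged on the client, here is a minimal sketch. It assumes that numbers come in (offset, length) pairs meaning "copy this slice of the old version", and that strings are inserted literally. That reading is my assumption about the format, and both `applyDelta` and the base sentence below are made up for illustration.

```javascript
// Assumed reading of the delta format (not confirmed by the post):
//   number, number -> copy `length` characters of the old text from `offset`
//   string         -> insert the string literally
function applyDelta(oldText, delta) {
  let result = "";
  let i = 0;
  while (i < delta.length) {
    const item = delta[i];
    if (typeof item === "string") {
      // Literal insertion of new content.
      result += item;
      i += 1;
    } else {
      // Copy instruction: an offset followed by a length.
      const offset = item;
      const length = delta[i + 1];
      if (typeof length !== "number") {
        throw new Error("Offset at index " + i + " has no matching length");
      }
      result += oldText.slice(offset, offset + length);
      i += 2;
    }
  }
  return result;
}

// Hypothetical base text, chosen only so the offsets have something to point at.
const oldText = "The quick brown fox jumps over the lazy dog.";
const delta = [0, 10, "red", 40, 3, " leaps", 25, 15, 16, 3, "."];
console.log(applyDelta(oldText, delta));
```

Under that reading, unchanged regions cost only two small integers each, which is why a patch can stay tiny even when the script itself is large.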
The data that these guys share is impressive. The results seem to add up for applications as large as Google Maps. Do they measure up for smaller apps? If large apps have a lot of static content, couldn’t that content be put into another download and even app cached away?
Also, this is a lot of work for a developer to implement, on both the client side and the server side. It would be great if we could experiment with Diffable and then move things lower into the stack. Why can’t HTTP itself be smart enough to deal with diffs?
It did make me think of my favourite chaps @github. I know that GitHub is about development rather than deployment…. but what if they supported this natively (since they kinda grok diffin’ etc already) and offered a client side loader so you could github.load("project", ...).
It all just makes me realise that GitHub is poised to pounce in many directions. Good on ‘em.
July 12th, 2010 at 8:34 am
We don’t encourage users to directly link to resources on GitHub. It takes a lot more computing power than serving a file from a document root to serve raw git blobs and diffs.
However, Git would work pretty well for this. Just use commit SHAs as versions, and you can easily make requests for SHA1…SHA2. It is a very interesting idea…
July 12th, 2010 at 9:48 am
There’s SDCH (Shared Dictionary Compression Over HTTP) which might be similar to what you are looking for?
July 12th, 2010 at 1:37 pm
Before Diffable there was SDCH, which is lower in the stack. Before SDCH there was RFC 3229. Chrome uses SDCH for Google search.